Converting Microsoft Word special characters with PHP

Posted on 2024-12-04 13:55:42

I am trying to convert text that users paste from Word, which contains MS Word ellipses and long dashes, before processing it further.

I found an old proposed solution to the problem here: http://www.codingforums.com/archive/index.php/t-47163.html , but it does not work for me. For example, after replacing the ellipsis, the variable comes back empty. I have never seen anything like this before:

$src = "Long word dash – and weird Word ellipsis…";
$src = str_replace("‘", "'", $src);
$src = str_replace("’", "'", $src);
$src = str_replace("”", '"', $src);
$src = str_replace("“", '"', $src);
$src = str_replace("–", "-", $src);
$src = str_replace("…", "...", $src);
print $src;

Any ideas?

Comments (4)

又爬满兰若 2024-12-11 13:55:43

For anyone getting the diamond question mark (the U+FFFD replacement character) in PHP, this method of replacing the UTF-8 byte sequences worked better than using the chr function.

$search = [                 // www.fileformat.info/info/unicode/<NUM>/ <NUM> = 2018
                "\xC2\xAB",     // « (U+00AB) in UTF-8
                "\xC2\xBB",     // » (U+00BB) in UTF-8
                "\xE2\x80\x98", // ‘ (U+2018) in UTF-8
                "\xE2\x80\x99", // ’ (U+2019) in UTF-8
                "\xE2\x80\x9A", // ‚ (U+201A) in UTF-8
                "\xE2\x80\x9B", // ‛ (U+201B) in UTF-8
                "\xE2\x80\x9C", // “ (U+201C) in UTF-8
                "\xE2\x80\x9D", // ” (U+201D) in UTF-8
                "\xE2\x80\x9E", // „ (U+201E) in UTF-8
                "\xE2\x80\x9F", // ‟ (U+201F) in UTF-8
                "\xE2\x80\xB9", // ‹ (U+2039) in UTF-8
                "\xE2\x80\xBA", // › (U+203A) in UTF-8
                "\xE2\x80\x93", // – (U+2013) in UTF-8
                "\xE2\x80\x94", // — (U+2014) in UTF-8
                "\xE2\x80\xA6"  // … (U+2026) in UTF-8
    ];

    $replacements = [
                "<<", 
                ">>",
                "'",
                "'",
                "'",
                "'",
                '"',
                '"',
                '"',
                '"',
                "<",
                ">",
                "-",
                "-",
                "..."
    ];

    // str_replace() returns the new string; it does not modify $string in place.
    $string = str_replace($search, $replacements, $string);
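A minimal sketch of why replacing whole UTF-8 byte sequences avoids the diamond question mark, while a single-byte replacement can cause it. The helper `isValidUtf8` and the sample string are illustrative assumptions, not part of the answer above, and the `\u{...}` escapes assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the thread): true if $s is well-formed UTF-8.
function isValidUtf8(string $s): bool
{
    // With the /u modifier, PCRE rejects subjects that are not valid UTF-8,
    // in which case preg_match() returns false instead of 1.
    return preg_match('//u', $s) === 1;
}

// Sample input: guillemets and an ellipsis, encoded as UTF-8.
$text = "\u{AB}quoted\u{BB} \u{2026}";

// Single-byte replacement: chr(194) is 0xC2, the lead byte of both
// guillemets, so removing it leaves orphaned continuation bytes.
$broken = str_replace(chr(194), '', $text);

// Whole-sequence replacement keeps the string well-formed.
$clean = str_replace(
    ["\xC2\xAB", "\xC2\xBB", "\xE2\x80\xA6"],
    ['<<', '>>', '...'],
    $text
);

var_dump(isValidUtf8($broken)); // bool(false)
var_dump(isValidUtf8($clean));  // bool(true)
echo $clean, "\n";              // <<quoted>> ...
```

A browser renders the orphaned bytes in `$broken` as the diamond question mark, which is likely the symptom described above.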

铁憨憨 2024-12-11 13:55:43

Hmm. I use this function for sanitizing text copied into a rich text editor (RTE). It may or may not work in this case. It converts to HTML entities, but you could tweak it to convert to regular characters instead:

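function convertFromCP1252($string)
{
    // Assumes single-byte input (Mac OS Roman or Windows-1252), not UTF-8.
    $search = array('&',
                    '<',
                    '>',
                    '"',
                    chr(212), // Mac OS Roman: left single quote
                    chr(213), // Mac OS Roman: right single quote
                    chr(210), // Mac OS Roman: left double quote
                    chr(211), // Mac OS Roman: right double quote
                    chr(209), // Mac OS Roman: em dash
                    chr(208), // Mac OS Roman: en dash
                    chr(201), // Mac OS Roman: ellipsis
                    chr(145), // Windows-1252: left single quote
                    chr(146), // Windows-1252: right single quote
                    chr(147), // Windows-1252: left double quote
                    chr(148), // Windows-1252: right double quote
                    chr(151), // Windows-1252: em dash
                    chr(150), // Windows-1252: en dash
                    chr(133), // Windows-1252: ellipsis
                    chr(194)  // stray 0xC2 byte
                );

    $replace = array(   '&amp;',
                        '&lt;',
                        '&gt;',
                        '&quot;',
                        '&#8216;',
                        '&#8217;',
                        '&#8220;',
                        '&#8221;',
                        '&#8212;',
                        '&#8211;',
                        '&#8230;',
                        '&#8216;',
                        '&#8217;',
                        '&#8220;',
                        '&#8221;',
                        '&#8212;',
                        '&#8211;',
                        '&#8230;',
                        ''
                    );

    return str_replace($search, $replace, $string);
}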
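If the pasted text really arrives as Windows-1252 bytes, an alternative to a hand-written chr() table is to convert the whole string to UTF-8 first. This is a different technique from the function above, sketched under the assumptions that the iconv extension is available and that the input contains no bytes undefined in Windows-1252; the helper name `cp1252ToUtf8` is hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): convert Windows-1252 bytes
// to UTF-8 with ext/iconv, failing loudly on bytes iconv cannot map.
function cp1252ToUtf8(string $bytes): string
{
    $utf8 = iconv('Windows-1252', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new InvalidArgumentException('Input is not valid Windows-1252');
    }
    return $utf8;
}

// 0x93/0x94 are curly double quotes, 0x96 an en dash, 0x85 an ellipsis.
$pasted = "\x93Hello\x94 \x96 wait\x85";
$utf8 = cp1252ToUtf8($pasted);

// The result holds the same characters as multi-byte UTF-8 sequences.
echo bin2hex($utf8), "\n";
```

After this step, the UTF-8 byte-sequence replacements shown earlier in the thread can be applied to `$utf8`.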

尤怨 2024-12-11 13:55:43

Great solution. I copied and pasted it, and it worked without a problem. On further study, I added a few characters that were not in the search and replace arrays. To find the character code numbers, I wrote a PHP function that prints each byte of a string together with its numeric value:

function stdump($s){

  for($i=0;$i<strlen($s);$i++){

    echo substr($s,$i,1) . "(" . ord(substr($s,$i,1)) . ")";

  }

  echo "<br/>";
}

Each character is displayed with its numeric code next to it in parentheses. Like this:

stdump("GPUs…");

produces:

G(71)P(80)U(85)s(115)â(226)€(128)¦(166)

Hope this helps.

--Keith
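In the output above, the last three values (226, 128, 166) are the UTF-8 bytes E2 80 A6 of the single ellipsis character. A small variation on the same idea, sketched here with a hypothetical helper `byteEscapes`, prints those bytes directly as PHP `\xNN` escapes that can be pasted into a search array. The `\u{...}` escapes assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the thread): render every byte of $s
// as a PHP "\xNN" escape sequence.
function byteEscapes(string $s): string
{
    if ($s === '') {
        return '';
    }
    $out = '';
    foreach (str_split($s) as $byte) {
        // ord() gives the byte value; %02X prints it as two hex digits.
        $out .= sprintf('\\x%02X', ord($byte));
    }
    return $out;
}

echo byteEscapes("\u{2026}"), "\n"; // \xE2\x80\xA6 (ellipsis)
echo byteEscapes("\u{2014}"), "\n"; // \xE2\x80\x94 (em dash)
```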

潦草背影 2024-12-11 13:55:43

It works for me:

$str = file_get_contents($file);

$array = array("‘"=>"'", "’"=>"'", "”"=>'"', "“"=>'"', "–"=>"-", "—"=>"-", "…"=>"...");

$str = strtr($str, $array);

file_put_contents($file, $str);
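The strtr() approach can be wrapped in a reusable function, as in the sketch below. The function name `wordPunctuationToAscii` is hypothetical, and the map uses `\u{...}` escapes (PHP 7 or later) so that the code does not depend on the encoding in which the source file is saved. It assumes the input is UTF-8.

```php
<?php
// Hypothetical helper (not from the thread): map common Word punctuation
// to plain ASCII in a single strtr() pass. Assumes UTF-8 input.
function wordPunctuationToAscii(string $text): string
{
    static $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // ellipsis
    ];
    return strtr($text, $map);
}

// The sample string from the question, written with escapes.
echo wordPunctuationToAscii("Long word dash \u{2013} and weird Word ellipsis\u{2026}"), "\n";
// Long word dash - and weird Word ellipsis...
```

With an array argument, strtr() tries the longest keys first and does not rescan text it has already replaced, so the order of the entries does not matter.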