Replacing Word errors with PHP
Our content people have been writing in Word and pasting the text into the old Unicode system. I'm now trying to move to UTF-8.
However, after importing the data, there are some characters I can't get rid of.
I have tried the following, and none of the functions they provide fix this string: http://snipplr.com/view.php?codeview&id=11171 and the Stack Overflow thread "How to replace Microsoft-encoded quotes in PHP".
String: Danâ’s back for more!!
In this kind of situation, I generally start with the string I have copy-pasted from Word.
Then, going through it byte by byte, I output the hexadecimal code of each byte.
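The answer's original code did not survive on this page. A minimal sketch of the byte-by-byte dump described above, assuming PHP 7+; the helper name hexDump is hypothetical, and the sample string is written with hex escapes, assuming the pasted text contains the bytes identified below:

```php
<?php
// Hypothetical helper: returns the hexadecimal code of every byte in $str.
// strlen() and $str[$i] work on bytes, not characters, so each multi-byte
// UTF-8 character shows up as several separate entries.
function hexDump(string $str): array
{
    $codes = [];
    for ($i = 0; $i < strlen($str); $i++) {
        $codes[] = sprintf('0x%02x', ord($str[$i]));
    }
    return $codes;
}

// Assumed input: the question's string, written with hex escapes so the
// result does not depend on the encoding this file is saved in.
$str = "Dan\xc3\xa2\xe2\x80\x99s back for more!!";

echo implode(' ', hexDump($str)), PHP_EOL;
```

For the question's string, the output starts with 0x44 0x61 0x6e 0xc3 0xa2 0xe2 0x80 0x99 0x73, that is D, a, n, then five bytes belonging to the two special characters, then s.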
This gives one hexadecimal code per byte.
Then, with a bit of guessing, luck, and trial and error, you'll find out that:
â is a character that fits on two bytes: 0xc3 0xa2
’ (U+2019, the right single quotation mark) is a character that fits on three bytes: 0xe2 0x80 0x99
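To cut down on the guessing, a sketch can split the string into characters rather than bytes and report each multi-byte character with its code point. It assumes the input is valid UTF-8 and that the mbstring extension is available for mb_ord() (PHP 7.2+); nonAsciiChars is a hypothetical helper name:

```php
<?php
// Hypothetical helper: lists each non-ASCII character of a UTF-8 string
// as "U+XXXX <hex bytes>". preg_split() with the /u modifier splits
// between characters instead of bytes; mb_ord() needs PHP 7.2+ and mbstring.
function nonAsciiChars(string $str): array
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // With /u, preg_split() returns false on invalid UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $found = [];
    foreach ($chars as $char) {
        if (strlen($char) > 1) { // more than one byte => not plain ASCII
            $found[] = sprintf('U+%04X %s', mb_ord($char, 'UTF-8'), bin2hex($char));
        }
    }
    return $found;
}

// Assumed input, as in the question.
$str = "Dan\xc3\xa2\xe2\x80\x99s back for more!!";
print_r(nonAsciiChars($str));
```

On the question's string this lists U+00E2 (c3a2) and U+2019 (e28099), matching the two byte sequences identified by hand.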
Hint: it's easier when you don't have two special characters following each other ;-)
After that, it's only a matter of using str_replace to replace the correct sequence of bytes with another character; for example, to replace the special quote (the bytes 0xe2 0x80 0x99) with a normal one.
Doing that on the question's string gives you: Danâ's back for more!!
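str_replace also accepts arrays of search and replacement strings, so several Word characters can be handled in one call. A sketch, assuming the text is UTF-8; the map and the replaceWordChars name are hypothetical, a starting point rather than an exhaustive list of what Word produces:

```php
<?php
// Hypothetical map of common Word "smart" punctuation (as UTF-8 byte
// sequences) to plain ASCII; adjust it to what your data actually contains.
const WORD_CHAR_MAP = [
    "\xe2\x80\x98" => "'",   // U+2018 left single quotation mark
    "\xe2\x80\x99" => "'",   // U+2019 right single quotation mark
    "\xe2\x80\x9c" => '"',   // U+201C left double quotation mark
    "\xe2\x80\x9d" => '"',   // U+201D right double quotation mark
    "\xe2\x80\x93" => '-',   // U+2013 en dash
    "\xe2\x80\x94" => '-',   // U+2014 em dash
    "\xe2\x80\xa6" => '...', // U+2026 horizontal ellipsis
];

function replaceWordChars(string $str): string
{
    // str_replace() accepts parallel arrays of search and replacement strings.
    return str_replace(array_keys(WORD_CHAR_MAP), array_values(WORD_CHAR_MAP), $str);
}

// Assumed input, as in the question.
$str = "Dan\xc3\xa2\xe2\x80\x99s back for more!!";
echo replaceWordChars($str), PHP_EOL; // the â (0xc3 0xa2) is not in the map and stays
```

The â is not in the map, so it survives, just as with the single-quote replacement; it needs its own entry (or a fix at import time) if it should go too.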