UTF-8 XML file shows gibberish
I have a UTF-8 encoded XML file that was exported from a WordPress MySQL database.
Although the file is saved as UTF-8 and its encoding is set to UTF-8, I get gibberish instead of the Hebrew text that should be there. It looks like this:
™×•×˜×•×ª
How can I find the original encoding or charset and convert the text into proper Hebrew?
PHP's mb_detect_encoding($str) returns UTF-8.
I've tried all sorts of PHP encoding functions with different settings and input/output charsets, but they all just print different-looking gibberish, like:
ÃâÃËÃâ¢Ãâ¢ÃËÃ
and
�� ×שמ×
Any ideas how to go about this?
If you have access to the database, you can fix this easily by exporting the data as latin1 and re-importing it as UTF-8, as has been suggested here.
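If re-exporting isn't an option, the same latin1-out / UTF-8-in round trip can be applied to the already-mangled text. The Python sketch below assumes the text went through exactly one extra round of encoding and that the misreading used Windows-1252 (cp1252), which is what MySQL calls latin1; the ™, • and ˜ in the question's sample exist in Windows-1252 but not in ISO-8859-1, which supports that assumption. `unmangle` is a hypothetical helper name. A PHP analogue would presumably be mb_convert_encoding($str, 'Windows-1252', 'UTF-8'), but treat that as an untested assumption.

```python
def unmangle(text: str, errors: str = "strict") -> str:
    """Undo one round of double encoding (UTF-8 bytes misread as cp1252, then re-encoded)."""
    raw = bytearray()
    for ch in text:
        try:
            # Map each character back to the single byte it was misread from.
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) > 0xFF:
                raise
            # Assumption: like MySQL's latin1, the five bytes cp1252 leaves
            # unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) were stored as the
            # C1 characters U+0081..U+009D, so take the code point as the byte.
            raw.append(ord(ch))
    # The recovered bytes should now be the original UTF-8 text.
    return raw.decode("utf-8", errors)


if __name__ == "__main__":
    print(unmangle("×™×•×˜×•×ª"))  # יוטות
    # The question's sample starts mid-character, so decode leniently:
    print(unmangle("™×•×˜×•×ª", errors="replace"))  # �וטות
```

Running it over an entire field (or the whole exported XML text) should restore the Hebrew, provided every character was mangled the same way; text that was double-encoded twice would need a second pass.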
This is very similar to this question.
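From what I could see, this is a mangled Unicode string in which each Unicode character got encoded as two Unicode characters: the two UTF-8 bytes of each Hebrew letter were read as two separate single-byte (latin1/Windows-1252) characters, and those were then encoded to UTF-8 again. Every Hebrew letter from א (U+05D0) to ת (U+05EA) starts with the byte 0xD7 in UTF-8, which is why the text is full of × (U+00D7).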
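To check this explanation against the question's sample, here is a small Python sketch that reproduces the corruption. It assumes the misreading step used Windows-1252 (cp1252) with MySQL-style handling of its five unassigned bytes; `mangle` is a hypothetical name. The sample begins with ™ (byte 0x99), the second byte of a letter whose lead byte was cut off; assuming that lead byte was 0xD7, the fragment is יוטות.

```python
def mangle(text: str) -> str:
    """Reproduce the corruption: UTF-8 bytes misread one by one as cp1252 characters."""
    chars = []
    for byte in text.encode("utf-8"):
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90, 0x9D are unassigned in cp1252;
            # assume MySQL-style pass-through to U+0081..U+009D.
            chars.append(chr(byte))
    return "".join(chars)


if __name__ == "__main__":
    print(mangle("יוטות"))  # ×™×•×˜×•×ª
    # Each Hebrew letter becomes two characters, the first always "×" (U+00D7):
    print(len("יוטות"), len(mangle("יוטות")))  # 5 10
    print(mangle("יוטות").endswith("™×•×˜×•×ת"[:0] + "™×•×˜×•×ª"))  # True
```

The mangled result is an ordinary string that encodes to perfectly valid UTF-8, which is consistent with mb_detect_encoding reporting UTF-8 for the file.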
The code I came up with simply discards the empty high-order byte of each character and rebuilds the original byte array from the remaining low-order bytes. The code is only an example and its approach is very simplistic, but it should help you get there.
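A minimal Python sketch of the approach described above (not the answerer's original code), assuming the mangled text was misread as ISO-8859-1, so that every character is at most U+00FF and its high-order byte is empty. `rebuild_bytes_latin1` is a hypothetical name.

```python
def rebuild_bytes_latin1(text: str) -> str:
    """Keep only the low-order byte of each character, then decode the bytes as UTF-8.

    Only valid when every character is <= U+00FF, i.e. its high-order byte is empty.
    """
    data = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFF:
            # A non-empty high-order byte means this simple byte mapping does not apply.
            raise ValueError(f"{ch!r} (U+{code:04X}) has a non-empty high-order byte")
        data.append(code)  # the low-order byte is the original UTF-8 byte
    return data.decode("utf-8")


if __name__ == "__main__":
    print(rebuild_bytes_latin1("\u00d7\u00aa"))  # ת
```

Python's latin-1 codec performs the same byte mapping, so text.encode("latin-1").decode("utf-8") is an equivalent one-liner. The question's sample contains ™ (U+2122), whose high-order byte is not empty, so this simple version rejects it; data that went through Windows-1252 needs a cp1252-aware mapping instead.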
Take a look at your PHP file; maybe it isn't saved as UTF-8, and that's why your XML query returns this unwanted string.
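One quick way to test that suggestion is to confirm that the PHP file's bytes decode as UTF-8. This Python sketch (`describe_utf8` and `describe_file` are hypothetical names) reports whether the bytes are valid UTF-8 and whether they start with a byte order mark. Passing this check does not rule out double encoding: the mangled text in the question is itself valid UTF-8, which is also why mb_detect_encoding reports UTF-8.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def describe_utf8(data: bytes) -> str:
    """Report whether a byte string is valid UTF-8 and whether it starts with a BOM."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"not valid UTF-8 (first invalid byte at offset {exc.start})"
    return "valid UTF-8 with BOM" if data.startswith(UTF8_BOM) else "valid UTF-8"


def describe_file(path: str) -> str:
    """Read a file's raw bytes (no decoding) and describe them."""
    return f"{path}: {describe_utf8(Path(path).read_bytes())}"
```

For example, print(describe_file("export.php")) with your own file name in place of the hypothetical "export.php".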