UTF-8 XML file displays gibberish

Posted on 2024-09-02 01:31:16

I have a UTF-8 encoded XML file, which was exported from a WordPress MySQL database.

Although the file is saved as UTF-8, and the encoding is UTF-8, I get gibberish instead of the Hebrew text that is supposed to be there. It looks like this:

™×•×˜×•×ª

How can I find the original encoding or charset and convert the text into proper Hebrew?

PHP's mb_detect_encoding($str) returns UTF-8.

I tried all sorts of PHP encoding functions with different settings and input/output charsets, but they all just print different-looking blocks of gibberish, like:

ÃâÃËÃâ¢Ãâ¢ÃËÃ

and

�� ×שמ×

...Any ideas how to go about this?

Comments (4)

一腔孤↑勇 2024-09-09 01:31:16
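// Maps Hebrew text whose single-byte (ISO-8859-8 / Windows-1255) bytes were
// misread as Latin-1 back to the corresponding Hebrew letters.
function convert($str) {
    $hebrew = array("א", "ב", "ג", "ד", "ה", "ו", "ז", "ח", "ט", "י", "כ", "ל", "מ", "נ", "ס", "ע", "פ", "צ", "ק", "ר", "ש", "ת", "ך", "ם", "ן", "ף", "ץ");
    $gibberish = array("à", "á", "â", "ã", "ä", "å", "æ", "ç", "è", "é", "ë", "ì", "î", "ð", "ñ", "ò", "ô", "ö", "÷", "ø", "ù", "ú", "ê", "í", "ï", "ó", "õ");
    return str_replace($gibberish, $hebrew, $str);
}

// utf8_encode() treats its input as ISO-8859-1; note that it is deprecated as of PHP 8.2.
$hebrew_string = convert(utf8_encode($gibberish_string));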
苦妄 2024-09-09 01:31:16

If you have access to the database, you can fix this easily by exporting it as latin1 and importing it as UTF8, as has been suggested here.

渡你暖光 2024-09-09 01:31:16

This is very similar to this question.

From what I could see, this is a mangled Unicode string, where each Unicode character got encoded as two Unicode characters.

The code I came up with simply discarded the empty high-order byte and reconstructed the original byte array from that. The code is only an example and is very simplistic in approach, but should help you get there.
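
The byte-reconstruction idea described above can be sketched in PHP roughly as follows. This is only an illustration under assumptions that are not confirmed in the thread: that the original UTF-8 bytes were misread as Windows-1252 and then re-encoded as UTF-8, that the mbstring extension is available, and that PHP 7.4 or later is used. The function name unmangle_cp1252 is hypothetical.

```php
<?php
// Hypothetical helper: turns each character of the mangled string back into
// the single byte it came from, then checks that the bytes form valid UTF-8.
function unmangle_cp1252(string $mangled): string
{
    // Code points that Windows-1252 assigns to bytes in the 0x80-0x9F range.
    static $cp1252 = [
        0x20AC => 0x80, 0x201A => 0x82, 0x0192 => 0x83, 0x201E => 0x84,
        0x2026 => 0x85, 0x2020 => 0x86, 0x2021 => 0x87, 0x02C6 => 0x88,
        0x2030 => 0x89, 0x0160 => 0x8A, 0x2039 => 0x8B, 0x0152 => 0x8C,
        0x017D => 0x8E, 0x2018 => 0x91, 0x2019 => 0x92, 0x201C => 0x93,
        0x201D => 0x94, 0x2022 => 0x95, 0x2013 => 0x96, 0x2014 => 0x97,
        0x02DC => 0x98, 0x2122 => 0x99, 0x0161 => 0x9A, 0x203A => 0x9B,
        0x0153 => 0x9C, 0x017E => 0x9E, 0x0178 => 0x9F,
    ];

    $bytes = '';
    foreach (mb_str_split($mangled, 1, 'UTF-8') as $char) {
        $cp = mb_ord($char, 'UTF-8');
        if ($cp < 0x100) {
            // The high-order byte is empty, so the code point is the byte.
            $bytes .= chr($cp);
        } elseif (isset($cp1252[$cp])) {
            $bytes .= chr($cp1252[$cp]);
        } else {
            // Not the kind of mangling assumed here; leave the input alone.
            return $mangled;
        }
    }

    // Only accept the result if the rebuilt bytes are well-formed UTF-8.
    return mb_check_encoding($bytes, 'UTF-8') ? $bytes : $mangled;
}
```

For example, unmangle_cp1252("\u{00D7}\u{2122}\u{00D7}\u{2022}") rebuilds the bytes D7 99 D7 95, which are the UTF-8 encoding of the two Hebrew letters יו. Code points below 0x100 are passed through directly because some bytes of Hebrew UTF-8 (0x90, for instance) have no Windows-1252 character and tend to survive as C1 control characters.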

贩梦商人 2024-09-09 01:31:16

Take a look at your PHP file. Maybe it isn't UTF-8, and that's the reason why your XML query returns this unwanted string.
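
One way to check a file along these lines is sketched below, assuming the mbstring extension is available. The helper name file_is_valid_utf8 is hypothetical. Note that a positive result only says the bytes are well-formed UTF-8; double-encoded text like the sample in the question would still pass.

```php
<?php
// Hypothetical helper: reports whether a file's bytes are well-formed UTF-8.
function file_is_valid_utf8(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return mb_check_encoding($bytes, 'UTF-8');
}
```

It could be called on the PHP source file or on the exported XML file, for example file_is_valid_utf8(__FILE__).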
