
Ã © and other codes

Posted on 2024-10-02 13:11:39

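I have a file full of these codes, and I want to "translate" it into normal characters (the whole file, I mean). How can I do that?

Thank you very much in advance.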


腹黑女流氓 2024-10-09 13:11:39

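It looks like you originally had a UTF-8 file that was interpreted as an 8-bit encoding (e.g. ISO-8859-15) and then entity-encoded. I say this because the sequence C3 A9 is a very plausible UTF-8 encoding sequence.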

You need to entity-decode it first, which gives you the UTF-8 bytes back. You can then use something like iconv to convert to an encoding of your choice.
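The two steps above can be sketched roughly as follows. This is only a minimal illustration: the sample string and the variable names are made up for the example, ISO-8859-15 is just one possible target encoding, and it assumes the iconv extension is available.

```php
<?php
// Hypothetical sample input: each "é" is stored as the two entities &Atilde;&copy;.
$raw = 'Pr&Atilde;&copy;c&Atilde;&copy;dent';

// Step 1: entity-decode into single bytes. Naming ISO-8859-1 as the target
// charset makes &Atilde; become byte 0xC3 and &copy; become byte 0xA9.
$utf8 = html_entity_decode($raw, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');

// Step 2: those bytes are valid UTF-8 again, so convert to the encoding you want.
$latin9 = iconv('UTF-8', 'ISO-8859-15', $utf8);

echo bin2hex($utf8), "\n";   // 5072c3a963c3a964656e74
echo bin2hex($latin9), "\n"; // 5072e963e964656e74
```

Printing the hex dump rather than the string itself avoids any confusion caused by the terminal's own encoding.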

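To work through your example:

Ã © (the entities &Atilde; and &copy;) decodes to the byte sequence 0xC3 0xA9
0xC3 0xA9 = 11000011 10101001 in binary
The leading 110 in the first octet marks the start of a UTF-8 two-byte sequence, and the second octet starts with 10, as a continuation byte must, so the pair can be interpreted as UTF-8. To do that, we take the last 5 bits of the first octet and the last 6 bits of the second octet...
So, interpreted as UTF-8 it is 00011 101001 = 0xE9 = é (LATIN SMALL LETTER E WITH ACUTE)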
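The bit arithmetic in this example can be checked with a few lines of PHP. The helper name decodeTwoByteUtf8 is invented for this illustration (it is not a built-in), and the sketch only handles two-byte sequences and does not reject overlong encodings.

```php
<?php
// Decode one two-byte UTF-8 sequence by hand (illustrative helper, not a library function).
function decodeTwoByteUtf8(int $b1, int $b2): int
{
    // The first octet must look like 110xxxxx, the second like 10xxxxxx.
    if (($b1 & 0xE0) !== 0xC0 || ($b2 & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('Not a two-byte UTF-8 sequence');
    }
    // Keep the low 5 bits of the first octet and the low 6 bits of the second.
    return (($b1 & 0x1F) << 6) | ($b2 & 0x3F);
}

$codePoint = decodeTwoByteUtf8(0xC3, 0xA9);
printf("U+%04X\n", $codePoint); // U+00E9
```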

You mention wanting to handle this with PHP; something like this might do it for you:

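 //to load from a file, use
 //$file=file_get_contents("/path/to/filename.txt");
 //example below uses a literal string to demonstrate technique...

 $file="Pr&Atilde;&copy;c&Atilde;&copy;dent is a French word";
 //decode the entities to single bytes; ISO-8859-1 is named explicitly because
 //PHP 5.4+ defaults to UTF-8, which would leave the mojibake in place
 $utf8=html_entity_decode($file, ENT_QUOTES, "ISO-8859-1");
 //utf8_decode() is deprecated since PHP 8.2; mb_convert_encoding() (mbstring) does the same job
 $iso8859=mb_convert_encoding($utf8, "ISO-8859-1", "UTF-8");

 //$utf8 contains "Précédent is a French word" in UTF-8
 //$iso8859 contains "Précédent is a French word" in ISO-8859-1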

