Ã© and other codes

Posted on 2024-10-02 13:11:39

I got a file full of those codes, and I want to "translate" the whole file into normal characters. How can I do it?

Thank you very much in advance.


腹黑女流氓 2024-10-09 13:11:39

It looks like you originally had a UTF-8 file that was interpreted as an 8-bit encoding (e.g. ISO-8859-15) and then entity-encoded. I say this because the sequence C3 A9 looks like a very plausible UTF-8 sequence: read as ISO-8859-1, byte 0xC3 is Ã and byte 0xA9 is ©, which is exactly the pair of characters you are seeing.

First entity-decode it, turning each entity back into its single 8-bit byte; the result is the original UTF-8 again. You can then use something like iconv to convert it to the encoding of your choice.
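To make the diagnosis concrete, here is a minimal PHP sketch, assuming the damage happened exactly as described above: it reproduces the mojibake by entity-encoding UTF-8 bytes as if they were ISO-8859-1, then reverses it. The variable names are only for illustration.

```php
<?php
// Sketch under the assumption that a UTF-8 file was misread as ISO-8859-1
// and then entity-encoded. The source text is written as explicit bytes so
// the example does not depend on how this script file itself is encoded.
$original = "Pr\xC3\xA9c\xC3\xA9dent"; // "Précédent" in UTF-8 (é = 0xC3 0xA9)

// The damage: treat every UTF-8 byte as one ISO-8859-1 character and
// replace each non-ASCII character with its named HTML entity.
$damaged = htmlentities($original, ENT_QUOTES, 'ISO-8859-1');
echo $damaged, "\n"; // Pr&Atilde;&copy;c&Atilde;&copy;dent

// The fix: decode the entities back to single ISO-8859-1 bytes.
// The resulting byte string is the original UTF-8 text again.
$repaired = html_entity_decode($damaged, ENT_QUOTES, 'ISO-8859-1');
var_dump($repaired === $original); // bool(true)
```

Passing 'ISO-8859-1' explicitly matters: with PHP's default charset (UTF-8), &Atilde; would decode to the two-byte UTF-8 form of Ã instead of the single byte 0xC3.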

To work through your example:

  • Ã© (entity-encoded in your file) would be decoded as the byte sequence 0xC3 0xA9
  • 0xC3 0xA9 = 11000011 10101001 in binary
  • The leading 110 in the first octet marks the start of a two-byte UTF-8 sequence, and the second octet starts with 10, as a UTF-8 continuation byte must. To decode it, take the last 5 bits of the first octet (00011) followed by the last 6 bits of the second octet (101001).
  • So, interpreted as UTF-8, it is 00011101001 = 0xE9 = U+00E9, é (LATIN SMALL LETTER E WITH ACUTE)
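The same bit arithmetic can be written in PHP. This is a sketch: decodeTwoByteUtf8() is a hypothetical helper written for this illustration, and it only handles the two-byte case worked through above (it does not reject overlong encodings).

```php
<?php
// Hypothetical helper: decode one two-byte UTF-8 sequence to its code point.
function decodeTwoByteUtf8(int $b1, int $b2): int
{
    // A two-byte lead byte starts with 110; a continuation byte starts with 10.
    if (($b1 & 0xE0) !== 0xC0 || ($b2 & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('Not a two-byte UTF-8 sequence');
    }
    // Last 5 bits of the first octet, followed by the last 6 bits of the second.
    return (($b1 & 0x1F) << 6) | ($b2 & 0x3F);
}

$codePoint = decodeTwoByteUtf8(0xC3, 0xA9);
printf("%011b\n", $codePoint);  // 00011101001
printf("U+%04X\n", $codePoint); // U+00E9
```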

You mention wanting to handle this with PHP; something like this might do it for you:

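 // To load from a file, use:
 // $file = file_get_contents("/path/to/filename.txt");
 // The example below uses a literal string to demonstrate the technique.

 $file = "Pr&Atilde;&copy;c&Atilde;&copy;dent is a French word";

 // Decode with the ISO-8859-1 charset so &Atilde; becomes byte 0xC3 and
 // &copy; becomes byte 0xA9; together those bytes are the UTF-8 encoding of é.
 // (Since PHP 5.4 the default charset is UTF-8, which would not do this.)
 $utf8 = html_entity_decode($file, ENT_QUOTES, 'ISO-8859-1');

 // Optional: convert to ISO-8859-1 (utf8_decode() is deprecated as of PHP 8.2).
 $iso8859 = iconv('UTF-8', 'ISO-8859-1', $utf8);

 // $utf8 contains "Précédent is a French word" in UTF-8
 // $iso8859 contains "Précédent is a French word" in ISO-8859-1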
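Since the question asks about a whole file, here is a sketch that applies the same decoding to a file on disk. repairFile() and the temporary paths are hypothetical names for illustration, and it assumes the entire file is entity-encoded mojibake as diagnosed above.

```php
<?php
// Hypothetical helper: read an entity-encoded file and write repaired UTF-8.
function repairFile(string $inPath, string $outPath): void
{
    $contents = file_get_contents($inPath);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $inPath");
    }
    // Each entity becomes one ISO-8859-1 byte, restoring the UTF-8 bytes.
    $utf8 = html_entity_decode($contents, ENT_QUOTES, 'ISO-8859-1');
    if (file_put_contents($outPath, $utf8) === false) {
        throw new RuntimeException("Cannot write $outPath");
    }
}

// Usage: write a small damaged sample to a temp file, then repair it.
$in  = tempnam(sys_get_temp_dir(), 'in');
$out = tempnam(sys_get_temp_dir(), 'out');
file_put_contents($in, "Pr&Atilde;&copy;c&Atilde;&copy;dent is a French word\n");
repairFile($in, $out);
$repaired = file_get_contents($out);
echo $repaired; // shows "Précédent is a French word" on a UTF-8 terminal
unlink($in);
unlink($out);
```

Be aware that this also decodes entities that were meant to stay as entities, such as &amp; or &lt; in an HTML file, so check the output before overwriting the original.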