How to fix Unicode letters?

Posted on 2024-12-10 03:36:24

Someone sent me an email containing text like this:

IVIÃ˜Râ€ â‚¬â„¢

It should read:

IVIØR†€™

How do I restore the characters of the original Portuguese text? It got altered after being passed through an HTTP GET request.

I probably won't be able to fix the site, but maybe I could write a repair tool for these broken encoded letters. Does anyone know of an existing repair tool, or how to do the repair by hand? It seems like nothing is lost, just badly interpreted.

Comments (2)

故乡的云 2024-12-17 03:36:24

What happened here is that UTF-8 got misinterpreted as ISO-8859-1. Other kinds of mangling then seem to have happened afterward (the bad ISO-8859-1 string being re-encoded as UTF-8, and the non-breaking space character '\xA0' being converted to a regular space '\x20'), though those may just be a result of pasting it into Stack Overflow.
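The chain of events described above can be reproduced in a few lines of Python. This is only a sketch under one stated assumption: the answer names ISO-8859-1, but the code uses Windows-1252 ("cp1252"), because the garbled text contains visible characters such as "€" and "™" that exist in Windows-1252 and not in strict ISO-8859-1, where bytes 0x80 to 0x9F are control characters. The variable names are illustrative.

```python
# The intended text from the question.
original = "IVIØR†€™"

# Step 1: the sending side encodes the text as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Step 2: the receiving side wrongly decodes those bytes with a single-byte
# codec. Assumption: Windows-1252 ("cp1252") stands in for ISO-8859-1 here.
mojibake = utf8_bytes.decode("cp1252")

# Step 3: somewhere along the way the non-breaking space (U+00A0) is
# replaced by an ordinary space. This is the step that loses information.
damaged = mojibake.replace("\xa0", " ")

# The "re-encoded as UTF-8" form mentioned above, as raw bytes.
double_encoded = mojibake.encode("utf-8")

# ascii() keeps the output readable on consoles that are not UTF-8.
print(ascii(mojibake))
print(ascii(damaged))
```

Up to step 2 the process is fully reversible, since every byte is still represented by exactly one character. Only step 3 destroys a byte of the original sequence.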

Due to the subsequent mangling, there's no really good way to completely undo it, but you can largely undo it by passing it through a not-very-strict UTF-8 interpreter. For example, if I save "IVIÃ˜Râ€ â‚¬â„¢" as a text file on my computer using Notepad with the "ANSI" (single-byte) encoding, then open it in Firefox and tell it to interpret the file as UTF-8 (Firefox > Web Developer > Character Encoding > Unicode (UTF-8)), it displays "IVIØR� €™". (The "�" appears because the '\xA0' was changed to '\x20', which broke the UTF-8 encoding.)
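The Notepad-plus-Firefox procedure can also be approximated in code. The following is a minimal sketch, not a definitive repair tool: `repair_mojibake` is a hypothetical helper name, and it assumes the wrong decoding was Windows-1252 (what Notepad calls "ANSI" on Western European systems). Python's `errors="replace"` handler plays the role of the not-very-strict UTF-8 interpreter.

```python
def repair_mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Undo 'UTF-8 bytes read as a single-byte codec' as far as possible.

    wrong_codec is an assumption about which codec mis-decoded the text.
    """
    # Map each character back to the byte it was decoded from. Characters
    # that the codec cannot represent become "?" instead of raising.
    raw = text.encode(wrong_codec, errors="replace")
    # Lenient UTF-8 decode: invalid byte sequences become U+FFFD
    # (the replacement character) instead of raising.
    return raw.decode("utf-8", errors="replace")


# Intact mojibake (non-breaking space still present): fully recoverable.
intact = "IVIÃ˜Râ€\xa0â‚¬â„¢"
fixed_intact = repair_mojibake(intact)      # "IVIØR†€™"

# Damaged mojibake (non-breaking space turned into a plain space):
# the dagger is lost and comes back as U+FFFD followed by a space.
damaged = "IVIÃ˜Râ€ â‚¬â„¢"
fixed_damaged = repair_mojibake(damaged)    # "IVIØR\ufffd €™"

print(ascii(fixed_intact))
print(ascii(fixed_damaged))
```

The lenient error handlers are a deliberate choice here: a strict decode would reject the whole damaged string, while the lenient one recovers everything except the one character whose bytes were altered.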

凶凌 2024-12-17 03:36:24

They're probably not broken. It's just a mismatch between the encoding they were sent in and the encoding you're using to decode and view them.

Figure out which encoding was originally used, decode with that same encoding, and the text should look like the original. As for writing a "fix-it" tool, you would always need to know which encoding the text was originally created in, which can be complicated depending on the source and on whether you have access to that information.
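When the encoding is not known up front, one possible approach is to try a short list of likely candidates and keep the first one that round-trips cleanly. The sketch below assumes the text was originally UTF-8 and was mis-decoded by one of the listed single-byte codecs. The function name `guess_and_repair` and the candidate list are illustrative assumptions, not a complete detection scheme.

```python
from typing import Optional, Sequence, Tuple

# Assumed shortlist of codecs that commonly mis-decode UTF-8 text.
CANDIDATE_CODECS = ("cp1252", "latin-1", "mac_roman")


def guess_and_repair(
    text: str, candidates: Sequence[str] = CANDIDATE_CODECS
) -> Optional[Tuple[str, str]]:
    """Return (codec, repaired_text) for the first candidate that works.

    A candidate "works" if the text can be encoded back to bytes with it
    and those bytes are strictly valid UTF-8. Returns None otherwise.
    """
    for codec in candidates:
        try:
            repaired = text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this codec cannot explain the text
        if repaired != text:
            return codec, repaired
    return None  # nothing to repair, or no candidate fits


result = guess_and_repair("IVIÃ˜Râ€\xa0â‚¬â„¢")
if result is not None:
    codec, repaired = result
    print(codec, ascii(repaired))
```

Strict decoding is used on purpose in this sketch: it keeps correctly decoded text from being "repaired" into something worse, at the cost of rejecting text that was damaged further after the mis-decoding.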
