
Recover original UTF8/Kanji/Chinese text from random/garbage ASCII

Posted on 2024-12-13 07:03:37

I know this may not be possible but wanna give it a shot anyway.

So I have some data from HTML form submissions. Users originally typed in Kanji in some of the fields. But all I got were random ASCII letters like this:

æŽå°çŽ²

I already fixed the encoding issue (so that new form submissions handle utf8 fine) but would like to see if I can recover the old data (the correct kanji letters) from before the fix.

Thanks for the help.

UPDATE:

Guess a little clarification is needed. As I said, I have already fixed the encoding problem for the html form. The actual question is whether or not one can recover the original kanji from the "garbage" data that I already received.

For example, I'm trying to "reverse-engineer" the following:

ôÃ¼ÒýR
å¼µå¥éºŸ
å†‰æ¦†å¹³
·¨¶vÚ¬

Every line is supposed to be someone's name in Kanji or Chinese. I tried all the sensible encodings such as GBK, gb18030, and Big5-HKSCS. No luck so far.

Last UPDATE:

Having some luck with BIG5 encoding now. It didn't work for all the garbage data, but it worked for about 2/3 of them.

‖放下 2024-12-20 07:03:37

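Use an online character set converter tool.

Input encoding should be UTF8.
For output encoding, try all the sensible encodings for East Asian characters.
Remember to check the second checkbox.

Most, if not all, of the garbage letters should be recoverable.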
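If you'd rather do the same thing in a script, here is a rough Python sketch of the idea. It assumes the garbage came from the original bytes being displayed as Windows-1252 (with Latin-1 as a fallback for the few byte values Windows-1252 leaves undefined), which the question doesn't confirm. The names `to_bytes` and `recover_candidates` are made up for this example. It turns each garbage character back into one byte, then tries to decode those bytes with each candidate encoding.

```python
def to_bytes(garbled: str) -> bytes:
    """Map each garbage character back to the single byte it was displayed for.

    Assumption: the bytes were rendered as Windows-1252. Bytes that cp1252
    leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) show up as C1 control
    characters, so fall back to Latin-1 for those.
    """
    out = bytearray()
    for ch in garbled:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Raises again if the character fits neither single-byte charset.
            out += ch.encode("latin-1")
    return bytes(out)


def recover_candidates(
    garbled: str,
    encodings=("utf-8", "big5", "big5hkscs", "gbk", "gb18030"),
) -> dict:
    """Return {encoding: decoded text} for every encoding that decodes cleanly."""
    raw = to_bytes(garbled)
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
    return results


if __name__ == "__main__":
    for enc, text in recover_candidates("å†‰æ¦†å¹³").items():
        print(enc, text)
```

For the third sample line in the question, the "utf-8" candidate comes out as 冉榆平. A clean decode is not proof that an encoding is the right one, so someone who reads Chinese still has to pick the plausible name among the candidates. Also, if invisible characters were dropped when the garbage was copied around, those bytes are gone and that line can't be fully recovered this way.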


南街九尾狐 2024-12-20 07:03:37

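Those letters aren't ASCII. ASCII letters don't have any kind of accent.

It's not clear how you're reading this data: is it from a file, a database, or something else? Either way, it's probably already in UTF-8, so you should try reading it using that encoding. You haven't told us what platform you're using, but you should make sure that whatever you're using, you can find out the Unicode characters you've read by number. That's much more reliable than printing the values out as characters.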
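As an illustration of looking at characters by number, here is a small Python sketch. Python is only an assumption, since the platform wasn't given, and `describe` is a made-up helper name. It lists each character's code point and Unicode name instead of relying on how a console or browser renders it.

```python
import unicodedata


def describe(text: str) -> list:
    """Return (code point, Unicode name) pairs for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    sample = "å†‰"
    print(sample.isascii())  # False: none of these characters are ASCII
    for code_point, name in describe(sample):
        print(code_point, name)
```

Anything above U+007F is outside ASCII. Seeing values such as U+00E5 or U+2020 where you expected CJK code points is a strong hint that the bytes were decoded with the wrong encoding somewhere along the way.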
