How do I find out which code page I'm looking at?

Posted on 2024-07-10 12:16:48

I have a device with some documentation on how to send it text. It uses 0x00-0x7F to send 'special' characters like accented characters, euro signs, ...

I am guessing they copied an existing code page and made some changes, but I have no idea how to figure out what code page is closest to the one in my documentation.

In theory, this should be easy to do. For example, they map Á to 0x41, so if I could find some way to go through all code pages and find the ones that have this character on that position, it would be a piece of cake.

However, all I can find on the internet are links to code page dumps just like the one I'm looking at, or software that uses heuristics to read text and guess the most likely code page. Surely someone out there has made it possible to look up which code page one is looking at?
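The brute-force lookup described above can be approximated with the codecs that ship with Python. This is only a sketch under stated assumptions: `candidate_codecs` and `score_codecs` are hypothetical helper names, the only sample pair is the one quoted in the question, and a vendor-specific table may well match none of the bundled codecs.

```python
import encodings.aliases


def candidate_codecs():
    """Return the names of the codecs bundled with Python, deduplicated."""
    return sorted(set(encodings.aliases.aliases.values()))


def score_codecs(known_pairs):
    """Count, per codec, how many documented (byte, character) pairs it reproduces.

    known_pairs maps a byte value (0-255) to the character the device
    documentation shows at that position.
    """
    scores = {}
    for name in candidate_codecs():
        hits = 0
        for byte_value, char in known_pairs.items():
            try:
                if bytes([byte_value]).decode(name) == char:
                    hits += 1
            except Exception:
                # Codecs fail in different ways: some are not text encodings,
                # some are unavailable on this platform, and some leave the
                # byte undefined. All of these count as "no match".
                continue
        if hits:
            scores[name] = hits
    # Best match first, ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# The single mapping quoted in the question; add more rows from the
# device documentation to make the ranking meaningful.
documented = {0x41: "Á"}
print(score_codecs(documented))
```

With more rows from the documentation in `documented`, the codecs at the top of the list are the closest bundled matches. An empty list would suggest that none of the bundled codecs places those characters at those positions.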

Comments (5)

清风不识月 2024-07-17 12:16:48

If it uses 0x00 to 0x7F for the "special" characters, how does it encode the regular ASCII characters?

In most of the charsets that support the character Á, its codepoint is 193 (0xC1). If you subtract 128 from that, you get 65 (0x41). Maybe your "codepage" is just the upper half of one of the standard charsets like ISO-8859-1 or windows-1252, with the high-order bit set to zero instead of one (that is, subtracting 128 from each one).

If that's the case, I would expect to find a flag you can set to tell it whether the next bunch of codepoints should be converted using the "upper" or "lower" encoding. I don't know of any system that uses that scheme, but it's the most sensible explanation I can come up with for the situation you describe.
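The "subtract 128" guess is easy to check against a documentation table. A minimal sketch, assuming the device bytes really are a standard charset's upper half with bit 7 cleared; `high_bit_matches` is a hypothetical helper name, and the only data point used is the one from the question:

```python
def high_bit_matches(known_pairs, codec):
    """Return the pairs that `codec` reproduces once bit 7 of the device byte is set."""
    matched = {}
    for device_byte, char in known_pairs.items():
        try:
            decoded = bytes([device_byte | 0x80]).decode(codec)
        except UnicodeDecodeError:
            continue  # this position is undefined in the codec
        if decoded == char:
            matched[device_byte] = char
    return matched


documented = {0x41: "Á"}  # the only mapping quoted in the question
for codec in ("latin_1", "cp1252"):
    print(codec, high_bit_matches(documented, codec))
```

If most rows of the real table come back as matches for one codec, the hypothesis holds for that codec; the rows that do not match are the vendor's changes.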

红玫瑰 2024-07-17 12:16:48

There is no way to auto-detect the codepage without additional information. Below the display layer it’s just bytes and all bytes are created equal. There’s no way to say “I’m a 0x41 from this and that codepage”, there’s only “I’m 0x41. Display me!”
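To illustrate the point, here is the same single byte decoded under a few codecs bundled with Python. The codecs are picked only for illustration and have no known connection to the device:

```python
raw = bytes([0xC1])  # one byte, with no code page information attached

# Each codec assigns its own character to the same byte value.
for codec in ("latin_1", "cp437", "koi8_r", "cp037"):
    print(f"{codec:8} -> {raw.decode(codec)!r}")
```

Nothing in `raw` itself says which of these readings is the intended one; that information has to come from outside the data.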

残月升风 2024-07-17 12:16:48

What is the endianness of the system? Perhaps the bit order is being flipped?

坐在坟头思考人生 2024-07-17 12:16:48

In most codepages, 0x41 is just the normal "A"; I don't think any standard codepage has "Á" in that position. The device could expect a control character somewhere before the A that adds the accent, or it could use a non-standard codepage.

I don't see any use in knowing the "closest codepage". You just need to use the docs you got with the device.

Your last sentence is puzzling. What do you mean by "possible to look up what code page one is looking at"?

If you include your whole codepage, people here on SO could be more helpful and give you more insight into this issue. A single data point, 0x41=Á, doesn't help much.

妄断弥空 2024-07-17 12:16:48

Somewhat random idea, but if you can copy a significant amount of text off the device, you could try running it through something like the detect function in http://chardet.feedparser.org/.
