Which code page encodes 'ç' as '?º' (0x3f 0xba)?

Posted on 2024-10-21 10:55:56


Today I received a file from a customer that I have to read, but it contains strange characters. Using known names, I can guess the meaning of some characters.

For example:

Real name | Encoded as   | Character | Hex
----------|--------------|-----------|-------
Françios  | Fran?ºios    | ç         | 3f ba
André     | Andr??       | é         | 3f 3f
Hélène    | H??l?¿ne     | è         | 3f bf
etc.

I have tried importing the file with every code page known to .NET and checked whether the result contains the words I know, but none of them produces the expected text.
Notepad++ detects the file as ANSI and also shows the unwanted characters. (It does have a useful hex-editor plugin, though.)
Other files from the same user, in the same zip file, are encoded in UTF-8.

I cannot expect help from the person who sent me the files. Using Google Translate, he made it clear that he found it very hard just to create the files, and he is using software (SAP, I believe) that I do not have access to.

Is there any other way I can find the encoding of the files he just sent me?


御守 2024-10-28 10:55:56


I can get those results if I take UTF-8 encoded text, pretend it is CP850, and then convert it to Latin-1, Windows-1252, or a similar encoding. The "?" comes from the fact that the CP850 character at 0xc3 is "├", which doesn't exist in Latin-1 or derived encodings, so the conversion replaces it with a "?".
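The chain described above can be reproduced for a single character. This is only a minimal sketch, assuming the sender's text really was UTF-8 that some tool misread as CP850 and then saved as Latin-1; the function name `mangle` is made up for illustration.

```python
# Sketch of the suspected mangling (assumption: UTF-8 -> misread as CP850 -> saved as Latin-1).

def mangle(text: str, wrong_codepage: str = "cp850") -> bytes:
    utf8_bytes = text.encode("utf-8")            # 'ç' becomes the bytes C3 A7
    misread = utf8_bytes.decode(wrong_codepage)  # C3 is read as '├', A7 as 'º'
    # '├' has no Latin-1 equivalent, so the encoder substitutes '?' (0x3F).
    return misread.encode("latin-1", errors="replace")

print(mangle("ç").hex(" "))  # prints: 3f ba
```

The result for 'ç' matches the `3f ba` seen in the file, which is why the "?" always appears in first position: it stands in for the lost 0xC3 lead byte.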

Edit: I did a somewhat wider search using iconv, and CP437, CP862, or CP865 are better matches than CP850. Since you asked, the one-liner I used this time was:

for enc in `iconv -l`; do echo -n "$enc: "; echo -n "ç é è" | iconv -s -f $enc -t "LATIN1//TRANSLIT" 2>/dev/null; echo; done
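For readers without iconv, roughly the same search can be sketched in Python. The candidate list below is a hand-picked, incomplete assumption, and the observed byte pairs are copied from the table in the question; neither the helper names nor the list come from the original answer.

```python
# Known (real character -> bytes seen in the file) pairs, taken from the question.
OBSERVED = {
    "ç": bytes.fromhex("3f ba"),
    "é": bytes.fromhex("3f 3f"),
    "è": bytes.fromhex("3f bf"),
}

# Hand-picked candidate code pages (an assumption, not an exhaustive list).
CANDIDATES = ["cp437", "cp850", "cp852", "cp858", "cp860",
              "cp862", "cp863", "cp865", "cp1252", "latin-1"]

def mangle(text, wrong_codepage):
    """UTF-8 bytes misread as wrong_codepage, then saved as Latin-1."""
    misread = text.encode("utf-8").decode(wrong_codepage, errors="replace")
    return misread.encode("latin-1", errors="replace")

def matching_codepages(observed=OBSERVED, candidates=CANDIDATES):
    """Return the candidates that reproduce every observed byte pair."""
    return [cp for cp in candidates
            if all(mangle(ch, cp) == seen for ch, seen in observed.items())]

print(matching_codepages())
```

Unlike the one-liner, this compares the bytes automatically instead of relying on reading the terminal output by eye, so more known name pairs can be added to `OBSERVED` to narrow the result further.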


究竟谁懂我的在乎 2024-10-28 10:55:56

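It should be UTF-8 or UTF-16.
They cover almost all commonly used characters.
It looks like you have a decoding/encoding problem.

Notepad++ may be confused because your file does not use a byte order mark.

How do you process your files?

Try reading them as binary, and then try different encodings to get the string.
If you do not read them as binary, a default encoding may be applied.

The "?" is a sign of that.

This may help.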
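The "read as binary, then try different encodings" suggestion could look roughly like the sketch below. The encoding list and the function name `try_decodings` are illustrative assumptions; the sample bytes are the ones reported in the question.

```python
def try_decodings(raw: bytes, encodings=("utf-8", "utf-16", "cp1252", "cp437")):
    """Decode the same raw bytes with several encodings; None marks a failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# In practice the bytes would come from the file, e.g. open(path, "rb").read()
# (path being a placeholder). Here the sample from the question is used directly.
raw = b"Fran\x3f\xbaios"
for name, text in try_decodings(raw).items():
    print(f"{name:8} {text!r}")
```

Reading the bytes first keeps any default encoding out of the picture, so each decoding attempt starts from exactly what is stored in the file.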


About the author

漫雪独思
