What character encoding matches this conversion: from "§" to "Ç"?
The line below is an example from one of the many files I have with the wrong character encoding:
REAPRESENTA§AO VIA DTENTRY
The correct presentation should be this:
REAPRESENTAÇAO VIA DTENTRY
There are more characters with the wrong encoding. How do I correct this?
Comments (1)
The files themselves don't have the wrong encoding; the problem is that you use the wrong encoding to decode them when you read them.
The fix is to decode the file with the same encoding that was used to encode it.
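To make that concrete, here is a minimal Python sketch of the same bytes read with a mismatched and with a matching encoding. The choice of cp1252 as the "real" encoding and cp850 as the wrong one is purely an assumption for illustration; your files may use something else entirely.

```python
# Illustration only: cp1252 and cp850 are assumed encodings, not the ones
# actually involved in the question.
original = "REAPRESENTAÇAO VIA DTENTRY"

# The bytes as they might sit on disk, written with the assumed encoding.
raw = original.encode("cp1252")

# Reading those bytes with a different encoding garbles the non-ASCII character.
garbled = raw.decode("cp850")

# Reading them with the encoding that wrote them gives the text back.
correct = raw.decode("cp1252")

# Text that was already decoded wrongly can often be recovered by reversing
# the wrong decode and then applying the right one.
repaired = garbled.encode("cp850").decode("cp1252")

print(garbled)
print(correct)
print(repaired)
```

The last step only works when the wrong encoding maps every byte to a distinct character, which is the case for most single-byte code pages but not for all encodings.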
If you don't know what encoding that is, find out the byte value of each problematic character before it is decoded, and look for an encoding whose character set maps that byte value to the character you want.
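A rough way to do that lookup in Python is to read the raw bytes, pick out the suspicious one, and ask every codec that ships with the interpreter what it decodes that byte to. The function name candidate_encodings and the sample byte 0xC7 are made up for this sketch; substitute the byte you actually find in your file.

```python
import encodings.aliases


def candidate_encodings(byte_value: int, wanted: str) -> list[str]:
    """Return the names of installed codecs that decode this single byte to `wanted`."""
    raw = bytes([byte_value])
    matches = []
    for name in sorted(set(encodings.aliases.aliases.values())):
        try:
            if raw.decode(name) == wanted:
                matches.append(name)
        except (LookupError, ValueError):
            # Not a text codec, not available on this platform,
            # or the byte is not valid on its own in this codec.
            continue
    return matches


# Hypothetical raw line as read in binary mode; 0xC7 is an assumed byte value.
sample = b"REAPRESENTA\xc7AO VIA DTENTRY"
suspect = sample[11]  # the byte where the garbled character shows up

print(hex(suspect))
print(candidate_encodings(suspect, "Ç"))
```

In practice you would get the bytes with open(path, "rb").read() rather than a literal, and you would repeat the lookup for several problematic characters to narrow the candidates down to one encoding.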
For example, the file could have been encoded using IBM905, in which the character "Ç" is encoded as the byte value 74. If you then decode the file using IBM278, the byte value 74 is interpreted as the character "§".
Here is a list of the possible combinations that I found in the built-in encodings:
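If you want to run this kind of search yourself, the following is a sketch of one way to do it in Python. It tries every pair of codecs that ship with the interpreter and reports those where encoding the wanted character with the first and decoding with the second produces the garbled one. The name find_pairs is invented for this example, and Python's codec names and coverage differ from the encoding names used above, so the pairs it reports will not necessarily match.

```python
import encodings.aliases


def find_pairs(wanted: str, garbled: str) -> list[tuple[str, str]]:
    """Find (encode_with, decode_with) codec pairs that turn `wanted` into `garbled`."""
    names = sorted(set(encodings.aliases.aliases.values()))

    # Group codecs by the bytes they produce for the wanted character.
    encoders: dict[bytes, list[str]] = {}
    for name in names:
        try:
            raw = wanted.encode(name)
        except (LookupError, ValueError):
            continue  # not a text codec, unavailable, or cannot encode it
        encoders.setdefault(raw, []).append(name)

    # Decode each distinct byte sequence with every codec and keep the hits.
    pairs = []
    for raw, encoder_names in encoders.items():
        for name in names:
            try:
                decoded = raw.decode(name)
            except (LookupError, ValueError):
                continue
            if decoded == garbled:
                pairs.extend((enc, name) for enc in encoder_names)
    return pairs


for encode_with, decode_with in find_pairs("Ç", "§"):
    print(encode_with, "->", decode_with)
```

A pair showing up here only means it is consistent with this one character. Checking a second and third garbled character against the same pair is what confirms it.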