String encoding in C# - strange characters
I have a file that I need to import.
The problem is that I have problems with a lot of characters in that file.
For example these names are wrong:
BjÃ¶rn (in file) - Should be Björn
Ã…ke (in file) - Should be Åke
Unfortunately I can't recreate the file with the correct encoding.
Also, a lot of other characters are wrong (these were just examples). I can't do a search and replace on all of them (unless there is a dictionary with all the conversions).
Can I decode the strings in some way?
Thanks, Patrik
Edit:
Just some more info that I should have added before (I blame my tiredness).
The file is an .xlsx file.
Comments (2)
I debugged this with Notepad++. I copied the correct strings into Notepad++ and used Encoding | Convert to UTF-8. Then I selected Encoding | Encode as ANSI, which has the effect of interpreting the UTF-8 bytes as if they were ANSI. When I did this I ended up with the same erroneous values as you. So clearly, when you read the file you are interpreting it as ANSI rather than UTF-8.
The solution, then, is this: your file has been encoded as UTF-8, so make sure that it is interpreted as UTF-8 when you read it. I can't tell you exactly how to do that since you didn't show how you were reading the file in the first place.
It's possible that your file does not contain a byte-order mark (BOM). If so, specify the encoding when you read the file by passing Encoding.UTF8.
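The Notepad++ experiment above can be reproduced in code. The following is only a minimal sketch, not the asker's actual import code: it assumes that "ANSI" means Windows-1252 and that the program runs on .NET Core 3.0 or later, where code page 1252 has to be registered through CodePagesEncodingProvider (on .NET Framework that registration line should not be needed). The names MojibakeDemo, Garble and Repair are hypothetical.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Assumption: "ANSI" means Windows-1252 (code page 1252).
    private static Encoding Ansi()
    {
        // On .NET Core / .NET 5+ code page 1252 is not available until the
        // code pages provider is registered. Registering it twice is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Reproduces the problem: UTF-8 bytes decoded as if they were ANSI.
    public static string Garble(string correct)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(correct);
        return Ansi().GetString(utf8Bytes);
    }

    // Reverses it: turn the garbled text back into the original bytes,
    // then decode those bytes as UTF-8.
    public static string Repair(string garbled)
    {
        byte[] originalBytes = Ansi().GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // "Bj\u00F6rn" is Björn; the garbled form is BjÃ¶rn.
        Console.WriteLine(Garble("Bj\u00F6rn"));

        // "\u00C3\u2026ke" is Ã…ke; the repaired form is Åke.
        Console.WriteLine(Repair("\u00C3\u2026ke"));
    }
}
```

A repair like this can only work if the garbled text still contains every original byte. Reading the file as UTF-8 in the first place, as described above, avoids the problem instead of patching it afterwards.
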

I've just tried your first example, and it definitely looks like UTF-8.
It's unclear what you're using to look at the file in the first place, but if you load it with a text editor that understands UTF-8 and tell it that it's a UTF-8 file, it should be fine.
When you load it with .NET, you should just be able to use File.OpenText, File.ReadAllText etc. - most IO dealing with encodings in .NET defaults to UTF-8 anyway.
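
As an illustration of the point about defaults, here is a small sketch that assumes a plain text file. It writes a temporary UTF-8 file without a BOM and reads it back twice, once with the default encoding and once with Encoding.UTF8 passed explicitly. The class name ReadUtf8Demo and the method WriteAndRead are hypothetical, and the temporary file only stands in for the real input.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadUtf8Demo
{
    // Writes the text as UTF-8 without a BOM, then reads it back.
    public static string WriteAndRead(string text)
    {
        string path = Path.GetTempFileName();
        try
        {
            // false = do not emit a byte-order mark at the start of the file.
            var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
            File.WriteAllText(path, text, utf8NoBom);

            // File.ReadAllText without an encoding argument uses UTF-8.
            string viaDefault = File.ReadAllText(path);

            // Passing the encoding explicitly makes the intent visible.
            string viaExplicit = File.ReadAllText(path, Encoding.UTF8);

            if (viaDefault != viaExplicit)
            {
                throw new InvalidOperationException("Default and explicit reads differ.");
            }
            return viaExplicit;
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // "Bj\u00F6rn" is Björn and "\u00C5ke" is Åke.
        Console.WriteLine(WriteAndRead("Bj\u00F6rn, \u00C5ke"));
    }
}
```

If both reads return the names intact, the garbling is happening somewhere other than in these File calls, for example in a step that decodes the bytes with an ANSI code page.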