String encoding in C# - strange characters

Posted on 2024-12-09 11:48:07

I have a file that I need to import.
The problem is that a lot of the characters in that file come out wrong.

For example these names are wrong:

Björn (in file) - Should be Björn

Ã…ke (in file) - Should be Åke

Unfortunately I can't recreate the file with the correct encoding.
Also, a lot of other characters are wrong (these were just examples). I can't do a search and replace on all of them (unless there is a dictionary with all the conversions).

Can I decode the strings in some way?

thanks Patrik

Edit:
Just some more info that I should have added before (I blame my tiredness).
The file is an .xlsx file.

Comments (2)

吾性傲以野 2024-12-16 11:48:07

I debugged this with Notepad++. I copied the correct strings into Notepad++. I used Encoding | Convert to UTF-8. Then I selected Encoding | Encode as ANSI. This has the effect of interpreting the UTF-8 bytes as if they were ANSI. When I did this, I ended up with the same erroneous values as you. So clearly, when you read the file, you are interpreting it as ANSI rather than UTF-8.
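The Notepad++ experiment can be reproduced in C#. The sketch below assumes that "ANSI" means Windows-1252 (code page 1252), which is consistent with the "…" in "Ã…ke" but is not confirmed by the question. It garbles the correct names the same way and then reverses the step, which only works if no bytes were lost along the way. The class name is made up for illustration.

```csharp
using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Needed on .NET Core / .NET 5+, where code page 1252 is not registered
        // by default. On .NET Framework the code page is built in and this line
        // can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Assumption: the "ANSI" code page involved is Windows-1252.
        Encoding ansi = Encoding.GetEncoding(1252);

        foreach (string correct in new[] { "Björn", "Åke" })
        {
            // Encode correctly as UTF-8, then misread the bytes as ANSI.
            byte[] utf8Bytes = Encoding.UTF8.GetBytes(correct);
            string garbled = ansi.GetString(utf8Bytes);

            // Reverse the mistake: recover the original bytes, decode as UTF-8.
            string repaired = Encoding.UTF8.GetString(ansi.GetBytes(garbled));

            Console.WriteLine($"{correct} -> {garbled} -> {repaired}");
        }
        // Björn -> BjÃ¶rn -> Björn
        // Åke -> Ã…ke -> Åke
    }
}
```

The reverse step is a repair for strings that have already been misread. Reading the source with the right encoding in the first place is the cleaner fix.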

The explanation, then, is that your file has been encoded as UTF-8. The solution is to make sure that the file is interpreted as UTF-8 when you read it. I can't tell you exactly how to do that since you didn't show how you were reading the file in the first place.

It's possible that your file does not contain a byte-order-mark (BOM). If so then specify the encoding when you read the file by passing Encoding.UTF8.
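As a minimal sketch of passing Encoding.UTF8 explicitly: the example writes a BOM-less UTF-8 temp file as a stand-in for the real input (the file and its contents are hypothetical) and reads it back line by line. This applies to plain text files such as CSV. An .xlsx file is a zipped package and would normally be read through a spreadsheet library, which this sketch does not cover.

```csharp
using System;
using System.IO;
using System.Text;

class ReadAsUtf8
{
    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Hypothetical input: a temp file standing in for the file to import.
        string path = Path.GetTempFileName();

        // new UTF8Encoding(false) writes UTF-8 without a byte-order mark.
        File.WriteAllText(path, "Björn\nÅke\n", new UTF8Encoding(false));

        // State the encoding explicitly instead of relying on detection.
        foreach (string line in File.ReadLines(path, Encoding.UTF8))
        {
            Console.WriteLine(line);
        }

        File.Delete(path);
    }
}
```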

凡间太子 2024-12-16 11:48:07

I've just tried your first example, and it definitely looks like that's UTF-8.

It's unclear what you're using to look at the file in the first place, but if you load it with a text editor which understands UTF-8 and tell it that it's a UTF-8 file, it should be fine.

When you load it with .NET, you should just be able to use File.OpenText, File.ReadAllText etc - most IO dealing with encodings in .NET defaults to UTF-8 anyway.
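A small sketch of that default behaviour, using a hypothetical temp file: File.ReadAllText and File.OpenText with no encoding argument decode BOM-less UTF-8 correctly, while forcing code page 1252 (an assumed stand-in for "ANSI") reproduces the garbled name from the question.

```csharp
using System;
using System.IO;
using System.Text;

class DefaultEncodingDemo
{
    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Needed on .NET Core / .NET 5+ to make code page 1252 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Hypothetical input file, written as UTF-8 without a BOM.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "Björn", new UTF8Encoding(false));

        // No encoding argument: UTF-8 is used.
        string byDefault = File.ReadAllText(path);

        // Forcing Windows-1252 misreads the same bytes.
        string forcedAnsi = File.ReadAllText(path, Encoding.GetEncoding(1252));

        Console.WriteLine(byDefault);   // Björn
        Console.WriteLine(forcedAnsi);  // BjÃ¶rn

        // File.OpenText also reads the file as UTF-8.
        using (StreamReader reader = File.OpenText(path))
        {
            Console.WriteLine(reader.ReadToEnd()); // Björn
        }

        File.Delete(path);
    }
}
```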
