C# Convert string from UTF-8 to ISO-8859-1 (Latin1) H
I have googled this topic and looked at every answer, but I still don't get it.
Basically, I need to convert a UTF-8 string to ISO-8859-1, and I do it using the following code:
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
string msg = iso.GetString(utf8.GetBytes(Message));
My source string is
Message = "ÄäÖöÕõÜü"
But unfortunately my result string becomes
msg = "�ä�ö�õ�ü"
What am I doing wrong here?
Use Encoding.Convert to adjust the byte array before attempting to decode it into your destination encoding.
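A minimal sketch of this suggestion, assuming the goal is to turn UTF-8 bytes into ISO-8859-1 bytes and then decode them with the matching encoding. The variable names are illustrative, and the snippet assumes C# 9 top-level statements.

```csharp
using System;
using System.Text;

Encoding utf8 = Encoding.UTF8;
Encoding iso = Encoding.GetEncoding("ISO-8859-1");

string message = "ÄäÖöÕõÜü";

// Encode the string as UTF-8: each of these characters takes two bytes.
byte[] utf8Bytes = utf8.GetBytes(message);

// Re-encode the bytes from UTF-8 to ISO-8859-1: one byte per character.
byte[] isoBytes = Encoding.Convert(utf8, iso, utf8Bytes);

// Decode with the encoding that matches the bytes.
string msg = iso.GetString(isoBytes);

Console.WriteLine(utf8Bytes.Length);  // 16
Console.WriteLine(isoBytes.Length);   // 8
Console.WriteLine(msg == message);    // True
```

The key difference from the code in the question is that the bytes are converted before they are decoded, so the decoder always sees bytes in the encoding it expects.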
I think your problem is that you assume the bytes that represent the UTF-8 string will produce the same string when interpreted as something else (ISO-8859-1). That is simply not the case. I recommend that you read this excellent article by Joel Spolsky.
Try this:
You need to fix the source of the string in the first place.
A string in .NET is actually just an array of 16-bit Unicode code units (UTF-16 characters), so a string isn't in any particular byte encoding such as UTF-8 or ISO-8859-1.
It's when you take that string and convert it to a set of bytes that encoding comes into play.
In any case, what you did, encoding a string to a byte array with one character set and then decoding it with another, will not work, as you have seen.
Can you tell us more about where that original string comes from, and why you think it has been encoded wrong?
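To illustrate the point, here is a small sketch showing that the same string becomes different bytes depending on the encoding, and what happens when the bytes are decoded with the wrong one. The variable names are made up for this example, and it assumes C# 9 top-level statements.

```csharp
using System;
using System.Text;

Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
string text = "Ä";

// The string itself holds a single 16-bit unit, U+00C4.
Console.WriteLine(text.Length);                    // 1
Console.WriteLine(((int)text[0]).ToString("X4"));  // 00C4

// An encoding only comes into play when the string becomes bytes.
byte[] asUtf8 = Encoding.UTF8.GetBytes(text);
byte[] asLatin1 = latin1.GetBytes(text);
Console.WriteLine(BitConverter.ToString(asUtf8));    // C3-84
Console.WriteLine(BitConverter.ToString(asLatin1));  // C4

// Decoding UTF-8 bytes as ISO-8859-1 yields two wrong characters, not "Ä".
string garbled = latin1.GetString(asUtf8);
Console.WriteLine(garbled.Length);   // 2
Console.WriteLine(garbled == text);  // False
```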
That code seems a bit strange. To get a string from a UTF-8 byte stream, all you need to do is:
If you need to save an ISO-8859-1 byte stream somewhere, then just use:
An additional line of code for the previous snippet:
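The snippets that accompanied this answer are not present here. The following is only a sketch of the two steps it describes, not the answerer's original code; `utf8Stream` and the other names are hypothetical, and it assumes C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for incoming UTF-8 data (hypothetical source).
var utf8Stream = new MemoryStream(Encoding.UTF8.GetBytes("ÄäÖöÕõÜü"));

// Step 1: decode the UTF-8 byte stream into a .NET string.
string text;
using (var reader = new StreamReader(utf8Stream, Encoding.UTF8))
{
    text = reader.ReadToEnd();
}

// Step 2: encode that string as ISO-8859-1 bytes for storage or transmission.
byte[] isoBytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(text);

Console.WriteLine(BitConverter.ToString(isoBytes));  // C4-E4-D6-F6-D5-F5-DC-FC
```

Note that the result of step 2 is a byte array, not a string. Once the bytes are decoded back into a string, the notion of "ISO-8859-1" no longer applies to it.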
Maybe this can help.
Convert one codepage to another:
Usage:
Output:
First, specify the input and output encodings (there is no way to reliably identify the encoding of a txt file; you must know it):
For every "name" you can pass to GetEncoding, refer to the MS table here:
https://learn.microsoft.com/it-it/dotnet/api/system.text.encodinginfo.getencoding?view=net-8.0
Then convert the string from input encoding to output encoding.
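A sketch of these two steps for a text file, assuming the input is known to be UTF-8 and the output should be ISO-8859-1. The file names are hypothetical and are created in the temp directory only to keep the example self-contained; it assumes C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Assumed encodings: they must be known in advance, not guessed from the file.
Encoding inputEncoding = Encoding.UTF8;
Encoding outputEncoding = Encoding.GetEncoding("ISO-8859-1");

// Hypothetical file paths, created in the temp directory for the demo.
string inputPath = Path.Combine(Path.GetTempPath(), "input-utf8.txt");
string outputPath = Path.Combine(Path.GetTempPath(), "output-latin1.txt");
File.WriteAllBytes(inputPath, inputEncoding.GetBytes("ÄäÖöÕõÜü"));

// Read with the input encoding, then write with the output encoding.
string text = File.ReadAllText(inputPath, inputEncoding);
File.WriteAllText(outputPath, text, outputEncoding);

Console.WriteLine(new FileInfo(inputPath).Length);   // 16
Console.WriteLine(new FileInfo(outputPath).Length);  // 8
```

Characters that do not exist in the output encoding cannot survive this conversion and are typically substituted (for example with `?`), so it is worth checking the result when the input may contain text outside Latin1.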
I just used Nathan's solution and it works fine. I needed to convert ISO-8859-1 to Unicode:
Here is a sample for ISO-8859-9:
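The sample itself is not present here, so the following is only a rough sketch of how such a conversion might look, not the answerer's code. It assumes .NET Core 3.0 or later (or .NET 5+), where ISO-8859-9 is not available until the code pages provider is registered, and it assumes C# 9 top-level statements. The test string is made up for illustration.

```csharp
using System;
using System.Text;

// On .NET Core / .NET 5+, code pages such as ISO-8859-9 need this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding iso9 = Encoding.GetEncoding("ISO-8859-9");
string text = "ğüşıöç";

// Convert the UTF-8 bytes to ISO-8859-9, then decode with the same encoding.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
byte[] iso9Bytes = Encoding.Convert(Encoding.UTF8, iso9, utf8Bytes);
string roundTripped = iso9.GetString(iso9Bytes);

Console.WriteLine(iso9Bytes.Length);      // 6
Console.WriteLine(roundTripped == text);  // True
```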