ISO Latin 1 byte to char
If I have a byte b encoded as ISO Latin 1 (ISO 8859-1), is it enough to do the following?
char output = (char)b;
This seems to work, but I don't know if there is another way.
Comments (6)
A direct cast seems to work for this particular encoding. However, best practice would be to use the Encoding.GetChars method for proper conversion.
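As a minimal sketch of what that could look like, the snippet below decodes one byte with Encoding.GetChars and compares it with the cast. The class name Latin1GetCharsDemo and the helper Decode are made up for illustration; looking the encoding up by the name "ISO-8859-1" is one option among several.

```csharp
using System;
using System.Text;

static class Latin1GetCharsDemo
{
    // ISO 8859-1 looked up by name (hypothetical choice; code page 28591 also works).
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Decode a single ISO 8859-1 byte through the Encoding API.
    public static char Decode(byte b)
    {
        char[] chars = Latin1.GetChars(new[] { b });
        return chars[0];
    }

    static void Main()
    {
        byte b = 0xE9; // 'é' in ISO 8859-1
        char viaCast = (char)b;
        char viaEncoding = Decode(b);

        // Print code points in hex to avoid console encoding issues.
        Console.WriteLine(((int)viaCast).ToString("X4"));
        Console.WriteLine(((int)viaEncoding).ToString("X4"));
    }
}
```

For a single byte this is more ceremony than the cast, but the same call works unchanged on whole byte arrays and with other encodings.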
Yes, this should work fine. If you look at the Unicode chart for 8859-1, there is a one-to-one mapping between 8859-1 and Unicode: each 8859-1 byte value equals the code point of the same character. That means you can just cast it to char.
However, this is not the case with all code pages, so a more robust solution might be a good idea.
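One way to convince yourself of the one-to-one mapping is to compare the cast against the framework's ISO 8859-1 decoder for all 256 byte values. This is only a sketch; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class Latin1MappingCheck
{
    // Returns true if casting agrees with the ISO 8859-1 decoder for every byte value.
    public static bool CastMatchesDecoder()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        for (int i = 0; i <= 255; i++)
        {
            byte b = (byte)i;
            char decoded = latin1.GetChars(new[] { b })[0];
            if (decoded != (char)b)
            {
                return false; // found a byte where cast and decoder disagree
            }
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(CastMatchesDecoder());
    }
}
```

The same loop run against another code page would show where a plain cast stops being safe.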
You can use the Encoding class, in particular the built-in Encoding.ASCII, to get chars from byte arrays, specifically one of the GetChars overloads. Note that Encoding.ASCII only covers byte values 0-127, so for ISO 8859-1 data an encoding obtained with Encoding.GetEncoding("ISO-8859-1") is the better fit.
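To illustrate why the choice of Encoding instance matters here, the sketch below decodes the same byte with Encoding.ASCII and with an ISO 8859-1 encoding. The helper DecodeWith is hypothetical; the behavior shown for ASCII relies on its default replacement fallback.

```csharp
using System;
using System.Text;

static class AsciiVersusLatin1
{
    // Decode one byte with whichever encoding is passed in.
    public static char DecodeWith(Encoding encoding, byte b)
    {
        return encoding.GetChars(new[] { b })[0];
    }

    static void Main()
    {
        byte b = 0xE9; // outside the 7-bit ASCII range

        // ASCII cannot represent 0xE9, so the decoder substitutes '?'.
        char ascii = DecodeWith(Encoding.ASCII, b);

        // ISO 8859-1 maps 0xE9 to U+00E9.
        char latin1 = DecodeWith(Encoding.GetEncoding("ISO-8859-1"), b);

        Console.WriteLine(((int)ascii).ToString("X4"));  // 003F
        Console.WriteLine(((int)latin1).ToString("X4")); // 00E9
    }
}
```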
I would use BitConverter's ToChar. Remember that, for one, a char in .NET is a 2-byte value, so simple casting like that (even if it works, which it might) is not really the best idea. Note that BitConverter.ToChar reads two bytes from the array, so a single Latin 1 byte has to be padded with a zero byte first.
If the value of the byte is < 128, you're fine. If it's >= 128, just casting probably won't get you the right character.
The ISO code pages are basically all ASCII, with the key difference being that the upper half of the code page values (which IIRC on the classic DOS code page are mostly line-art characters useful in console apps) is replaced with characters useful to the language of the code page.
However, a quick look at the Unicode charts shows that the Latin-1 Supplement block occupies the 80-FF values (128-255). So in this particular instance you're probably fine, but if something comes in with, for example, the Cyrillic ISO code page, you'll have to explicitly transform to Unicode characters.
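The Cyrillic case can be sketched as follows. This assumes a runtime such as .NET Core 3.0+ or .NET 5+, where legacy code pages like ISO 8859-5 must be registered through CodePagesEncodingProvider before GetEncoding can find them; the class and helper names are hypothetical.

```csharp
using System;
using System.Text;

static class CyrillicCodePageDemo
{
    // Decode one byte as ISO 8859-5 (Cyrillic) via the Encoding API.
    public static char DecodeIso88595(byte b)
    {
        // Assumption: modern .NET, where legacy code pages need registering first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cyrillic = Encoding.GetEncoding("iso-8859-5");
        return cyrillic.GetChars(new[] { b })[0];
    }

    static void Main()
    {
        byte b = 0xD0;

        // The cast yields U+00D0, a Latin-1 Supplement letter.
        Console.WriteLine(((int)(char)b).ToString("X4"));           // 00D0

        // In ISO 8859-5 the same byte is Cyrillic small letter a, U+0430.
        Console.WriteLine(((int)DecodeIso88595(b)).ToString("X4")); // 0430
    }
}
```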
You can use Encoding.Convert. You can then work with the new byte array without worrying about whether Latin 1 will cause you problems.
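A minimal sketch of that approach, assuming UTF-8 as the target encoding (the answer does not say which one to convert to). The helper Latin1ToUtf8 is a made-up name.

```csharp
using System;
using System.Text;

static class EncodingConvertDemo
{
    // Re-encode ISO 8859-1 bytes as UTF-8 bytes.
    public static byte[] Latin1ToUtf8(byte[] latin1Bytes)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        return Encoding.Convert(latin1, Encoding.UTF8, latin1Bytes);
    }

    static void Main()
    {
        byte[] latin1Bytes = { 0x63, 0x61, 0x66, 0xE9 }; // "café" in ISO 8859-1
        byte[] utf8Bytes = Latin1ToUtf8(latin1Bytes);

        // 0xE9 becomes the two-byte UTF-8 sequence C3 A9.
        Console.WriteLine(BitConverter.ToString(utf8Bytes)); // 63-61-66-C3-A9
    }
}
```

If the goal is a string or char array rather than another byte array, calling GetString or GetChars on the source encoding directly saves the intermediate step.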