Is there a way to check the encoding of a C# string?
Possible Duplicate:
Determine a string's encoding in C#
I believe that if I create a string it defaults to UTF-8. However, if the string is created elsewhere and I want to be extra safe and check its encoding before working with it, I don't see any easy way to do that using the string or Encoding class. Am I missing something, or is a C# string always UTF-8 no matter what?
Comments (1)
Strings in C# (well, .NET) effectively don't have an encoding... or you can view them all as UTF-16, given that they're a sequence of `char` values, which are UTF-16 code units.
Normally, however, you only need to care about encoding when you convert a string to or from a binary form (e.g. down a socket or to a file). At that point, you should specify the encoding explicitly; the string itself has no concept of this.
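To make that concrete, here is a minimal sketch (the sample text and variable names are made up for illustration). The same string gives different byte sequences depending on which encoding you pick at conversion time, while the string itself only ever exposes UTF-16 code units:

```csharp
using System;
using System.Text;

// A string is just a sequence of UTF-16 code units (char values);
// it carries no record of any encoding.
string text = "h\u00E9llo"; // hypothetical sample: "héllo"
Console.WriteLine(text.Length); // 5 code units

// The encoding only matters once you convert to bytes, so name it explicitly.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);     // é takes 2 bytes in UTF-8
byte[] utf16Bytes = Encoding.Unicode.GetBytes(text); // UTF-16 LE, 2 bytes per code unit

Console.WriteLine(utf8Bytes.Length);  // 6
Console.WriteLine(utf16Bytes.Length); // 10

// Decoding each byte array with the matching encoding gives equal strings.
string fromUtf8 = Encoding.UTF8.GetString(utf8Bytes);
string fromUtf16 = Encoding.Unicode.GetString(utf16Bytes);
Console.WriteLine(fromUtf8 == fromUtf16); // True
```

So there is nothing to "check" on the string; what you need to know is the encoding of the bytes it came from or is going to.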
The only aspect which "defaults" to UTF-8 is that plenty of .NET APIs are overloaded to either accept an encoding or not, and if no encoding is specified, UTF-8 is used. `File.ReadAllText` is an example of this. However, after reading the file there's no distinction between "text which was read from a UTF-8 file" and "text which was read from a Big5 file", etc.
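A rough sketch of that last point, under a few assumptions: the sample text and temp files are hypothetical, and UTF-16 stands in for Big5 here because Big5 may need an extra code page provider registered on newer .NET runtimes. The two files hold different bytes, but once read, the strings are indistinguishable:

```csharp
using System;
using System.IO;
using System.Text;

string original = "caf\u00E9"; // hypothetical sample: "café"

// Two temp files holding the same text in different encodings.
string utf8Path = Path.GetTempFileName();
string utf16Path = Path.GetTempFileName();
File.WriteAllText(utf8Path, original, new UTF8Encoding(false)); // UTF-8, no BOM
File.WriteAllText(utf16Path, original, Encoding.Unicode);       // UTF-16 LE

// Overload without an encoding: UTF-8 is assumed.
string fromUtf8 = File.ReadAllText(utf8Path);
// Overload with an explicit encoding.
string fromUtf16 = File.ReadAllText(utf16Path, Encoding.Unicode);

// After reading, nothing records which file encoding each string came from.
Console.WriteLine(fromUtf8 == fromUtf16); // True
Console.WriteLine(fromUtf8 == original);  // True

File.Delete(utf8Path);
File.Delete(utf16Path);
```

If you know a file is not UTF-8, pass the encoding to the read call; there is no way to recover it from the resulting string afterwards.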