StreamReader problem - Unknown file encoding (Western ISO-8859-1)
When reading data from the input file, I noticed that the ¥ symbol was not being read by the StreamReader. Mozilla Firefox showed the input file's encoding as Western (ISO-8859-1).
After playing around with the encoding parameters I found it worked successfully for the following values:
System.Text.Encoding.GetEncoding(1252) // (western iso 88591)
System.Text.Encoding.Default
System.Text.Encoding.UTF7
Now I am planning on using the "Default" setting, but I am not sure this is the right decision. The existing code did not specify any encoding, and I am worried I might break something.
I know very little (or rather nothing) about encodings. How do I go about this? Is my decision to use System.Text.Encoding.Default safe? Should I be asking the user to save the files in a particular format?
Code page 1252 isn't quite the same as ISO-Latin-1. If you want ISO-Latin-1, use Encoding.GetEncoding(28591). However, I'd expect them to be the same for this code point (U+00A5). UTF-7 is completely different (and almost never what you want to use).
Encoding.Default is not safe - it's a really bad idea in most situations. It's specific to the particular computer you're running on. If you transfer a file from one computer to another, who knows what encoding the original computer was using?
If you know that your file is in ISO-8859-1, then explicitly use that. What's producing these files? If they're just being saved by the user, what program are they being saved in? If UTF-8 is an option, that's a good one - partly because it can cope with the whole of Unicode.
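To make "explicitly use that" concrete, here is a minimal sketch of reading with a fixed ISO-8859-1 encoding instead of a machine-dependent one. The class and method names (Latin1ReadSketch, ReadLatin1) are made up for illustration, and a MemoryStream stands in for the real input file, which the question doesn't show.

```csharp
using System;
using System.IO;
using System.Text;

public static class Latin1ReadSketch
{
    // Hypothetical helper: reads the whole stream, decoding it as
    // ISO-8859-1 (code page 28591) regardless of the machine's settings.
    public static string ReadLatin1(Stream input)
    {
        Encoding latin1 = Encoding.GetEncoding(28591);
        using (var reader = new StreamReader(input, latin1))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // In ISO-8859-1 the yen sign (U+00A5) is the single byte 0xA5,
        // followed here by the ASCII digits "100".
        byte[] fileBytes = { 0xA5, 0x31, 0x30, 0x30 };

        string text = ReadLatin1(new MemoryStream(fileBytes));

        // Compare against the expected string using an escape, so the
        // result does not depend on the console's own encoding.
        Console.WriteLine(text == "\u00A5100"); // True
    }
}
```

For a real file you would pass a FileStream (or use the StreamReader constructor that takes a path and an Encoding). The point is only that the encoding is named in the code rather than inherited from the computer.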
I have an article on Unicode and another on debugging Unicode issues which you may find useful.
The existing code may not have explicitly specified the encoding, in which case the encoding probably defaulted to Encoding.UTF8.
The name Encoding.Default might give the impression that this is the default encoding used by classes such as StreamReader, but this is not the case: As Jon Skeet pointed out, Encoding.Default is the encoding for the operating system's current ANSI code page.
Personally I think this makes the property name Encoding.Default somewhat misleading.
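A small sketch of why the ¥ goes missing when no encoding is passed, assuming the file really contains the single ISO-8859-1 byte 0xA5 for that character. The names (DefaultDecodingSketch, ReadWithReaderDefault) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class DefaultDecodingSketch
{
    // Hypothetical helper: decodes the bytes with StreamReader's own
    // default, i.e. without passing any Encoding argument.
    public static string ReadWithReaderDefault(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // The yen sign as it would be stored in an ISO-8859-1 file.
        byte[] yenInLatin1 = { 0xA5 };

        string viaDefault = ReadWithReaderDefault(yenInLatin1);
        string viaLatin1 = Encoding.GetEncoding(28591).GetString(yenInLatin1);

        // A lone 0xA5 byte is not valid UTF-8, so the reader's default
        // decoding cannot produce U+00A5 from it.
        Console.WriteLine(viaDefault == "\u00A5"); // False
        Console.WriteLine(viaLatin1 == "\u00A5");  // True
    }
}
```

On current runtimes the invalid byte typically comes back as the replacement character U+FFFD rather than being dropped, but either way the original character is lost unless the right encoding is given.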
Are you a software developer? Do not forget to read Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".