C#: Why are characters not displayed correctly in the console?

Posted on 2024-07-07 03:57:05

The picture below explains all:

Image: http://img133.imageshack.us/img133/4206/accentar9.png

The variable textInput comes from File.ReadAllText(path); and characters such as ' é è ... do not display correctly in the console. When I run my unit test, everything is fine and I can see them. Why?

Comments (3)

偏爱自由 2024-07-14 03:57:06

The .NET classes (System.IO.StreamReader and the like) take UTF-8 as the default encoding. If you want to read a different encoding, you have to pass it explicitly to the appropriate constructor or method overload.

Also note that there is no single encoding called “ANSI”. You are probably referring to Windows code page 1252, also known as “Western European”. Notice that this is different from the Windows default encoding in other countries. This is relevant when you try to use System.Text.Encoding.Default, because on .NET Framework its value differs from system to system (on .NET Core and later it is always UTF-8).

/EDIT: It seems you misunderstood both my answer and my comment:

  1. The problem in your code is that you need to tell .NET what encoding you're using.
  2. The other remark, saying that “ANSI” may refer to different encodings, didn't have anything to do with your problem. It was just a “by the way” remark to prevent misunderstandings (well, that one backfired).

So, finally: The solution to your problem should be the following code:

string text = System.IO.File.ReadAllText("path", System.Text.Encoding.GetEncoding(1252));

The important part here is the usage of an appropriate System.Text.Encoding instance.
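To illustrate why the encoding argument matters, here is a minimal sketch that decodes the same bytes once as UTF-8 and once as Windows-1252. The class and method names (EncodingDemo, DecodeAs) are made up for this example. It assumes a modern runtime (.NET Core 3.0 or later), where legacy code pages such as 1252 only become available after registering CodePagesEncodingProvider; on .NET Framework that registration line should not be needed.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decodes raw bytes with the given encoding.
    public static string DecodeAs(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where code page 1252
        // must be made available through this provider before GetEncoding(1252).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding windows1252 = Encoding.GetEncoding(1252);

        // "café" as an editor would store it in Windows-1252: é is the single byte 0xE9.
        byte[] raw = { 0x63, 0x61, 0x66, 0xE9 };

        // A lone 0xE9 is not valid UTF-8, so the decoder substitutes U+FFFD.
        string decodedAsUtf8 = DecodeAs(raw, Encoding.UTF8);
        // With the matching code page, 0xE9 maps to é (U+00E9).
        string decodedAs1252 = DecodeAs(raw, windows1252);

        Console.WriteLine(decodedAsUtf8.IndexOf('\uFFFD') >= 0); // True: replacement character present
        Console.WriteLine(decodedAs1252 == "caf\u00E9");         // True
    }
}
```

The same mismatch would occur when File.ReadAllText(path) is called without an encoding on a file saved as Windows-1252 that has no byte order mark.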

However, this assumes that your encoding is indeed Windows-1252 (but I believe that's what Notepad++ means by “ANSI”). I have no idea why your text gets displayed correctly when read by NUnit. I suppose that NUnit either has some kind of autodiscovery for text encodings or that NUnit uses some weird defaults (i.e. not UTF-8).

Oh, and by the way: “ANSI” really refers to the “American National Standards Institute”. There are a lot of completely different standards that have “ANSI” as part of their names. For example, C++ is (among others) also an ANSI standard.

Only in some contexts is it (imprecisely) used to refer to the Windows encodings. But even there, as I've tried to explain, it usually doesn't refer to a specific encoding but rather to a class of encodings that Windows uses as defaults for different countries. One of these is Windows-1252.

初雪 2024-07-14 03:57:06

Try setting your console session's output code page using the chcp command. The code pages supported by Windows are here, here, and here. Remember, fundamentally the console is pretty simple: it displays Unicode or DBCS characters by using a code page to determine the glyph that will be displayed.
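As a rough sketch of the same idea from inside the program, the output code page can also be changed through the Console.OutputEncoding property instead of running chcp beforehand. The class and method names (ConsoleEncodingDemo, SwitchOutputToUtf8) are made up for this example, and whether the glyphs actually render still depends on the console font, which this code does not control.

```csharp
using System;
using System.Text;

public static class ConsoleEncodingDemo
{
    // Hypothetical helper: switches this process's console output to UTF-8
    // and returns the code page that is now in effect.
    public static int SwitchOutputToUtf8()
    {
        Console.OutputEncoding = Encoding.UTF8;
        return Console.OutputEncoding.CodePage;
    }

    public static void Main()
    {
        // Show which encoding the console uses before any change.
        Console.WriteLine("Before: " + Console.OutputEncoding.WebName +
                          " (code page " + Console.OutputEncoding.CodePage + ")");

        int codePage = SwitchOutputToUtf8();
        Console.WriteLine("After: code page " + codePage);

        // Accented characters written after the switch are emitted as UTF-8 bytes.
        Console.WriteLine("\u00E9 \u00E8 \u00E0");
    }
}
```

This only affects how the program writes to the console. If the string was already decoded with the wrong encoding when the file was read, changing the output code page will not recover the original characters.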

你好,陌生人 2024-07-14 03:57:06

I do not know why it works with NUnit, but I opened the file with Notepad++ and saw ANSI as its format. I have now converted it to UTF-8 and it works.

I am still wondering why it worked with NUnit and not in the console, but at least it works now.
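The conversion done in Notepad++ could also be scripted. The following is a minimal sketch, assuming the source file really is Windows-1252 and that the program runs on .NET Core 3.0 or later (hence the CodePagesEncodingProvider registration). The names Utf8Converter and Reencode, and the temporary files, are made up for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8Converter
{
    // Hypothetical helper: reads a file in a known source encoding and
    // writes the same text back out as UTF-8 without a byte order mark.
    public static void Reencode(string sourcePath, string targetPath, Encoding sourceEncoding)
    {
        string text = File.ReadAllText(sourcePath, sourceEncoding);
        File.WriteAllText(targetPath, text, new UTF8Encoding(false));
    }

    public static void Main()
    {
        // Assumption: modern .NET, where code page 1252 requires this registration.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding windows1252 = Encoding.GetEncoding(1252);

        string source = Path.GetTempFileName();
        string target = Path.GetTempFileName();
        try
        {
            // "é è" as stored in Windows-1252: 0xE9, space, 0xE8.
            File.WriteAllBytes(source, new byte[] { 0xE9, 0x20, 0xE8 });

            Reencode(source, target, windows1252);

            // File.ReadAllText without an encoding argument decodes as UTF-8,
            // so the converted file now reads back correctly.
            Console.WriteLine(File.ReadAllText(target) == "\u00E9 \u00E8"); // True
        }
        finally
        {
            File.Delete(source);
            File.Delete(target);
        }
    }
}
```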

Update
I do not understand why both the question and this answer were downvoted, because the question still stands: why can't I read an ANSI file correctly in a console application when I can in NUnit?
