How do I read a Chinese text file from C#?

Posted on 2024-07-09 21:25:49

How can I read a Chinese text file using C#? My current code does not display the correct characters:

try
{    
    using (StreamReader sr = new StreamReader(path,System.Text.Encoding.UTF8))
    {
        // This is an arbitrary size for this example.
        string c = null;

        while (sr.Peek() >= 0)
        {
            c = null;
            c = sr.ReadLine();
            Console.WriteLine(c);
        }
    }
}
catch (Exception e)
{
    Console.WriteLine("The process failed: {0}", e.ToString());
}


花辞树 2024-07-16 21:25:49

You need to use the right encoding for the file. Do you know what that encoding is? It might be UTF-16, aka Encoding.Unicode, or possibly something like Big5. Really you should try to find out for sure instead of guessing though.

As leppie's answer mentioned, the problem might also be the capabilities of the console. To find out for sure, dump the string's Unicode character values as numbers. See my article on debugging Unicode issues for more information and a useful method for dumping the contents of a string.
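
As a rough illustration of dumping character values as numbers, here is a minimal sketch. The class and method names (UnicodeDump, Describe) are hypothetical and are not the method from the article mentioned above; the code simply prints each UTF-16 code unit in hex.

```csharp
using System;
using System.Text;

public static class UnicodeDump
{
    // Hypothetical helper: formats every UTF-16 code unit of the string as U+XXXX.
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

Calling Console.WriteLine(UnicodeDump.Describe(line)) for each line read shows whether the string holds the expected code points. If the numbers are right but the console shows garbage, the file was decoded correctly and the display is the problem.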

I would also avoid using the code you're currently using for reading a file line by line. Instead, use something like:

using (StreamReader sr = new StreamReader(path, appropriateEncoding))
{
    string line;
    while ( (line = sr.ReadLine()) != null)
    {
        // ...
    }
}

Calling Peek() requires that the stream is capable of seeking, which may be true for files but not all streams. Also look into File.ReadAllText and File.ReadAllLines if that's what you want to do - they're very handy utility methods.
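
For reference, the utility methods mentioned above could be used roughly like this. It is a sketch only: the wrapper class WholeFileReader is a hypothetical name, and the caller still has to supply the correct encoding.

```csharp
using System.IO;
using System.Text;

public static class WholeFileReader
{
    // Reads the whole file into one string, decoding with the given encoding.
    public static string ReadText(string path, Encoding encoding)
    {
        return File.ReadAllText(path, encoding);
    }

    // Reads the whole file into an array with one element per line.
    public static string[] ReadLines(string path, Encoding encoding)
    {
        return File.ReadAllLines(path, encoding);
    }
}
```

Both methods load the entire file into memory, so the StreamReader loop is still the better fit for very large files.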

半岛未凉 2024-07-16 21:25:49

If the text is Simplified Chinese, the encoding is usually GB2312; for Traditional Chinese, it is usually Big5:

// GB2312 (code page 936):
System.Text.Encoding gb2312 = System.Text.Encoding.GetEncoding(936);

// Big5 (code page 950):
System.Text.Encoding big5 = System.Text.Encoding.GetEncoding(950);
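
One caveat, in case it helps: on .NET Core and .NET 5 or later, these legacy code pages are not available until the code pages provider is registered, and GetEncoding(936) throws NotSupportedException without it. The sketch below assumes such a runtime; the class name LegacyChineseReader is hypothetical.

```csharp
using System.IO;
using System.Text;

public static class LegacyChineseReader
{
    // Reads a file encoded in a legacy code page such as 936 (GB2312) or 950 (Big5).
    public static string ReadAll(string path, int codePage)
    {
        // Assumption: running on .NET Core / .NET 5+, where this registration is required.
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding encoding = Encoding.GetEncoding(codePage);
        return File.ReadAllText(path, encoding);
    }
}
```

For example, LegacyChineseReader.ReadAll(path, 936) would decode a Simplified Chinese file saved in the Windows "ANSI" code page.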

可遇━不可求 2024-07-16 21:25:49

Use Encoding.Unicode instead.

I think you also need to change Console.OutputEncoding for the console to display it correctly.
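
A minimal sketch of that change, assuming UTF-8 is an acceptable output encoding for your console (ConsoleSetup is a hypothetical name):

```csharp
using System;
using System.Text;

public static class ConsoleSetup
{
    // Switches console output to UTF-8 so that non-ASCII characters are not
    // replaced with '?' when they are written.
    public static void UseUtf8Output()
    {
        Console.OutputEncoding = Encoding.UTF8;
    }
}
```

Call ConsoleSetup.UseUtf8Output() once before the first Console.WriteLine. The console font must also contain Chinese glyphs, otherwise the characters may still show as boxes.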

泅人 2024-07-16 21:25:49

I just ran into the same problem and have now solved it. I think the main cause is the text editor. When you save a .txt file in Notepad, you can choose the encoding at the bottom of the Save dialog. The default encoding is ANSI, which may not be able to represent Chinese text (it depends on your computer's code page), whereas Unicode works for Chinese text. I hope this helps :)

Cheers,

Ronald
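
If the file was saved with Notepad's "Unicode" option, it should start with a byte order mark, and StreamReader can use that to pick the encoding for you. Here is a sketch under that assumption; BomAwareReader is a hypothetical name, and the fallback encoding is only used when no byte order mark is found.

```csharp
using System.IO;
using System.Text;

public static class BomAwareReader
{
    // Reads the file, letting StreamReader switch to the encoding indicated by
    // a byte order mark (UTF-8, UTF-16 or UTF-32) and using 'fallback' otherwise.
    public static string ReadAll(string path, Encoding fallback, out Encoding detected)
    {
        using (var sr = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            string text = sr.ReadToEnd();
            // CurrentEncoding is only reliable after the first read.
            detected = sr.CurrentEncoding;
            return text;
        }
    }
}
```

Files without a byte order mark, such as ones saved as ANSI, cannot be detected this way, so the fallback has to be the right legacy encoding for those.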
