StreamReader not reading extended character set (UTF8) correctly

Posted on 2024-11-19 10:35:29


I am unable to correctly read a file that contains accented (non-ASCII) characters. I have been told that the file is encoded in UTF-8.

Here is the core of my code:

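using System.IO;

// fileInfo (a FileInfo) and hashSet (a HashSet<string>) are declared elsewhere.
using (FileStream fileStream = fileInfo.OpenRead())
{
    using (StreamReader reader = new StreamReader(fileStream, System.Text.Encoding.UTF8))
    {
        string line;

        // ReadLine() returns null only at end of file, so test for null;
        // testing IsNullOrEmpty would stop reading at the first blank line.
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length > 0) // skip blank lines
            {
                hashSet.Add(line);
            }
        }
    }
}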

The file contains the word "achôcre", but when I inspect it while debugging, it has been added as "ach�cre".
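The symptom can be reproduced without the file. Below is a minimal C# sketch. It assumes the file was saved in a single-byte encoding and that the code runs on .NET 5 or later, which provides the built-in `Encoding.Latin1`. That encoding stores "ô" as the same byte, 0xF4, that Windows-1252 uses. `MojibakeDemo` and its methods are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encode text the way a Latin-1 / Windows-1252 editor would save it:
    // "ô" becomes the single byte 0xF4.
    public static byte[] SaveAsLatin1(string text) => Encoding.Latin1.GetBytes(text);

    // Decode the bytes as UTF-8, as new StreamReader(stream, Encoding.UTF8) does.
    // 0xF4 announces a 4-byte UTF-8 sequence, but the next byte ("c", 0x63) is
    // not a continuation byte, so the decoder emits U+FFFD for the 0xF4 and then
    // decodes "c" normally.
    public static string ReadAsUtf8(byte[] bytes) => Encoding.UTF8.GetString(bytes);
}
```

`MojibakeDemo.ReadAsUtf8(MojibakeDemo.SaveAsLatin1("achôcre"))` produces "ach\uFFFDcre", which is the string seen in the debugger. Bytes produced by `Encoding.UTF8.GetBytes` round-trip unchanged.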

(This is a profanity file, so I apologize if you speak French. I, for one, have no idea what it means.)


Answers (1)

手长情犹 2024-11-26 10:35:29


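The evidence strongly suggests that the file is not in UTF-8 format. In Windows-1252, "ô" is stored as the single byte 0xF4. Followed by "c", that byte is not a valid UTF-8 sequence, so the UTF-8 decoder replaces it with �. Try System.Text.Encoding.Default and see whether you get the correct text. If you do, the file is in your system's default code page, most likely Windows-1252. This test only works on .NET Framework. On .NET Core and .NET 5+, Encoding.Default is always UTF-8, so use Encoding.GetEncoding(1252) instead, after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance). If that gives the correct text, I recommend that you open the file in Notepad and use "Save As" to save it again with UTF-8 encoding. Encoding.UTF8 will then work as expected.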
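Instead of re-saving the file in Notepad, the conversion can be done in code. This is a minimal sketch with a few assumptions:

- The source file really is Windows-1252.
- The code runs on .NET Core 3.0+ or .NET 5+, where code page 1252 is available only after `CodePagesEncodingProvider` has been registered. Older .NET Core versions may also need the System.Text.Encoding.CodePages package.
- `Reencode` and `ConvertWindows1252ToUtf8` are hypothetical names.

```csharp
using System.IO;
using System.Text;

public static class Reencode
{
    // Runs once, before the first use of the class: makes code page 1252
    // available on .NET Core / .NET 5+.
    static Reencode()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: rewrite a Windows-1252 text file as UTF-8,
    // the programmatic equivalent of Notepad's "Save As ... UTF-8".
    public static void ConvertWindows1252ToUtf8(string sourcePath, string targetPath)
    {
        Encoding windows1252 = Encoding.GetEncoding(1252);
        string text = File.ReadAllText(sourcePath, windows1252);

        // UTF-8 without a byte-order mark; pass Encoding.UTF8 instead to write one.
        File.WriteAllText(targetPath, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
    }
}
```

Writing to a separate target path keeps the original intact in case the encoding guess is wrong. `StreamReader` with `Encoding.UTF8` reads the result whether or not a byte-order mark is present.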

Another way to check which encoding the file is actually in is to open it in your browser. If the accents display correctly, the browser has detected the right character set, so look at the "View > Character Encoding" menu to see which one is selected. If they do not display correctly, change the character set through that menu until they do. Many current browsers have removed or renamed this menu. In that case, a text editor that shows the detected encoding, such as Notepad++ or Visual Studio Code, serves the same purpose.
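The same check can be approximated in code. Text in Windows-1252 that contains accented letters is almost never valid UTF-8, so strict UTF-8 decoding is a reasonable heuristic. The sketch below makes these assumptions:

- The only two candidates are UTF-8 and Windows-1252. This is a guess, not true encoding detection.
- The code runs on .NET Core 3.0+ or .NET 5+, as in the conversion sketch.
- `EncodingCheck` and its methods are hypothetical names.

```csharp
using System.Text;

public static class EncodingCheck
{
    // Needed on .NET Core / .NET 5+ before Encoding.GetEncoding(1252) works.
    static EncodingCheck()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // True if the bytes decode as UTF-8 without any invalid sequences.
    public static bool IsValidUtf8(byte[] bytes)
    {
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Heuristic: UTF-8 if the bytes are valid UTF-8, otherwise assume Windows-1252.
    public static Encoding GuessEncoding(byte[] bytes) =>
        IsValidUtf8(bytes) ? Encoding.UTF8 : Encoding.GetEncoding(1252);
}
```

The result can be passed to the reader in the question, for example `new StreamReader(fileStream, EncodingCheck.GuessEncoding(File.ReadAllBytes(fileInfo.FullName)))`. Reading the file twice is fine for a small word list but wasteful for large files.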
