StreamReader does not correctly read extended characters (UTF-8)
I am having an issue where I am unable to read a file that contains foreign characters. The file, I have been told, is encoded in UTF-8 format.
Here is the core of my code:
using (FileStream fileStream = fileInfo.OpenRead())
{
    using (StreamReader reader = new StreamReader(fileStream, System.Text.Encoding.UTF8))
    {
        string line;
        while (!string.IsNullOrEmpty(line = reader.ReadLine()))
        {
            hashSet.Add(line);
        }
    }
}
The file contains the word "achôcre" but when examining it during debugging it is adding it as "ach�cre".
(This is a profanity file, so I apologize if you speak French. I, for one, have no idea what that means.)
Comments (1)
The evidence clearly suggests that the file is not in UTF-8 format. Try System.Text.Encoding.Default and see if you get the correct text then. If you do, you know the file is in Windows-1252 (assuming that is your system default codepage). In that case, I recommend that you open the file in Notepad, then "Save As" it again as UTF-8, and then you can use Encoding.UTF8 normally.
Another way to check what encoding the file is actually in is to open it in your browser. If the accents display correctly, then the browser has detected the correct character set, so look at the "View / Character set" menu to find out which one is selected. If the accents are not displaying correctly, then change the character set via that menu until they do.
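The first suggestion (try another encoding, then re-save as UTF-8) can also be done in code. What follows is only a rough sketch, assuming .NET 5 or later and assuming the file really is Windows-1252 whenever it is not valid UTF-8. The class name EncodingFallback and its methods are hypothetical and exist only for illustration. One caveat: on .NET Core and .NET 5+, Encoding.Default is always UTF-8, so the Encoding.Default test is only informative on .NET Framework. That is why the sketch asks for code page 1252 explicitly.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
static class EncodingFallback
{
    static EncodingFallback()
    {
        // On .NET Core / .NET 5+, code page 1252 is only available
        // after the code pages provider has been registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Reads all lines as strict UTF-8; if the bytes are not valid UTF-8,
    // re-reads the file as Windows-1252 (an assumption about the real encoding).
    public static string[] ReadLines(string path)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently substituting U+FFFD (the "?" in a diamond) for bad bytes.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            return File.ReadAllLines(path, strictUtf8);
        }
        catch (DecoderFallbackException)
        {
            return File.ReadAllLines(path, Encoding.GetEncoding(1252));
        }
    }

    // Rewrites the file in place as UTF-8 without a byte order mark,
    // the programmatic equivalent of Notepad's "Save As" UTF-8.
    public static void ConvertToUtf8(string path)
    {
        string[] lines = ReadLines(path);
        File.WriteAllLines(path, lines, new UTF8Encoding(false));
    }
}
```

With such a helper, the loop in the question could fill the hash set from EncodingFallback.ReadLines(fileInfo.FullName). The fallback is a guess rather than detection: any byte sequence decodes "successfully" as Windows-1252, so the result is only correct if that guess about the file is right.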