Converting extended ASCII/ANSI values
I have a program that reads in text and sorts through it using a number of functions. The text should be readable regardless of the file's format; however, when a file saved in an extended ASCII encoding is imported, any characters above 127 are ignored. Looking around, I can't see how to overcome this. Files in UTF-8 and Unicode are read fine. I've tried converting the strings to UTF-8, but the letters in question still come up as question-mark-like shapes. I can see that the values are correct: 0xBF for û, but they aren't being interpreted as that value.
Can anyone help me here? I've not done much work with this sort of thing before. I'm working in C#, if that helps.
My current code for converting looks like this:
System.Text.UTF8Encoding u = new System.Text.UTF8Encoding();
byte[] asciiBytes = Encoding.UTF8.GetBytes(sd);
sd = u.GetString(asciiBytes);
where sd is the string. When I import this string, I do not specify the text encoding:
string input = File.ReadAllText(fname);
...
parser(input);
Comments (1)
That is not the UTF-8 encoding for û; that would be a two-byte sequence, 0xC3 0xBB. Clearly you guessed the file encoding wrong. The encoding for that character in Windows code page 1252, common in Western Europe and the Americas, is 0xFB. That code page is common in the UK as well, your country of residence. Did you reverse the digits?
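To make the byte values concrete, here is a minimal sketch. The class and method names (EncodingDemo, Utf8Hex, DecodeUtf8, DecodeLatin1) are made up for illustration. It uses ISO-8859-1 as a stand-in for code page 1252, on the assumption that the two agree for û (both map byte 0xFB to it); ISO-8859-1 is chosen only because it needs no extra registration on any .NET runtime.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hex dump of the UTF-8 bytes of a string, e.g. "C3-BB".
    public static string Utf8Hex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s));

    // Decode raw bytes as UTF-8. A lone 0xFB is not valid UTF-8,
    // so the decoder substitutes U+FFFD, the replacement character.
    public static string DecodeUtf8(byte[] bytes) =>
        Encoding.UTF8.GetString(bytes);

    // Decode raw bytes as ISO-8859-1 (stand-in for code page 1252 here).
    public static string DecodeLatin1(byte[] bytes) =>
        Encoding.GetEncoding("iso-8859-1").GetString(bytes);
}
```

Calling EncodingDemo.DecodeUtf8(new byte[] { 0xFB }) yields the replacement character, which is typically displayed as a question-mark-like shape, while EncodingDemo.DecodeLatin1 on the same byte yields û.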
Use Encoding.Default instead, passing it as the second argument to File.ReadAllText.
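A hedged sketch of what that could look like when the code page is named explicitly. On .NET Framework, Encoding.Default is the system ANSI code page, so File.ReadAllText(fname, Encoding.Default) matches the advice above. On .NET Core and .NET 5+, Encoding.Default is UTF-8, so the sketch assumes one of those runtimes and requests code page 1252 directly. LegacyTextReader and ReadCp1252 are hypothetical names.

```csharp
using System.IO;
using System.Text;

static class LegacyTextReader
{
    // Hypothetical helper: read a file that was saved in Windows code page 1252.
    public static string ReadCp1252(string path)
    {
        // On .NET Core / .NET 5+ the legacy code pages must be registered
        // before Encoding.GetEncoding(1252) can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        // Decoding with the right code page makes the round trip through
        // UTF-8 from the question unnecessary: the string is already correct.
        return File.ReadAllText(path, cp1252);
    }
}
```

In the question's code this would replace the File.ReadAllText(fname) call, for example string input = LegacyTextReader.ReadCp1252(fname);. It only helps if the file really is in code page 1252; a file in another legacy code page would need that code page's number instead.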