Reading an ISO-8859 file containing special characters (such as é) in C++
I'm trying to read a file encoded in ISO-8859 (ANSI) that contains some Western European characters such as "é".
When I read the file and print the result, all the special characters appear as �, whereas ordinary letters appear correctly.
If I convert the file to UTF-8 and then do the same thing, everything works perfectly.
Does anyone have an idea how to solve this? I tried using wifstream and wstring instead of ifstream and string, but that didn't help much.
Here's my sample code:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    ifstream myFS;
    myFS.open("test.txt", ios::in);
    string myString;
    if (myFS.is_open()) {
        while (myFS >> myString)
            cout << myString << endl;
    }
    myFS.close();
    return 0;
}
test.txt (ISO-8859-15 format) contains:
abcd éfg
Result:
abcd
�fg
Any advice will be appreciated. Thank you in advance!
P.S. I forgot to mention my system environment: I'm using the Ubuntu 10.10 (Maverick) console with g++ 4.4.5.
Thanks!
Comments (1)
Your console is set to use UTF-8, so when you dump the ISO-8859-15 file to the console with cout, it shows the wrong letters. Characters with ASCII codes below 128 are encoded identically in both encodings, which is why those characters appear correctly on your screen. In ISO-8859-15, "é" is the single byte 0xE9, which is not a valid UTF-8 sequence on its own, so the terminal displays � in its place.
The output from the program is actually correct; it's just that your console is not set up to display it correctly.
I'd also recommend using ios::binary on files that aren't all ASCII, or you may have problems on other platforms later.