getline and "strange characters" problem
I have a strange problem. I use:
    wifstream a("a.txt");
    wstring line;
    while (a.good()) //!a.eof() not helping
    {
        getline(a, line);
        //...
        wcout << line << endl;
    }
and it works nicely for a txt file like this:
http://www.speedyshare.com/files/29833132/a.txt
(Sorry for the link, but the file is just 80 bytes, so it shouldn't be a problem to get it; if I copy and paste it on SO, the newlines get lost.)
But when I add, for example, 水 (from http://en.wikipedia.org/wiki/UTF-16/UCS-2#Examples) to any line, that is the line where loading stops. I was under the wrong impression that a getline that takes a wstring as one input and a wifstream as the other can chew through any txt input.
Is there any way to read every single line in the file, even if it contains funky characters?
The not-very-satisfying answer is that you need to imbue the input stream with a locale which understands the particular character encoding in question. If you don't know which locale to choose, you can use the empty locale.
For example (untested):
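A minimal sketch of the idea, not necessarily the answerer's exact code: it imbues the stream with the empty-named locale std::locale("") before reading. The helper names user_locale_or_classic and read_lines are made up for illustration, and the fallback to the classic "C" locale is an added assumption for platforms where the user's locale name is not recognised.

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: the user's preferred locale (taken from the environment,
// e.g. LANG / LC_ALL on POSIX). Falls back to the classic "C" locale when the
// requested name is not available and std::locale("") throws.
std::locale user_locale_or_classic() {
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: read every line of `path`, letting the codecvt facet
// of `loc` convert the file's bytes into wchar_t.
std::vector<std::wstring> read_lines(const std::string& path, const std::locale& loc) {
    std::wifstream in;
    in.imbue(loc);          // set the locale before any bytes are read
    in.open(path.c_str());

    std::vector<std::wstring> lines;
    std::wstring line;
    // Loop on the read itself rather than on good()/eof(), so a failed
    // read never produces a stale extra line.
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

Calling read_lines("a.txt", user_locale_or_classic()) returns the lines as wide strings. If the locale's codecvt cannot decode a byte sequence, the read fails and the loop stops at that point, which is consistent with the symptom described in the question.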
Unfortunately, there is no standard way to determine which locales are available on a given platform, let alone to select one based on the character encoding.
The above code puts the locale selection in the hands of the user, and if they set it to something plausible (e.g. en_AU.UTF-8) it might all Just Work. Failing this, you probably need to resort to third-party libraries such as iconv or ICU.
Also relevant: this blog entry (apologies for the self-promotion).
The problem is with your call to the global function getline(a, line). This takes a std::string. Use the std::wistream::getline method instead of the getline function.
C++ fstreams delegate I/O to their filebufs. A filebuf always reads "raw bytes" from disk and then uses the stream locale's codecvt facet to convert these raw bytes into its "internal encoding".
A wfstream is a basic_fstream<wchar_t> and thus has a basic_filebuf<wchar_t>, which uses the locale's codecvt<wchar_t, char, mbstate_t> to convert the bytes read from disk into wchar_ts. If you read a UCS-2 encoded file, the conversion must therefore be performed by a codecvt that "knows" that the external encoding is UCS-2. You thus need a locale with such a codecvt (see, for example, this SO question).
By default, the stream's locale is the global locale at the time the stream is constructed. To use a specific locale, it should be imbue()-d on the stream.
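The last two sentences can be checked with a small sketch. The helper names stream_uses and switch_locale are hypothetical and only wrap the standard getloc() and imbue() members; which locale to pass in (one that carries a codecvt for the file's actual encoding) is left to the caller.

```cpp
#include <fstream>
#include <locale>

// Hypothetical helper: true when `in` currently holds `loc`, i.e. its filebuf
// would convert bytes with the codecvt facet carried by `loc`.
bool stream_uses(const std::wifstream& in, const std::locale& loc) {
    return in.getloc() == loc;
}

// Hypothetical helper: make `in` use `loc` and hand back the locale it held
// before. imbue() itself returns the previous locale.
std::locale switch_locale(std::wifstream& in, const std::locale& loc) {
    return in.imbue(loc);
}
```

A freshly constructed std::wifstream satisfies stream_uses(in, std::locale()), because std::locale() is a copy of the current global locale. After switch_locale(in, some_locale), the stream uses some_locale instead. One way to build such a locale for UTF-16 input in C++11 is std::locale(std::locale::classic(), new std::codecvt_utf16<wchar_t>) from the <codecvt> header, which has been deprecated since C++17.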