Problem with getline and "strange characters"

Posted on 2024-11-29 04:13:24

I have a strange problem. I use

wifstream a("a.txt");
wstring line;
while (a.good()) // !a.eof() did not help
{
    getline(a, line);
    // ...
    wcout << line << endl;
}

and it works nicely for a txt file like this:
http://www.speedyshare.com/files/29833132/a.txt
(sorry for the link, but the file is just 80 bytes, so it shouldn't be a problem to get it; if I copy/paste it on SO the newlines get lost)
But when I add, for example, 水 (from http://en.wikipedia.org/wiki/UTF-16/UCS-2#Examples ) to any line, that is the line where loading stops. I was under the wrong impression that getline, taking a wstring as one input and a wifstream as the other, can chew any txt input...
Is there any way to read every single line in the file, even if it contains funky characters?

Comments (3)

那些过往 2024-12-06 04:13:24

The not-very-satisfying answer is that you need to imbue the input stream with a locale which understands the particular character encoding in question. If you don't know which locale to choose, you can use the empty locale.

For example (untested):

std::wifstream a("a.txt");
std::locale loc("");
a.imbue(loc);

Unfortunately, there is no standard way to determine what locales are available for a given platform, let alone select one based on the character encoding.

The above code puts the locale selection in the hands of the user, and if they set it to something plausible (e.g. en_AU.UTF-8) it might all Just Work.
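A minimal sketch of the approach above, assuming the file is encoded the way the user's environment locale expects. The helper name `read_lines_user_locale` is hypothetical, and falling back to the classic "C" locale when `std::locale("")` throws is an assumption made here, not part of the snippet above. The loop tests the result of `std::getline` directly, so a failed read ends the loop instead of processing a stale line.

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of `path` using the user's preferred locale.
std::vector<std::wstring> read_lines_user_locale(const std::string& path) {
    std::locale loc = std::locale::classic();
    try {
        loc = std::locale("");  // locale named by the environment (LANG, LC_ALL, ...)
    } catch (const std::runtime_error&) {
        // Assumption: keep the classic "C" locale if the environment names an unknown one.
    }

    std::wifstream in;
    in.imbue(loc);   // imbue before open, so the filebuf converts with this locale
    in.open(path);

    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(in, line)) {  // ends at end of file or when a read fails
        lines.push_back(line);
    }
    return lines;
}
```

Whether a line containing 水 is read correctly still depends on the locale the user has configured; with a UTF-8 locale and a UTF-8 file it has a reasonable chance of working.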

Failing this, you probably need to resort to third-party libraries such as iconv or ICU.

Also relevant: this blog entry (apologies for the self-promotion).

会傲 2024-12-06 04:13:24

The problem is with your call to the global function getline (a,line). This takes a std::string. Use the std::wistream::getline method instead of the getline function.

小梨窩很甜 2024-12-06 04:13:24

C++ fstreams delegate I/O to their filebufs. A filebuf always reads "raw bytes" from disk and then uses the stream locale's codecvt facet to convert these raw bytes into its "internal encoding".

A wfstream is a basic_fstream<wchar_t> and thus has a basic_filebuf<wchar_t>, which uses the locale's codecvt<wchar_t, char, mbstate_t> to convert the bytes read from disk into wchar_t values. If you read a UCS-2 encoded file, the conversion must therefore be performed by a codecvt that "knows" the external encoding is UCS-2. You thus need a locale with such a codecvt (see, for example, this SO question).
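As a sketch of what "a locale with such a codecvt" can look like, the code below builds a locale from the classic one plus `std::codecvt_utf8<wchar_t>` from `<codecvt>` and imbues it on the stream. It assumes the file is UTF-8 rather than UCS-2; for UTF-16 or UCS-2 files the same header offers `std::codecvt_utf16`. The helper name `read_utf8_lines` is hypothetical, and `<codecvt>` has been deprecated since C++17, so treat this as an illustration rather than a recommendation.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: reads a UTF-8 file line by line into wide strings.
std::vector<std::wstring> read_utf8_lines(const std::string& path) {
    // The locale takes ownership of the facet and destroys it when no longer used.
    std::locale utf8(std::locale::classic(), new std::codecvt_utf8<wchar_t>);

    std::wifstream in;
    in.imbue(utf8);  // install the codecvt before any bytes are read
    in.open(path);

    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(in, line)) {  // ends at end of file or when a read fails
        lines.push_back(line);
    }
    return lines;
}
```

Imbuing before `open()` is a deliberate choice here: the filebuf then never sees bytes under a different conversion than the one intended.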

By default, a stream's locale is the global locale at the time the stream is constructed. To use a specific locale, imbue() it on the stream.
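Both statements can be checked with a short sketch. The function names are hypothetical; the comparison relies on `std::locale()` returning a copy of the current global locale and on `imbue()` returning the locale it replaced. The custom locale differs from the classic one only by carrying its own `std::numpunct<wchar_t>` facet, which is enough to make it compare unequal.

```cpp
#include <fstream>
#include <locale>

// A freshly constructed stream carries a copy of the global locale.
bool starts_with_global_locale() {
    std::wifstream in;                    // never opened; only its locale matters
    return in.getloc() == std::locale();  // std::locale() copies the global locale
}

// imbue() installs a new locale on the stream and returns the previous one.
bool imbue_replaces_locale() {
    std::wifstream in;
    const std::locale before = in.getloc();
    // Unnamed locale: classic locale plus a separately allocated numpunct facet.
    const std::locale custom(std::locale::classic(), new std::numpunct<wchar_t>);
    const std::locale returned = in.imbue(custom);
    return returned == before && in.getloc() == custom && in.getloc() != before;
}
```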
