Typographic apostrophe + wide string literal broke my wofstream (C++)
I’ve just encountered some strange behaviour when dealing with the ominous typographic apostrophe ( ’ ), as opposed to the typewriter apostrophe ( ' ). Used in a wide string literal, the apostrophe breaks wofstream.
This code works
ofstream file("test.txt");
file << "A’B" ;
file.close();
==> A’B
This code works
wofstream file("test.txt");
file << "A’B" ;
file.close();
==> A’B
This code fails
wofstream file("test.txt");
file << L"A’B" ;
file.close();
==> A
This code fails...
wstring test = L"A’B";
wofstream file("test.txt");
file << test ;
file.close();
==> A
Any ideas?
You should "enable" the locale before using wofstream.
If your system locale is en_US.UTF-8, the file test.txt will then contain UTF-8 encoded data (5 bytes, since the apostrophe U+2019 takes three of them). If your system locale is en_US.ISO8859-1, the text would be written in that 8-bit encoding (3 bytes), unless ISO 8859-1 lacks the character.
The narrow-literal version from the question works because "A’B" is actually a UTF-8 string, and you save that UTF-8 string to the file byte by byte.
Note: I assume you are using a POSIX-like OS and that your environment's locale is something other than "C" (the default locale).
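The locale step described above can be sketched roughly as follows. This is not the answerer's original snippet. The helper names write_wide, utf8_locale and read_bytes are hypothetical, and the std::codecvt_utf8 facet (deprecated since C++17) is swapped in as a stand-in for a UTF-8 system locale, so that the result does not depend on which locales happen to be installed.

```cpp
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17, but still widely shipped)
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Write `text` to `path` through a wide stream that uses `loc` for the
// wchar_t -> byte conversion. Returns false if opening, converting or
// writing failed.
bool write_wide(const std::string& path, const std::wstring& text,
                const std::locale& loc) {
    std::wofstream file;
    file.imbue(loc);  // set the locale before opening, so the file buffer uses its codecvt facet
    file.open(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!file) return false;
    file << text;
    file.flush();     // the conversion runs when the buffer is flushed
    return file.good();
}

// Hypothetical helper (an assumption, not from the answer): a locale that
// always converts wchar_t to UTF-8, for machines without a UTF-8 system locale.
std::locale utf8_locale() {
    return std::locale(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
}

// Read a file back as raw bytes, to inspect what was actually written.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

To use the environment's locale as the answer suggests, one would pass std::locale("") instead of utf8_locale(); that constructor throws std::runtime_error if the environment names a locale that is not installed. With utf8_locale(), writing L"A\u2019B" should produce the five bytes 41 E2 80 99 42.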
Are you sure it's not your compiler's support for Unicode characters in source files that is "broken"? What if you use \x or similar to encode the character in the string literal? Is your source file even in whatever encoding might map to a wchar_t for your compiler?
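One way to try the escape idea is a universal character name, which keeps the literal independent of how the compiler decodes the source file. The following is only a sketch; the names escaped and is_a_apostrophe_b are made up for illustration.

```cpp
#include <string>

// U+2019 RIGHT SINGLE QUOTATION MARK written as a universal character name,
// so no non-ASCII byte appears in the source file.
const std::wstring escaped = L"A\u2019B";

// Returns true if `s` is exactly the three code units 'A', U+2019, 'B'.
bool is_a_apostrophe_b(const std::wstring& s) {
    return s.size() == 3 && s[0] == L'A' && s[1] == 0x2019 && s[2] == L'B';
}
```

If is_a_apostrophe_b returns true for the escaped literal but false for the same literal typed with the raw ’ character, that would suggest the compiler is misreading the encoding of the source file.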
Try wrapping the stream insertion in a try-catch block and tell us what exception, if any, it throws.
I am not sure what is going on here, but I'll hazard a guess anyway. The typographic apostrophe probably has a value that fits into one byte. This works with "A’B" since it blindly copies bytes without bothering about the underlying encoding. However, with L"A’B", an implementation-dependent encoding factor comes into play. It probably doesn't find the proper UTF-16 (if you are on Windows) or UTF-32 (if you are on *nix/Mac) value to store for this particular character.
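A caveat on the try-catch suggestion: iostreams do not throw by default, so a try-catch block only catches something once exceptions are enabled on the stream with exceptions(). A minimal sketch, assuming a hypothetical helper named write_throws and a caller-supplied locale:

```cpp
#include <fstream>
#include <iostream>
#include <locale>
#include <string>

// Returns true if opening or writing `text` through a wofstream imbued with
// `loc` raised std::ios_base::failure; the message is printed to std::cerr.
bool write_throws(const std::string& path, const std::wstring& text,
                  const std::locale& loc) {
    std::wofstream file;
    file.imbue(loc);
    // Without this call the stream only sets its error flags and never throws.
    file.exceptions(std::ios::badbit | std::ios::failbit);
    try {
        file.open(path, std::ios::out | std::ios::binary | std::ios::trunc);
        file << text;
        file.flush();  // force the wchar_t -> byte conversion now
    } catch (const std::ios_base::failure& e) {
        std::cerr << "stream failure: " << e.what() << '\n';
        return true;
    }
    return false;
}
```

Calling write_throws("test.txt", L"A\u2019B", std::locale::classic()) would show whether the implementation reports the failed conversion; the exact behaviour and message are implementation-dependent.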