Typographic apostrophe + wide string literal broke my wofstream (C++)

Posted on 2024-07-18 20:47:16

I’ve just encountered some strange behaviour when dealing with the ominous typographic apostrophe ( ’ ) – not the typewriter apostrophe ( ' ). Used with a wide string literal, the apostrophe breaks wofstream.

This code works

ofstream file("test.txt");
file << "A’B" ;
file.close();

==> A’B

This code works

wofstream file("test.txt");
file << "A’B" ;
file.close();

==> A’B

This code fails

wofstream file("test.txt");
file << L"A’B" ;
file.close();

==> A

This code fails...

wstring test = L"A’B";
wofstream file("test.txt");
file << test ;
file.close();

==> A

Any ideas?

满意归宿 2024-07-25 20:47:16

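You should "enable" the locale before using wofstream:

std::locale::global(std::locale("")); // Enable locale support: "" selects the user's environment locale
wofstream file("test.txt");
file << L"A’B";

Note the empty string: std::locale() with no argument only copies the current global locale, which is "C" at program startup.

So if your system locale is en_US.UTF-8, the file test.txt will contain UTF-8 encoded data (5 bytes, because ’ is U+2019 and takes 3 bytes in UTF-8). If your system locale is en_US.ISO8859-1, the text would be written in that 8-bit encoding (3 bytes), but only if the encoding has the character. ISO 8859-1 has no code for ’, so the conversion fails there.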

wofstream file("test.txt");
file << "A’B" ;
file.close();

This code works because "A’B" is actually a UTF-8 string (assuming the source file is saved as UTF-8), and you save that UTF-8 string to the file byte by byte.

Note: I assume you are using a POSIX-like OS and that your environment's locale is something other than "C", which is the locale every program starts with.
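To make the role of the locale visible, here is a minimal sketch. The helper name write_wide is made up for illustration and is not from the question. It imbues a locale on the stream itself instead of changing the global one, then reports whether the wide-to-narrow conversion succeeded.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper (not part of the original thread): writes `text` to
// `path` through a wofstream that converts wide characters using `loc`.
// Returns true only if every character was converted and written.
bool write_wide(const std::string& path, const std::wstring& text,
                const std::locale& loc) {
    std::wofstream file;
    file.imbue(loc);   // choose the conversion facet before opening the file
    file.open(path);
    file << text;      // this may only fill the stream's internal buffer
    file.flush();      // the conversion to bytes has happened by this point
    return file.good();
}
```

With std::locale::classic() (the "C" locale) a call such as write_wide("test.txt", L"A\u2019B", std::locale::classic()) should report failure, which matches the truncated output in the question. Passing std::locale("") instead uses the environment's locale; that constructor throws std::runtime_error if the environment names a locale that is not installed.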

爱的十字路口 2024-07-25 20:47:16

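Are you sure it isn't your compiler's support for Unicode characters in source files that is "broken"? What happens if you use \x or something similar to encode the character in the string literal? Is your source file even in the encoding your compiler assumes when it builds wchar_t literals?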
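One way to take the source file encoding out of the picture is to spell the character with escapes. This is a small sketch; the variable names are made up for illustration.

```cpp
#include <string>

// U+2019 (RIGHT SINGLE QUOTATION MARK) written with escapes, so the result
// does not depend on how the source file is saved or read by the compiler.
const std::wstring wide_text = L"A\u2019B";

// The same text as explicit UTF-8 bytes. The literal is split before "B"
// because a \x escape consumes every following hex digit, and B is one.
const std::string utf8_text = "A\xE2\x80\x99" "B";
```

If the wide version still fails to reach the file, the compiler's handling of the source encoding is ruled out and the stream's locale is the likely culprit.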

知足的幸福 2024-07-25 20:47:16

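Try wrapping the stream insertion in a try-catch block and tell us what exception it throws, if any.

I'm not sure what is going on here, but I'll hazard a guess anyway. The typographic apostrophe probably has a value that fits into one byte. This works with "A’B" because the bytes are copied blindly, without any regard for the underlying encoding. With L"A’B", however, an implementation-dependent encoding comes into play. It probably can't find the proper UTF-16 (if you are on Windows) or UTF-32 (if you are on *nix/Mac) value to store for this particular character.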
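A caveat on the try-catch suggestion: iostreams report errors through state bits by default, so nothing is thrown unless exceptions are switched on with exceptions(). A minimal sketch under that assumption follows; the helper name wide_write_error is hypothetical, and the exact message text depends on the standard library.

```cpp
#include <fstream>
#include <ios>
#include <locale>
#include <string>

// Hypothetical helper: returns the exception message if writing `text`
// through a wofstream imbued with `loc` throws, or an empty string if the
// write went through.
std::string wide_write_error(const std::string& path, const std::wstring& text,
                             const std::locale& loc) {
    try {
        std::wofstream file;
        // Ask the stream to throw instead of silently setting state bits.
        file.exceptions(std::ios_base::badbit | std::ios_base::failbit);
        file.imbue(loc);
        file.open(path);
        file << text;
        file.flush();  // force the wide-to-narrow conversion now
        return "";
    } catch (const std::ios_base::failure& e) {
        return e.what();
    }
}
```

Called with std::locale::classic() and L"A\u2019B", this should return a non-empty message describing a stream or conversion error, while plain ASCII text should return an empty string.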
