C++, cout and UTF-8

Posted on 2024-11-28 02:10:17

Hopefully a simple question: cout seems to die when handling strings that end with a multibyte UTF-8 character. Am I doing something wrong? This is with GCC (MinGW) on Windows 7 x64.

Edit: Sorry if I wasn't clear enough. I'm not concerned about the missing glyphs or how the bytes are interpreted, only that nothing is displayed after the call to cout << s4 (the final BAR is missing). Any further cout output after that first statement displays no text whatsoever!

#include <cstdio>
#include <iostream>
#include <string>

int main() {
    std::string s1("abc");
    std::string s2("…");  // … = 0xE2 80 A6
    std::string s3("…abc");
    std::string s4("abc…");

    //In C
    fwrite(s1.c_str(), s1.size(), 1, stdout);
    printf(" FOO ");
    fwrite(s2.c_str(), s2.size(), 1, stdout);
    printf(" BAR ");
    fwrite(s3.c_str(), s3.size(), 1, stdout);
    printf(" FOO ");
    fwrite(s4.c_str(), s4.size(), 1, stdout);
    printf(" BAR\n\n"); 

    //C++
    std::cout << s1 << " FOO " << s2 << " BAR " << s3 << " FOO " << s4 << " BAR ";
}

// results:

// abc FOO ��� BAR ���abc FOO abc… BAR

// abc FOO ��� BAR ���abc FOO abc…

Comments (4)

顾铮苏瑾 2024-12-05 02:10:17

If you want your program to use your current locale, call setlocale(LC_ALL, "") as the first thing in your program. Otherwise the program's locale is C and what it will do to non-ASCII characters is not knowable by us mere humans.
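
A minimal sketch of that call, wrapped in a hypothetical helper named adopt_user_locale (the name and the fallback behaviour are assumptions made for illustration). It only switches the C locale; whether that changes what the console displays depends on the platform:

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: switch from the default "C" locale to the
// locale configured in the environment and return the resulting name.
std::string adopt_user_locale() {
    // An empty string asks for the user's preferred locale.
    const char* name = std::setlocale(LC_ALL, "");
    if (name == nullptr) {
        // The request could not be honoured; the locale is unchanged,
        // so query the one currently in effect instead.
        name = std::setlocale(LC_ALL, nullptr);
    }
    return name != nullptr ? std::string(name) : std::string("C");
}
```

The helper would be called at the top of main, before any output is written.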

捂风挽笑 2024-12-05 02:10:17

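This is really no surprise. Unless your terminal is set to UTF-8 encoding, how does it know that s2 isn't supposed to be "(Latin small letter a with circumflex)(Euro sign)(broken bar)", supposing that your terminal is set to a single-byte Western code page such as Windows-1252 (see the table at http://www.ascii-code.com/; in strict ISO-8859-1, 0x80 is a control character rather than the Euro sign)?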

By the way, cout is not "dying" as it clearly continues to produce output after your test string.
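
To see that the program only emits bytes and leaves their interpretation to the terminal, the contents of a string can be dumped as hexadecimal. A small sketch, assuming a hypothetical helper named to_hex:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: render each byte of `bytes` as two uppercase
// hex digits, separated by single spaces.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) {
            out += ' ';
        }
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

For the s2 string from the question, to_hex yields E2 80 A6, the same three bytes regardless of which code page the terminal later uses to draw them.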

夜灵血窟げ 2024-12-05 02:10:17

The Windows console does not handle non-local-codepage characters by default.

You'll need to make sure you have a Unicode-capable font set in the console window, and that the codepage is set to UTF-8 through a call to chcp. This is not a guaranteed success though.
Note that `wcout` changes nothing if the console can't show the fancy characters because its font is botched.

On all modern Linux distros, the console is set to UTF-8 and this should work out of the box.

南街九尾狐 2024-12-05 02:10:17

As others have pointed out, std::cout is agnostic about this, at least in "C" locale (the default). On the other hand, your console window must be set up to display UTF-8: code page 65001. Try invoking chcp 65001 before executing your program. (This has worked for me in the past.)
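
The same code page can also be requested from inside the program through the Win32 function SetConsoleOutputCP. The sketch below assumes a hypothetical wrapper named use_utf8_console_output. It is not a guaranteed fix for the problem in the question, and on non-Windows builds it does nothing:

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: ask the Windows console to interpret program
// output as UTF-8 (code page 65001). Returns true on success.
// On other platforms there is nothing to change, so it returns true.
bool use_utf8_console_output() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

As with chcp, the console font still needs to contain the glyphs for them to be drawn correctly.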
