How do I print a wstring on Linux/OS X?

Posted on 2024-11-25 15:33:35

How can I print a string like this: €áa¢cée£ on the console/screen? I tried this:

#include <iostream>    
#include <string>
using namespace std;

wstring wStr = L"€áa¢cée£";

int main (void)
{
    wcout << wStr << " : " << wStr.length() << endl;
    return 0;
}

This does not work. Even more confusing: if I remove € from the string, the output looks like this: ?a?c?e? : 7, but with € in the string, nothing gets printed after the € character.

If I write the same code in python:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

wStr = u"€áa¢cée£"
print u"%s" % wStr

it prints the string correctly on the very same console. What am I missing in C++ (well, I'm just a noob)? Cheers!!

Update 1: based on n.m.'s suggestion

#include <iostream>
#include <string>
using namespace std;

string wStr = "€áa¢cée£";
char *pStr = 0;

int main (void)
{
    cout << wStr << " : " << wStr.length() << endl;

    pStr = &wStr[0];
    for (unsigned int i = 0; i < wStr.length(); i++) {
        cout << "char "<< i+1 << " # " << *pStr << " => " << pStr << endl;
        pStr++;
    }
    return 0;
}

First of all, it reports 14 as the length of the string: €áa¢cée£ : 14. Is that because it's counting 2 bytes per character?

And all I get is this:

char 1 # ? => €áa¢cée£
char 2 # ? => ??áa¢cée£
char 3 # ? => ?áa¢cée£
char 4 # ? => áa¢cée£
char 5 # ? => ?a¢cée£
char 6 # a => a¢cée£
char 7 # ? => ¢cée£
char 8 # ? => ?cée£
char 9 # c => cée£
char 10 # ? => ée£
char 11 # ? => ?e£
char 12 # e => e£
char 13 # ? => £
char 14 # ? => ?

as the output of the last cout. So the actual problem still remains, I believe. Cheers!!

Update 2: based on n.m.'s second suggestion

#include <clocale>
#include <iostream>
#include <string>

using namespace std;

wchar_t wStr[] = L"€áa¢cée£";
int iStr = sizeof(wStr) / sizeof(wStr[0]);        // length of the string
wchar_t *pStr = 0;

int main (void)
{
    setlocale (LC_ALL,"");
    wcout << wStr << " : " << iStr << endl;

    pStr = &wStr[0];
    for (int i = 0; i < iStr; i++) {
       wcout << *pStr << " => " <<  static_cast<void*>(pStr) << " => " << pStr << endl;
       pStr++;
    }
    return 0;
}

And this is what I get as my result:

€áa¢cée£ : 9
€ => 0x1000010e8 => €áa¢cée£
á => 0x1000010ec => áa¢cée£
a => 0x1000010f0 => a¢cée£
¢ => 0x1000010f4 => ¢cée£
c => 0x1000010f8 => cée£
é => 0x1000010fc => ée£
e => 0x100001100 => e£
£ => 0x100001104 => £
 => 0x100001108 =>

Why is the length reported as 9 rather than 8? Or is this what I should expect? Cheers!!

看轻我的陪伴 2024-12-02 15:33:35

Drop the L before the string literal. Use std::string, not std::wstring.

UPD: There is a better (correct) solution. Keep wchar_t, wstring, and the L prefix, and call setlocale(LC_ALL,"") at the beginning of your program.

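You should call setlocale(LC_ALL,"") at the beginning of your program anyway. This tells your program to use your environment's locale instead of the default "C" locale. Your environment has a UTF-8 locale, so everything should work.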
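A minimal sketch of that advice, assuming the environment names a UTF-8 locale (for example en_US.UTF-8) and the source file is saved as UTF-8. The helper names use_environment_locale, describe, and print_wide are made up for illustration and are not part of any library.

```cpp
#include <clocale>
#include <iostream>
#include <string>

// Hypothetical helper: switch from the default "C" locale to the locale
// named by the environment (LANG, LC_ALL, ...). std::setlocale returns a
// null pointer if that locale cannot be installed.
bool use_environment_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Hypothetical helper: format a wide string together with its length in
// wchar_t units, mirroring the output format used in the question.
std::wstring describe(const std::wstring& text) {
    return text + L" : " + std::to_wstring(text.length());
}

// Hypothetical helper: set the locale first, then print through wcout.
void print_wide(const std::wstring& text) {
    use_environment_locale();
    std::wcout << describe(text) << std::endl;
}
```

Calling print_wide(L"€áa¢cée£") from main should then print the string followed by " : 8" on a Linux or OS X terminal that uses UTF-8, where wchar_t is wide enough to hold each of these characters in a single unit. If use_environment_locale() returns false, the program is still in the "C" locale and the output is likely to be garbled as before.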

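Without the call to setlocale(LC_ALL,""), the program handles UTF-8 sequences without "realizing" that they are UTF-8. If a correct UTF-8 sequence is printed to the terminal, the terminal interprets it as UTF-8 and everything looks fine. That is what happens when you use string and char: gcc uses UTF-8 as the default encoding for string literals, and the ostream happily prints them without applying any conversion. It thinks it has a sequence of ASCII characters.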
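This also explains the length of 14 reported in Update 1 of the question: std::string::length() counts bytes, and in UTF-8 the € takes 3 bytes while á, ¢, é, and £ take 2 bytes each. The sketch below counts code points instead. It assumes well-formed UTF-8 input, and the names utf8_code_points and kSample are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a UTF-8 string by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumes the input is well-formed UTF-8; no validation is done.
std::size_t utf8_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char b : bytes) {
        if ((b & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// The string from the question, spelled out as UTF-8 bytes so the result
// does not depend on the encoding of the source file.
const std::string kSample =
    "\xE2\x82\xAC"   // € (3 bytes)
    "\xC3\xA1" "a"   // á (2 bytes), a (1 byte)
    "\xC2\xA2" "c"   // ¢ (2 bytes), c (1 byte)
    "\xC3\xA9" "e"   // é (2 bytes), e (1 byte)
    "\xC2\xA3";      // £ (2 bytes)
```

Here kSample.length() is 14 while utf8_code_points(kSample) is 8. Printing a single byte from the middle of such a sequence, as the loop in Update 1 does, sends an incomplete UTF-8 sequence to the terminal, which is why it shows up as "?".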

But when you use wchar_t, everything breaks: gcc uses UTF-32, the correct re-encoding is not applied (because the locale is "C"), and the output is garbage.

Once you call setlocale(LC_ALL,""), the program knows it should recode UTF-32 to UTF-8, and everything is fine again.
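To show what that recoding amounts to, here is a hand-written sketch of the UTF-32 to UTF-8 conversion. It is only an illustration of the encoding rules, not the code the C or C++ runtime actually runs, and the names append_utf8 and utf32_to_utf8 are hypothetical. It assumes every input value is a valid Unicode scalar value.

```cpp
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one code point.
// Assumes cp is a valid scalar value (at most 0x10FFFF, not a surrogate).
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a whole UTF-32 string to UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        append_utf8(out, cp);
    }
    return out;
}
```

For example, the euro sign U+20AC falls in the three-byte range and becomes the bytes E2 82 AC. In the "C" locale the wide stream has no such mapping for characters outside ASCII, so the conversion fails instead of producing these bytes.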

This all assumes that we only ever want to work with UTF-8. Handling arbitrary locales and encodings is beyond the scope of this answer.