string to wstring conversion, encoding problem

Posted on 2024-12-05 13:58:36

I've read Stroustrup's Appendix D (paying particular attention to Locales and Codecvt). Stroustrup does not give a good codecvt and widen example (IMHO). I've been trying to adapt code from the internet, with no luck. I've also tried imbuing stringstreams, without success.

Would anyone be able to show (and explain) code that converts from UTF-8 to UTF-16 (or UTF-32)? NOTE: I do not know the size of the input/output string in advance, so I expect the solution should use reserve and a back_inserter. Please don't use out.resize(in.length()*2).

When finished, it would be great if the code actually worked (it's amazing how much broken code is out there). Please make sure the following round-trips. The bytes below are the Han character for 'bone' in UTF-8 and UTF-{16|32}.

const std::string n("\xe9\xaa\xa8");
const std::wstring w = L"\u9aa8";

My apologies for a basic question. On Windows, I use the Win32 API and don't have these problems moving between encodings.

Comments (2)

末骤雨初歇 2024-12-12 13:58:36

Just use UTF8-CPP:

std::wstring conversion;
utf8::utf8to16(utf8_str.begin(), utf8_str.end(), std::back_inserter(conversion));

Caveat: this only works where wchar_t is 2 bytes wide (Windows).

For a portable solution you could do:

std::vector<unsigned short> utf16line; // uint16_t if you can
utf8::utf8to16(utf8_line.begin(), utf8_line.end(), back_inserter(utf16line));

But then you're losing the string support. Hopefully, we'll get char16_t soon enough.
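
C++11 did add char16_t and char32_t (along with std::u16string and std::u32string), so a portable conversion no longer has to go through wchar_t. Below is a minimal sketch of a hand-rolled converter using only the standard library; it is not UTF8-CPP's implementation. The names next_code_point, utf8_to_utf16, utf8_to_utf32, utf32_to_utf8 and check_bone_round_trip are hypothetical helpers made up for this illustration. It reserves a size estimate and appends through a back_inserter, as the question asks.

```cpp
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes one UTF-8 sequence starting at `it`
// (requires it != end), advances `it`, and returns the code point.
template <typename It>
char32_t next_code_point(It& it, It end) {
    const unsigned char lead = static_cast<unsigned char>(*it++);
    if (lead < 0x80) return lead;  // single-byte (ASCII)
    int extra = 0;
    char32_t cp = 0, min = 0;      // `min` rejects overlong encodings
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
    else throw std::runtime_error("invalid UTF-8 lead byte");
    for (; extra > 0; --extra) {
        if (it == end) throw std::runtime_error("truncated UTF-8 sequence");
        const unsigned char c = static_cast<unsigned char>(*it++);
        if ((c & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::runtime_error("overlong or out-of-range code point");
    return cp;
}

std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    out.reserve(in.size());  // upper bound: at most one code point per byte
    auto sink = std::back_inserter(out);
    for (auto it = in.begin(); it != in.end();)
        *sink++ = next_code_point(it, in.end());
    return out;
}

std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());  // upper bound: never more units than input bytes
    auto sink = std::back_inserter(out);
    for (auto it = in.begin(); it != in.end();) {
        char32_t cp = next_code_point(it, in.end());
        if (cp < 0x10000) {
            *sink++ = static_cast<char16_t>(cp);
        } else {  // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            *sink++ = static_cast<char16_t>(0xD800 + (cp >> 10));
            *sink++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    out.reserve(in.size());  // lower bound only; the back_inserter grows it
    auto sink = std::back_inserter(out);
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        if (cp < 0x80) {
            *sink++ = static_cast<char>(cp);
        } else if (cp < 0x800) {
            *sink++ = static_cast<char>(0xC0 | (cp >> 6));
            *sink++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *sink++ = static_cast<char>(0xE0 | (cp >> 12));
            *sink++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *sink++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            *sink++ = static_cast<char>(0xF0 | (cp >> 18));
            *sink++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            *sink++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *sink++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// The round trip requested in the question, for the character 'bone'.
void check_bone_round_trip() {
    const std::string n("\xe9\xaa\xa8");
    assert(utf8_to_utf16(n) == u"\u9aa8");
    assert(utf8_to_utf32(n) == U"\u9aa8");
    assert(utf32_to_utf8(utf8_to_utf32(n)) == n);
}
```

If a std::wstring is still required, one option is to copy from the std::u16string where wchar_t is 16 bits wide (Windows) and from the std::u32string where it is 32 bits wide (typical on Linux and macOS).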

感受沵的脚步 2024-12-12 13:58:36

It seems pretty obvious that he was smoking weed. As for the codepage conversions, look no further than iconv!
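
For reference, a rough sketch of what an iconv-based conversion might look like. It assumes a POSIX system whose iconv() takes a char** input pointer (as glibc does; some platforms declare it const char** or need -liconv at link time) and that the encoding names "UTF-8" and "UTF-16LE" are supported by the local iconv. The wrapper name iconv_convert is hypothetical. Output is drained through a small fixed buffer in a loop, so the output size does not need to be known in advance.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: converts the bytes of `in` from encoding `from`
// to encoding `to` and returns the converted bytes.
std::string iconv_convert(const std::string& in, const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the argument order: to, from
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::string out;
    char buf[256];
    // iconv does not modify the input bytes; the cast only satisfies
    // the (assumed) char** signature.
    char* src = const_cast<char*>(in.data());
    std::size_t src_left = in.size();

    while (src_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        const std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        const int err = errno;  // save before anything else can change it
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the chunk buffer filled up; loop and continue.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
    }
    iconv_close(cd);
    return out;
}
```

The result is a byte string, so for UTF-16LE the character 'bone' (U+9AA8) comes back as the two bytes 0xA8 0x9A. Stateful target encodings would additionally need a final flush call with a null input pointer, which this sketch omits.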
