Does C++0x support conversion between std::wstring and UTF-8 byte sequences?
I saw that C++0x will add support for UTF-8, UTF-16 and UTF-32 literals. But what about conversions between the three representations?
I plan to use std::wstring everywhere in my code, but I also need to manipulate UTF-8 encoded data when dealing with files and the network. Will C++0x also provide support for these operations?
Comments (2)
In C++0x, char16_t and char32_t will be used to store UTF-16 and UTF-32, rather than wchar_t. From the draft n2798:
The thing about wchar_t is that it gives you no guarantees about the encoding used. It is simply a type that can hold a wide character, and nothing more. If you are going to write software now, you have to live with this compromise. Fully C++0x-compliant compilers are still a long way off. You can always give the VC2010 CTP and g++ compilers a try, for what it is worth. Moreover, wchar_t has different sizes on different platforms, which is another thing to watch out for (2 bytes on VS/Windows, 4 bytes on GCC/Mac and so on). Options such as GCC's -fshort-wchar complicate the issue further.
The best solution, therefore, is to use an existing library. Chasing Unicode bugs around isn't the best possible use of effort or time. I'd suggest you take a look at:
More on C++0x Unicode string literals here
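To make the difference between the fixed-encoding types and wchar_t concrete, here is a minimal sketch, assuming a compiler that accepts the C++11 u"" and U"" literal prefixes. The function names are hypothetical and chosen only for illustration; the code counts how many code units each string type needs for one character outside the Basic Multilingual Plane.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
// surrogate pair for it, while UTF-32 needs a single code unit.
std::size_t utf16_units() { return std::u16string(u"\U0001F600").size(); }
std::size_t utf32_units() { return std::u32string(U"\U0001F600").size(); }

// For a wide literal the answer depends on the platform's wchar_t:
// typically two units where wchar_t is 16 bits wide, one where it is 32.
std::size_t wide_units() { return std::wstring(L"\U0001F600").size(); }

// Size of wchar_t in bytes on the current toolchain.
std::size_t wide_char_bytes() { return sizeof(wchar_t); }

// The char16_t and char32_t results are the same on every platform.
void check_literals() {
    assert(utf16_units() == 2);
    assert(utf32_units() == 1);
}
```

The point of the sketch is that only the wchar_t result varies from one toolchain to another, which is why code that assumes a particular wchar_t encoding tends not to be portable.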
Thank you, dirkgently. I'm not yet registered, so I can't upvote or respond directly with a comment.
I've learned something about codecvt. I knew about the libraries you suggest, and the following resource may also be useful: http://www.unicode.org/Public/PROGRAMS/CVTUTF/.
The project is a library that should be open source, so I would prefer to minimize dependencies on external libraries. I already depend on libgc and Boost, though for the latter I only use threads. I would really prefer to stick to the C++ standard, and I'm a bit disappointed that GC support has been dropped.
Apparently VC++ Express 2008 is said to support most of the C++0x standard, as is icc. Since I currently develop with VC++ and it will be some time before the library is released, I'd like to try using codecvt and char32_t strings.
Does anyone know how to do this? Should I post another question?
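As a possible starting point for the codecvt question above, here is a minimal sketch of a UTF-8 to UTF-32 round trip. It assumes a standard library that ships the C++11 header <codecvt> with std::wstring_convert and std::codecvt_utf8; both were deprecated in C++17, so whether they are available without warnings depends on the toolchain. The helper names are hypothetical and do not come from the thread.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Assumption: <codecvt> is available (C++11 onwards, deprecated since C++17).
using Utf32Converter =
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>;

// Decode UTF-8 bytes into UTF-32 code points.
// from_bytes throws std::range_error on malformed input.
std::u32string utf8_to_utf32(const std::string& bytes) {
    Utf32Converter conv;
    return conv.from_bytes(bytes);
}

// Encode UTF-32 code points as UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string& text) {
    Utf32Converter conv;
    return conv.to_bytes(text);
}

// True when decoding and re-encoding reproduces the input bytes.
bool round_trips(const std::string& bytes) {
    return utf32_to_utf8(utf8_to_utf32(bytes)) == bytes;
}

void demo() {
    const std::string utf8 = "\xC3\xA9";  // U+00E9 as two UTF-8 bytes
    const std::u32string utf32 = utf8_to_utf32(utf8);
    assert(utf32.size() == 1 && utf32[0] == 0xE9);
    assert(utf32_to_utf8(utf32) == utf8);
}
```

Keeping the text as std::u32string internally and converting only at the file and network boundaries avoids relying on the platform-dependent size of wchar_t.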