Using char16_t and char32_t in I/O
C++11 introduces char16_t and char32_t to facilitate working with UTF-16- and UTF-32-encoded text strings. But the <iostream> library still only supports the implementation-defined wchar_t for multi-byte I/O.
Why has support for char16_t and char32_t not been added to the <iostream> library to complement the wchar_t support?
Comments (1)
The proposal Minimal Unicode support for the standard library (revision 2) indicates that the Library Working Group only supported adding the new character types to strings and codecvt facets. Apparently the majority was opposed to supporting them in iostream, fstream, facets other than codecvt, and regex.
According to minutes from the Portland meeting in 2006, "the LWG is committed to full support of Unicode, but does not intend to duplicate the library with Unicode character variants of existing library facilities." I haven't found any details, but I would guess that the committee feels the current library interface is inappropriate for Unicode. One possible complaint is that it was designed with fixed-size characters in mind. Unicode makes that assumption obsolete: while Unicode data can use fixed-size code points, a character is not limited to a single code point.
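To make the code unit / code point / character distinction concrete, here is a minimal sketch. The helper name count_code_points is hypothetical (it is not part of any standard library), and it assumes well-formed UTF-16 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count the code points in a UTF-16 string.
// Assumes well-formed input, so every code unit that is not a low
// surrogate (0xDC00..0xDFFF) starts a new code point.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t unit : s) {
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

With this helper, u"\U0001F600" has a size() of 2 code units but counts as 1 code point. Going the other way, U"e\u0301" (an e followed by a combining acute accent) is a std::u32string of size 2 even though a reader sees a single character, so even fixed-size UTF-32 code points do not map one-to-one onto characters.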
Personally I think there's no reason not to standardize the minimal support that's already provided on various platforms (Windows uses UTF-16 for wchar_t, most Unix platforms use UTF-32). More advanced Unicode support will require new library facilities, but supporting char16_t and char32_t in iostreams and facets wouldn't get in the way and would enable basic Unicode I/O.
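In the meantime, one common workaround is to convert to UTF-8 yourself and write the bytes through an ordinary narrow stream. The following is only a sketch of that idea, not something the proposal or the committee describes: the names append_utf8, to_utf8 and write_utf8 are made up for illustration, and the encoder assumes every input value is a valid Unicode scalar value (it does no surrogate or range checking).

```cpp
#include <ostream>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one code point.
// Assumes cp is a valid Unicode scalar value (no validation is done).
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: encode a whole UTF-32 string as UTF-8 bytes.
std::string to_utf8(const std::u32string& s) {
    std::string out;
    for (char32_t cp : s) {
        append_utf8(out, cp);
    }
    return out;
}

// Hypothetical helper: write a UTF-32 string to a narrow stream as UTF-8.
std::ostream& write_utf8(std::ostream& os, const std::u32string& s) {
    return os << to_utf8(s);
}
```

A call such as write_utf8(std::cout, U"caf\u00E9") then emits the UTF-8 bytes for the string; whether they display correctly depends on the terminal or file consumer expecting UTF-8.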