Using char16_t and char32_t in I/O

Posted 2024-12-16 20:06:56

C++11 introduces char16_t and char32_t to facilitate working with UTF-16- and UTF-32-encoded text strings. But the <iostream> library still supports only the implementation-defined wchar_t for wide-character I/O.

Why has support for char16_t and char32_t not been added to the <iostream> library to complement the wchar_t support?

Answer by 旧夏天, 2024-12-23 20:06:56

In the proposal "Minimal Unicode Support for the Standard Library" (revision 2), it is noted that the Library Working Group supported the new character types only in strings and codecvt facets. Apparently the majority was opposed to supporting iostream, fstream, facets other than codecvt, and regex.
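
As a minimal sketch of what that string-and-codecvt support already makes possible (assuming a C++11-or-later compiler; utf16_to_utf8 is a hypothetical helper name, not a standard function), a std::u16string can be converted to UTF-8 through the std::codecvt<char16_t, char, std::mbstate_t> facet, which the standard requires every locale to provide. This specialization is deprecated since C++20, so newer compilers may warn.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert UTF-16 to UTF-8 using the standard
// codecvt<char16_t, char, mbstate_t> facet (deprecated since C++20).
std::string utf16_to_utf8(const std::u16string& in) {
    if (in.empty()) return {};

    using Facet = std::codecvt<char16_t, char, std::mbstate_t>;
    const Facet& facet = std::use_facet<Facet>(std::locale::classic());

    // One UTF-16 code unit yields at most 3 UTF-8 bytes (a surrogate pair,
    // i.e. 2 units, yields 4), so 3 bytes per unit is always enough room.
    std::string out(in.size() * 3, '\0');
    std::mbstate_t state{};
    const char16_t* from_next = nullptr;
    char* to_next = nullptr;

    const std::codecvt_base::result res = facet.out(
        state, in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);
    if (res != std::codecvt_base::ok) {
        // error: malformed input; partial: e.g. a trailing lone high surrogate.
        throw std::range_error("invalid or incomplete UTF-16 input");
    }
    // On success the whole input has been consumed.
    assert(from_next == in.data() + in.size());
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

The resulting bytes can then be written with the ordinary char streams, for example std::cout << utf16_to_utf8(u"h\u00e9"), which is roughly the manual workaround that the missing char16_t streams force on users. Whether the console displays the text correctly still depends on the platform's terminal encoding.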

According to the minutes of the 2006 Portland meeting, "the LWG is committed to full support of Unicode, but does not intend to duplicate the library with Unicode character variants of existing library facilities." I haven't found any details, but I would guess that the committee considers the current library interface inappropriate for Unicode. One possible complaint is that it was designed with fixed-size characters in mind, an assumption Unicode breaks: even when Unicode data is stored as fixed-size code points, a single character is not limited to one code point (for example, a base letter followed by a combining accent).
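
To make the fixed-size-character point concrete, here is a small sketch (assuming C++11 or later; decode_utf16 is a hypothetical helper, not a standard function) that decodes UTF-16 into code points. It exposes two separate gaps: one code point may need two char16_t units (a surrogate pair), and one displayed character may need several code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-16 code units into code points (UTF-32).
// Assumes well-formed input, i.e. every high surrogate (0xD800..0xDBFF) is
// followed by a low surrogate (0xDC00..0xDFFF); this is only checked by assert.
std::u32string decode_utf16(const std::u16string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            assert(i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF);
            const char32_t low = s[++i];
            // Recombine the 10 high bits and 10 low bits, offset by 0x10000.
            c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
        }
        out.push_back(c);
    }
    return out;
}
```

For u"\U0001F600" the input holds two code units (0xD83D, 0xDE00) and the result one code point. For U"e\u0301" (e followed by U+0301 COMBINING ACUTE ACCENT), a std::u32string already holds two code points even though it displays as the single character é, so even fixed-size code points do not give character-at-a-time processing.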

Personally, I see no reason not to standardize the minimal support that is already provided on various platforms (Windows uses UTF-16 for wchar_t; most Unix platforms use UTF-32). More advanced Unicode support will require new library facilities, but supporting char16_t and char32_t in iostreams and facets would not get in the way, and it would enable basic Unicode I/O.
