How big is wchar_t with GCC?
GCC supports -fshort-wchar, which switches wchar_t from four bytes to two.
What is the best way to detect the size of wchar_t at compile time, so I can map it correctly to the appropriate UTF-16 or UTF-32 type?
At least until C++0x is released and gives us stable utf16_t and utf32_t typedefs.
#if ?what_goes_here?
typedef wchar_t Utf32;
typedef unsigned short Utf16;
#else
typedef wchar_t Utf16;
typedef unsigned int Utf32;
#endif
You can use the macros __WCHAR_TYPE__ and __WCHAR_MAX__.
They are predefined by GCC. You can check their values with
echo "" | gcc -E - -dM
As the value of __WCHAR_TYPE__ can vary from int to short unsigned int or long int, the best test is IMHO to check whether __WCHAR_MAX__ is above 2^16.
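For illustration, here is a minimal sketch of how that check could fill in the #if from the question. It assumes a compiler that predefines __WCHAR_MAX__ (GCC does), and it keeps the question's assumption that unsigned short is 16 bits and unsigned int is 32 bits. The static_assert lines are an addition that needs C++11; they only make those assumptions fail loudly.

```cpp
// Sketch only: relies on the GCC-predefined __WCHAR_MAX__ macro.
#ifndef __WCHAR_MAX__
#  error "__WCHAR_MAX__ is not predefined by this compiler"
#endif

#if __WCHAR_MAX__ > 0xFFFF
// wchar_t is wider than 16 bits (the GCC default on Linux).
typedef wchar_t Utf32;
typedef unsigned short Utf16;
#else
// wchar_t is 16 bits, e.g. when compiling with -fshort-wchar.
typedef wchar_t Utf16;
typedef unsigned int Utf32;
#endif

// Assumption check (needs C++11): 8-bit bytes, 16-bit short, 32-bit int.
static_assert(sizeof(Utf16) == 2, "Utf16 is not 2 bytes wide");
static_assert(sizeof(Utf32) == 4, "Utf32 is not 4 bytes wide");
```

Comparing against 0xFFFF rather than 2^16 gives the same result here, because a 16-bit wchar_t cannot have a maximum above 0xFFFF.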
You can use the standard macro WCHAR_MAX.
The WCHAR_MAX macro is defined by the ISO C and ISO C++ standards (see ISO/IEC 9899, 7.18.3 Limits of other integer types, and ISO/IEC 14882, C.2), so you can use it safely on almost all compilers. Include <wchar.h> (or <cwchar> in C++) to get it.
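A hedged sketch of the same kind of test written against the standard macro. kWcharBits is a hypothetical name introduced here, and the sketch assumes wchar_t is either 16 or 32 bits wide. The static_assert (C++11) cross-checks the preprocessor result against sizeof.

```cpp
#include <climits>  // CHAR_BIT
#include <cwchar>   // WCHAR_MAX

// kWcharBits is a hypothetical name, not part of any standard.
#if WCHAR_MAX > 0xFFFF
const unsigned kWcharBits = 32;  // assumes "wider than 16 bits" means 32 bits
#else
const unsigned kWcharBits = 16;
#endif

// Fails to compile if the macro test and sizeof disagree.
static_assert(sizeof(wchar_t) * CHAR_BIT == kWcharBits,
              "unexpected wchar_t width");
```

Unlike __WCHAR_MAX__, this version does not depend on a GCC-specific predefined macro, only on the header being included before the #if.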
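The size depends on the compiler flag -fshort-wchar. With it, wchar_t is 2 bytes; without it, the target's default applies (4 bytes in the question's setup).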
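One way to see the effect is to let the compiler report the size. A small sketch, where wchar_bytes is a hypothetical helper name and constexpr requires C++11; building it once normally and once with -fshort-wchar should give different values (4 and 2 on the setup described in the question).

```cpp
#include <cstddef>  // std::size_t

// Hypothetical helper: the size of wchar_t for the flags used in this build.
constexpr std::size_t wchar_bytes() { return sizeof(wchar_t); }

// Build-time check: fails to compile if wchar_t is neither 2 nor 4 bytes.
static_assert(wchar_bytes() == 2 || wchar_bytes() == 4,
              "unexpected wchar_t size");
```

Note that sizeof cannot be used inside #if, which is why the preprocessor-based answers test a MAX macro instead.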
As Luther Blissett said, wchar_t exists independently of Unicode; they are two different things.
If you are really talking about UTF-16, be aware that some Unicode characters are encoded as two 16-bit code units, a surrogate pair (U+10000..U+10FFFF, although these are rarely used in Western languages).
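To make the two-unit case concrete, here is a minimal sketch of UTF-16 encoding for a single code point. Utf16Units and encode_utf16 are hypothetical names, the input is assumed to be a valid code point (there is no check for the surrogate range or for values above U+10FFFF), and the compile-time use assumes C++14 or later.

```cpp
// Hypothetical result type: one or two 16-bit code units.
struct Utf16Units {
    unsigned short unit[2];
    int count;  // 1 for U+0000..U+FFFF, 2 for U+10000..U+10FFFF
};

// Encodes one code point as UTF-16. Assumes cp is a valid code point.
constexpr Utf16Units encode_utf16(unsigned int cp) {
    return cp < 0x10000u
        ? Utf16Units{{static_cast<unsigned short>(cp), 0}, 1}
        : Utf16Units{{
              // High surrogate: 0xD800 plus the top 10 bits of (cp - 0x10000).
              static_cast<unsigned short>(0xD800u + ((cp - 0x10000u) >> 10)),
              // Low surrogate: 0xDC00 plus the bottom 10 bits.
              static_cast<unsigned short>(0xDC00u + ((cp - 0x10000u) & 0x3FFu))},
          2};
}

// U+1F600 needs two code units: 0xD83D followed by 0xDE00.
static_assert(encode_utf16(0x1F600).count == 2, "expected a surrogate pair");
```

So even with a 16-bit wchar_t, one wchar_t does not always hold one character; code that indexes or counts characters has to account for pairs.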