Making size_t and wchar_t portable?
To my understanding, the representation of size_t and wchar_t is completely platform/compiler specific. For instance, I have read that wchar_t on Linux is now usually 32-bit, but on Windows it is 16-bit. Is there any way I can standardize these to a set size (int, long, etc.) in my own code, while still maintaining backward compatibility with the existing standard C libraries and functions on both platforms?

My goal is essentially to typedef them so they have a fixed size. Is this possible without breaking something? Should I do this? Is there a better way?

UPDATE: The reason I'd like to do this is so that my string encoding is consistent across both Windows and Linux.

Thanks!
5 Answers
Sounds like you're looking for C99's <stdint.h> / C++0x's <cstdint> headers. These define fixed-width types like uint8_t and int64_t. You can use Boost's cstdint.hpp in case you don't have those headers.
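For illustration, a minimal example of the fixed-width types those headers provide (standard usage, nothing project-specific assumed):

```cpp
// Fixed-width integer types from <stdint.h> / <cstdint>:
// these have the same size on every conforming platform.
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t  byte  = 0xFF;         // exactly 8 bits, unsigned
    int64_t  big   = -1234567890;  // exactly 64 bits, signed
    uint32_t fixed = 42;           // exactly 32 bits, unsigned

    printf("sizeof(uint8_t)=%zu sizeof(int64_t)=%zu sizeof(uint32_t)=%zu\n",
           sizeof byte, sizeof big, sizeof fixed);
    return 0;
}
```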
You don't want to redefine those types. Instead, you can use typedefs like int32_t or int16_t (signed 32-bit and 16-bit), which are part of <stdint.h> in the C standard library.

If you're using C++, C++0x will add char16_t and char32_t, which are new types (not just typedefs for integral types) intended for UTF-16 and UTF-32.

For wchar_t, an alternative is to just use a library like ICU, which implements Unicode in a platform-independent way. Then you can just use the UChar type, which is always UTF-16; you do still need to be careful about endianness. ICU also provides converters to and from UChar (UTF-16).
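As a rough sketch of the ICU route (assuming ICU is installed and you link against its C API from unicode/ustring.h; check the signatures against your ICU version's docs):

```cpp
// Sketch: converting a UTF-8 string to ICU's UChar (always UTF-16),
// so the in-memory representation is the same on Windows and Linux.
#include <unicode/ustring.h>  // u_strFromUTF8
#include <cstdio>

int main() {
    const char *utf8 = "h\xC3\xA9llo";  // "héllo" spelled out as UTF-8 bytes
    UChar buf[64];                      // UTF-16 output buffer
    int32_t len = 0;
    UErrorCode status = U_ZERO_ERROR;

    // -1 means the input is NUL-terminated.
    u_strFromUTF8(buf, 64, &len, utf8, -1, &status);
    if (U_SUCCESS(status)) {
        // len is the number of UTF-16 code units, regardless of platform.
        std::printf("UTF-16 length: %d code units\n", (int)len);
    }
    return 0;
}
```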
No. The fundamental problem with trying to use a typedef to "fix" a character type is that you end up with something that on some platforms is consistent with the built-in functions and with wide character literals, and on other platforms is not.

If you want a string format which is the same on all platforms, you could just pick a size and signedness. You want unsigned 8-bit "characters", or signed 64-bit "characters"? You can have them on any platform which has an integer type of the appropriate size (not all do). But they're not really characters as far as the language is concerned, so don't expect to be able to call strlen or wcslen on them, or to have a nice literal syntax. A string literal is (well, converts to) a char*, not a signed char* or an unsigned char*. A wide string literal is a wchar_t*, which is equivalent to some other integer type, but not necessarily the one you want it to be.

So you have to pick an encoding, use it internally, define your own versions of the string functions you need, implement them, and then convert to/from the platform's encoding as necessary when calling platform functions that take strings. UTF-8 is a decent option because most of the C string functions still "work", in the sense that they do something fairly useful even if it isn't entirely correct.
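To make that last point concrete, here is a small sketch (the utf8_codepoints helper is invented for this example) of strlen doing something "fairly useful but not entirely correct" on UTF-8:

```cpp
#include <cstring>
#include <cstdio>

// Hypothetical helper: count Unicode code points in a UTF-8 string
// by skipping continuation bytes (those of the form 10xxxxxx).
static size_t utf8_codepoints(const char *s) {
    size_t n = 0;
    for (; *s; ++s)
        if ((*s & 0xC0) != 0x80)  // not a continuation byte
            ++n;
    return n;
}

int main() {
    const char *s = "na\xC3\xAFve";  // "naïve" in UTF-8
    // strlen still "works": it reports bytes (6), not characters (5).
    std::printf("bytes=%zu codepoints=%zu\n", strlen(s), utf8_codepoints(s));
    return 0;
}
```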
wchar_t is going to be a stickier wicket, possibly, than size_t. One could assume a maximum size for size_t (say, 8 bytes) and cast all variables to that before writing to a file (or socket). One other thing to keep in mind is that you are going to have byte-ordering issues if you are trying to write/read some sort of binary representation. Anyway, wchar_t may represent a UTF-32 encoding on one system (I believe Linux does this) and a UTF-16 encoding on another (Windows does this). If you are trying to create a standard format between platforms, you will have to resolve all of these issues.
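For illustration, a hedged sketch of pinning size_t to an agreed wire format before writing it out (the 8-byte, little-endian choice here is an assumption, not something the answer mandates):

```cpp
#include <stdint.h>
#include <stdio.h>

// Write a size_t as exactly 8 little-endian bytes, regardless of
// the platform's native sizeof(size_t) or byte order.
static void write_size(FILE *f, size_t value) {
    uint64_t v = (uint64_t)value;  // widen to the agreed 8 bytes
    unsigned char buf[8];
    for (int i = 0; i < 8; ++i)
        buf[i] = (unsigned char)(v >> (8 * i));  // little-endian order
    fwrite(buf, 1, 8, f);
}

int main(void) {
    FILE *f = fopen("sizes.bin", "wb");
    if (!f) return 1;
    write_size(f, sizeof(void *));  // example payload
    fclose(f);
    return 0;
}
```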
Just work with UTF-8 internally, and convert to UTF-16 just in time when passing arguments to Windows functions that require it. UTF-32 is probably never needed. Since it's usually wrong (in a Unicode sense) to process individual characters instead of strings, it's no more difficult to capitalize or normalize a UTF-8 string than a UTF-32 string.
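As a sketch of what that just-in-time conversion can look like on Windows (MultiByteToWideChar is the real Win32 API; the to_utf16 helper is just a name made up for this example):

```cpp
// Windows-only sketch: keep strings as UTF-8 internally, convert to
// UTF-16 only at the boundary of a wide Win32 API call.
#include <windows.h>
#include <string>

static std::wstring to_utf16(const std::string &utf8) {
    // First call: ask how many UTF-16 code units are needed (incl. terminator).
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, nullptr, 0);
    if (n <= 0) return std::wstring();
    std::wstring out(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, &out[0], n);
    out.resize(n - 1);  // drop the trailing null written by the API
    return out;
}

int main() {
    std::string msg = "hello from UTF-8";       // internal encoding: UTF-8
    MessageBoxW(nullptr, to_utf16(msg).c_str(), // UTF-16 only at the boundary
                L"demo", MB_OK);
    return 0;
}
```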