Potential problem with C standard malloc'ing chars
When answering a comment to another answer of mine here, I found what I think may be a hole in the C standard (c1x; I haven't checked the earlier ones, and yes, I know it's incredibly unlikely that I alone among all the planet's inhabitants have found a bug in the standard). Information follows:
- Section 6.5.3.4 ("The sizeof operator") para 2 states "The sizeof operator yields the size (in bytes) of its operand".
- Para 3 of that section states: "When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1".
- Section 7.20.3.3 describes void *malloc(size_t sz), but all it says is "The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate". It makes no mention at all of what units are used for the argument.
- Annex E states that 8 is the minimum value for CHAR_BIT, so chars can be more than one byte in length.
My question is simply this:
In an environment where a char is 16 bits wide, will malloc(10 * sizeof(char)) allocate 10 chars (20 bytes) or 10 bytes? Point 1 above seems to indicate the former, point 2 indicates the latter.
Anyone with more C-standard-fu than me have an answer for this?
Comments (3)
In a 16-bit char environment, malloc(10 * sizeof(char)) will allocate 10 chars (10 bytes), because if char is 16 bits, then that architecture/implementation defines a byte as 16 bits. A char isn't an octet, it's a byte. On older computers this can be larger than the 8-bit de facto standard we have today.
The relevant section from the C standard follows:
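To make the "a char is always exactly one byte" point concrete, here is a minimal sketch. The helper names alloc_chars and bits_in_bytes are made up for illustration, and it assumes a C11 compiler (for static_assert). Whatever CHAR_BIT is on a given platform, the request below is for n bytes, each of which is CHAR_BIT bits wide.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Guaranteed by the standard, regardless of how many bits a char has. */
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* Hypothetical helper: allocate room for n chars and fill them with 'x'.
 * Returns NULL if the allocation fails; the caller frees the result. */
char *alloc_chars(size_t n)
{
    /* n * sizeof(char) == n, so this asks malloc for n bytes. */
    char *buf = malloc(n * sizeof(char));
    if (buf != NULL) {
        memset(buf, 'x', n); /* writes exactly n chars, i.e. n bytes */
    }
    return buf;
}

/* Hypothetical helper: how many bits n bytes occupy on this implementation. */
size_t bits_in_bytes(size_t n)
{
    return n * CHAR_BIT;
}
```

On a typical desktop bits_in_bytes(10) is 80; on an implementation with 16-bit chars it would be 160, but alloc_chars(10) still hands back 10 chars, which are 10 bytes in the standard's sense.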
In the C99 standard, the rigorous correlation between bytes, char, and object size is given in 6.2.6.1/4 "Representations of types - General":
In the C++ standard the same relationship is given in 3.9/2 "Types":
In C90 there doesn't appear to be as explicitly stated a correlation, but between the definition of a byte, the definition of a character, and the definition of the sizeof operator, the inference can be made that a char type is equivalent to a byte.
Also note that the number of bits in a byte (and the number of bits in a char) is implementation-defined; strictly speaking it doesn't need to be 8 bits. And onebyone points out in a comment elsewhere that DSPs commonly have bytes with a number of bits that isn't 8.
Note that IETF RFCs and standards generally (always?) use the term 'octet' instead of 'byte' to be unambiguous that the units they're talking about have exactly 8 bits - no more, no less.
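As a rough sketch of what that correlation means in practice: an object of type T can be copied into an array of sizeof(T) unsigned chars and back, and it occupies sizeof(T) * CHAR_BIT bits. The function names roundtrip_long and bits_in_long are my own, picked just for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a long into an unsigned char array of
 * sizeof(long) bytes (its object representation) and back again. */
long roundtrip_long(long value)
{
    unsigned char repr[sizeof(long)];
    long copy;

    memcpy(repr, &value, sizeof value); /* object -> bytes */
    memcpy(&copy, repr, sizeof copy);   /* bytes -> object */
    return copy;
}

/* Hypothetical helper: bits occupied by a long on this implementation. */
size_t bits_in_long(void)
{
    return sizeof(long) * CHAR_BIT;
}
```

The round trip returns the original value on any conforming implementation, while bits_in_long() varies with both sizeof(long) and CHAR_BIT.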
Aren't the units of "size_t sz" in whatever the addressable unit of your architecture is? I work with a DSP whose addresses correspond to 32-bit values, not bytes. malloc(1) gets me a pointer to a 4-byte area.
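If that DSP's compiler defines CHAR_BIT as 32 (an assumption on my part, the comment doesn't say so), then malloc(1) returning a "4-byte area" is really one C byte that spans four octets. A small sketch of that conversion, with a made-up helper name octets_for:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of 8-bit octets needed to cover n_bytes
 * C bytes when each byte is char_bit bits wide (rounded up).
 * Pass CHAR_BIT for the implementation you are compiling on. */
size_t octets_for(size_t n_bytes, unsigned char_bit)
{
    assert(char_bit >= 8); /* the standard requires CHAR_BIT >= 8 */
    return (n_bytes * char_bit + 7) / 8;
}
```

For example, octets_for(1, 32) is 4, matching the DSP case, and octets_for(10, 16) is 20, which is where the "20 bytes" in the question comes from if "byte" is read as "octet".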