Potential problem with C standard malloc'ing chars

Posted 2024-08-07 13:23:27

When answering a comment to another answer of mine here, I found what I think may be a hole in the C standard (c1x, I haven't checked the earlier ones and yes, I know it's incredibly unlikely that I alone among all the planet's inhabitants have found a bug in the standard). Information follows:

  1. Section 6.5.3.4 ("The sizeof operator") para 2 states "The sizeof operator yields the size (in bytes) of its operand".
  2. Para 3 of that section states: "When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1".
  3. Section 7.20.3.3 describes void *malloc(size_t sz) but all it says is "The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate". It makes no mention at all what units are used for the argument.
  4. Annex E states that 8 is the minimum value for CHAR_BIT, so chars can be more than one byte in length.

My question is simply this:

In an environment where a char is 16 bits wide, will malloc(10 * sizeof(char)) allocate 10 chars (20 bytes) or 10 bytes? Point 1 above seems to indicate the former, point 2 indicates the latter.

Anyone with more C-standard-fu than me have an answer for this?

Comments (3)

怀中猫帐中妖 2024-08-14 13:23:27

In a 16-bit char environment, malloc(10 * sizeof(char)) will allocate 10 chars (10 bytes), because if char is 16 bits, then that architecture/implementation defines a byte as 16 bits. A char isn't an octet, it's a byte. On older computers a byte can be larger than the 8-bit de facto standard we have today.

The relevant section from the C standard follows:

3.6 Terms, definitions and symbols

byte - addressable unit of data storage large enough to hold any member of the basic character set of the execution environment...

NOTE 2 - A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined.

永不分离 2024-08-14 13:23:27

In the C99 standard the rigorous correlation between bytes, char, and object size is given in 6.2.6.1/4 "Representations of types - General":

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.

In the C++ standard the same relationship is given in 3.9/2 "Types":

For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.

In C90 the correlation doesn't appear to be stated as explicitly, but from the definition of a byte, the definition of a character, and the definition of the sizeof operator, the inference can be made that a char type is equivalent to a byte.

Also note that the number of bits in a byte (and the number of bits in a char) is implementation-defined; strictly speaking, it doesn't need to be 8 bits. And onebyone points out in a comment elsewhere that DSPs commonly have bytes with a number of bits that isn't 8.

Note that IETF RFCs and standards generally (always?) use the term 'octet' instead of 'byte' to be unambiguous that the units they're talking about have exactly 8 bits - no more, no less.

烟酒忠诚 2024-08-14 13:23:27

Aren't the units of "size_t sz" in whatever the addressable unit of your architecture is? I work with a DSP whose addresses correspond to 32-bit values, not bytes. malloc(1) gets me a pointer to a 4-byte area.
