Padding at the end of a struct with a variable-size array seems wrong
Consider these structs on a common 64-bit system:
struct V1 { // size 1, alignment 1
uint8_t size; // offset 0, size 1, alignment 1
uint8_t data[]; // offset 1, size 0, alignment 1
};
struct V2 { // size 12, alignment 4
char c; // offset 0, size 1, alignment 1
int length; // offset 4, size 4, alignment 4
char b; // offset 8, size 1, alignment 1
short blob[]; // offset 10, size 0, alignment 2
};
In the first case, the data member sits right at the end of the struct and takes up no space. This causes the following oddity:
struct V1 blobs[2];
&blobs[0].data == &blobs[1].size
Luckily the C standard §6.7.2.1, paragraph 3 says:
A structure or union shall not contain a member with incomplete or function type,… except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.
So the above array is illegal and there is no problem with the addresses being the same.
What if I have code that, given a size, creates such structures in a contiguous block of memory that was pre-allocated? Would it be illegal for it to create instances with size == 0 because that would basically be an array of the struct?
Secondly, I have a problem with V2. The compiler adds extra padding at the end of V2 so that the size is a multiple of the alignment. This is necessary for structs in an array, so that the following structs remain properly aligned. But V2 must never be placed in an array, so I fail to see why there should be any padding at the end of V2.
In fact, I would go so far as to say it is wrong to add padding there. It obfuscates calculating the size of the struct for a given length of blob, because now the offset of blob has to be considered instead of the size of the struct.
align = _Alignof(struct V2);
needed_size = offsetof(struct V2, blob) + length; // beware of overflow
needed_size = (needed_size + align - 1) & ~(align - 1); // round up to a multiple of align; beware of overflow
Is there something I'm missing why struct V2 must be padded?
Answers (2)
As @EricPostpischil explained in comments, the constraint in question is not about the layout of objects in memory, but rather about the declared element type of an actual array. An object that is not declared as an array is not an array in the relevant sense, no matter how array-like it may seem, or how we think about it or use it. So no, the language spec does not forbid what you describe.
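To make the scenario concrete, here is a minimal sketch of packing struct V1 records back to back in one pre-allocated block, including records with size == 0. The helper names v1_append and v1_count are hypothetical, and the sketch assumes that the block was obtained from malloc and that struct V1 has an alignment requirement of 1, as on the common implementations the question describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct V1 {
    uint8_t size;    /* number of payload bytes that follow */
    uint8_t data[];  /* flexible array member */
};

/* Hypothetical helper: write one record at byte offset pos of buf (capacity
 * cap bytes). Returns the offset just past the new record, or 0 if the
 * record does not fit. Assumes buf is malloc'ed and V1 has alignment 1. */
size_t v1_append(unsigned char *buf, size_t cap, size_t pos,
                 const uint8_t *payload, uint8_t n)
{
    assert(buf != NULL);
    size_t need = offsetof(struct V1, data) + n;
    if (pos > cap || need > cap - pos)
        return 0;
    struct V1 *rec = (struct V1 *)(buf + pos);
    rec->size = n;
    if (n != 0)
        memcpy(rec->data, payload, n);
    return pos + need;
}

/* Hypothetical helper: count the records stored in buf[0 .. end). */
size_t v1_count(const unsigned char *buf, size_t end)
{
    size_t count = 0;
    for (size_t pos = 0; pos < end; ++count) {
        const struct V1 *rec = (const struct V1 *)(buf + pos);
        pos += offsetof(struct V1, data) + rec->size;
    }
    return count;
}
```

Two consecutive records with size == 0 end up one byte apart, which looks like an array of struct V1 in memory, but no object here is declared with an array-of-struct V1 type.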
The C language specification permits implementations to pad structure layouts after any member, including the last, at their own discretion. Among the primary purposes is to allow structure members to be properly aligned, including, but not limited to, within arrays of structures, but use of padding in structure layouts is not contingent on there being an alignment-based justification.
"Wrong" is a strong word. Especially in the context of a language-lawyer question, you should back it up with an argument based on the language specification. I don't think you can do that.
On having to consider the offset of blob instead of the size of the struct: not exactly true. If you want to compute the minimum possible size into which an instance of your structure can fit then yes, you need to take the offset of the FAM into account. However:
- That's not a function of there being padding, but rather of the offset of the FAM differing from the size of the structure. That can't happen without padding, but it doesn't have to happen with padding.
- If you are so space-constrained that you cannot accommodate the possibility of a few bytes of overallocation for the sake of clearer code, then dynamic allocation and FAMs probably are not a good idea in the first place. In particular, the allocator itself typically does not allocate with single-byte granularity.
- Substituting an offsetof expression for a sizeof expression is hardly obfuscatory. It might even be clearer, since then the name of the FAM actually appears in the size computation.
Your particular example code is somewhat overcomplicated, however, by the unnecessary measure employed to make the allocation size a multiple of the structure's alignment requirement.
Although the size of a structure type that has a FAM does not include the size of the FAM itself, it does include any padding between the penultimate member and the FAM, and possibly more:
"[...] the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply." (C17 6.7.2.1/18)
Thus, a pretty tight upper bound on the space needed for a structure of type struct S that has a flexible array member fam of type fam_t, with room for n elements, can be calculated as:
sizeof(struct S) + n * sizeof(fam_t)
That is in fact idiomatic, but if you prefer
offsetof(struct S, fam) + n * sizeof(fam_t)
for the absolute minimum then I see nothing objectionable about that form.
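The two size computations can be sketched as follows, using struct V2 from the question in place of struct S. The function names are made up for illustration, and the overflow checks and the return-0-on-overflow convention are assumptions of this sketch, not something the answer prescribes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct V2 {
    char c;
    int length;
    char b;
    short blob[];  /* flexible array member */
};

/* The subtraction of these two values below relies on this ordering. */
static_assert(offsetof(struct V2, blob) <= sizeof(struct V2),
              "the FAM cannot start past the end of the struct");

/* Idiomatic upper bound: struct size plus n elements. Returns 0 on overflow. */
size_t v2_size_idiomatic(size_t n)
{
    if (n > (SIZE_MAX - sizeof(struct V2)) / sizeof(short))
        return 0;
    return sizeof(struct V2) + n * sizeof(short);
}

/* Absolute minimum: FAM offset plus n elements. Returns 0 on overflow. */
size_t v2_size_minimal(size_t n)
{
    if (n > (SIZE_MAX - offsetof(struct V2, blob)) / sizeof(short))
        return 0;
    return offsetof(struct V2, blob) + n * sizeof(short);
}

/* Allocate a V2 with room for n elements; the caller frees it with free(). */
struct V2 *v2_alloc(size_t n)
{
    size_t bytes = v2_size_idiomatic(n);
    if (bytes == 0 || n > INT_MAX)
        return NULL;
    struct V2 *p = malloc(bytes);
    if (p != NULL)
        p->length = (int)n;
    return p;
}
```

The two results differ by exactly sizeof(struct V2) - offsetof(struct V2, blob), which is 2 bytes with the layout given in the question. The allocator uses the idiomatic form so that the block is never smaller than sizeof(struct V2), even when n is 0.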
As for whether you are missing a reason why struct V2 must be padded: undoubtedly so, as you observe your implementation to pad it, but the implementation does not owe you an explanation.
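Nevertheless, your implementation most likely applies a combination of rules such as these: the alignment requirement of a structure type is the strictest alignment requirement of any of its members, and the size of a structure type is a multiple of its alignment requirement.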
Neither of those is a rule of the language itself, but they are fairly common in practice. In particular, they are part of the System V x86_64 ABI, and undoubtedly of other ABIs, too. Note that although those rules do serve the purpose of ensuring that structure members can be properly aligned inside an array of structures, they make no exception for structure types that are not allowed to be the element type of an array.
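One way to see what a particular implementation does is to ask it directly. The sketch below reports the layout of struct V2 from the question. The function names are hypothetical, and the values printed depend on the ABI in use, so no particular output is claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct V2 {
    char c;
    int length;
    char b;
    short blob[];  /* flexible array member */
};

/* Bytes between the start of blob and the end of the struct. */
size_t v2_tail_padding(void)
{
    assert(offsetof(struct V2, blob) <= sizeof(struct V2));
    return sizeof(struct V2) - offsetof(struct V2, blob);
}

/* Nonzero if the struct size is a multiple of its alignment requirement. */
int v2_size_is_multiple_of_alignment(void)
{
    return sizeof(struct V2) % _Alignof(struct V2) == 0;
}

/* Print the layout facts discussed above for the current implementation. */
void v2_print_layout(void)
{
    printf("alignment      : %zu\n", _Alignof(struct V2));
    printf("size           : %zu\n", sizeof(struct V2));
    printf("offset of blob : %zu\n", offsetof(struct V2, blob));
    printf("tail padding   : %zu\n", v2_tail_padding());
}
```

On an implementation that follows the two rules above, v2_size_is_multiple_of_alignment() returns nonzero even though struct V2 can never be an array element.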
This answer addresses “Is there something I'm missing why struct V2 must be padded?”
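If a compiler did not pad a structure type to be a multiple of its alignment requirement, then some structure types would violate this rule in C 2018 6.7.2.1 18:
"[...] the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply."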
To see this, consider a structure, struct s0, whose members are an int and a char, in an implementation where int is four bytes and has a four-byte alignment requirement. This structure requires five bytes for its members, so it must be padded to eight bytes to satisfy the alignment requirements when used in an array. Next, we add a flexible array member as the last member, giving struct s1.
This structure also requires five bytes for its inflexible members. None are required for the flexible array. If the compiler did not pad it to eight bytes, it would be shorter than struct s0, which violates the rule that its size must be either as if the flexible array member were omitted or that size plus more padding.
This tells us why a conforming compiler is constrained to include the padding. However, it does not tell us the reason for the rule. I see none except that it would be more complicated to write rules into the C standard to allow less padding.
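The comparison between the two structures can be checked at compile time. In this sketch the member names i, c and fam, and the choice of char as the element type of the flexible array member, are assumptions made for illustration. Only the names struct s0 and struct s1 come from the answer.

```c
#include <assert.h>
#include <stddef.h>

/* Five bytes of members where int is four bytes; no flexible array member. */
struct s0 {
    int i;
    char c;
};

/* Same members plus a flexible array member (element type assumed). */
struct s1 {
    int i;
    char c;
    char fam[];
};

/* The rule from 6.7.2.1: s1 is as large as s0, or larger through extra
 * trailing padding, but never smaller. */
static_assert(sizeof(struct s1) >= sizeof(struct s0),
              "a struct with a FAM must not be smaller than without it");

/* Extra trailing padding the implementation added beyond sizeof(struct s0). */
size_t s1_extra_padding(void)
{
    return sizeof(struct s1) - sizeof(struct s0);
}
```

If an implementation made struct s1 five bytes while struct s0 is eight, the static assertion would fail to compile.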
Some Discussion About Object Size
Review of the C 2018 standard reveals nothing which explicitly says the size of an object must be a multiple of its alignment requirement. Obviously, the ability to put objects into an array depends on this, but the lack of a requirement that the size be a multiple of an alignment requirement would mean there might be some objects (besides a structure with flexible array member) that could not be used in arrays; the inability to put objects into an array would not cause the requirement to come into existence.
Thus, it might be conforming for a C implementation to define struct s0 to be five bytes with an alignment requirement of four bytes, and then it could make struct s1 also five bytes with an alignment requirement of four bytes.