Structure padding and packing
Consider:
struct mystruct_A
{
    char a;
    int b;
    char c;
} x;

struct mystruct_B
{
    int b;
    char a;
} y;
The sizes of the structures are 12 and 8 respectively.
Are these structures padded or packed?
When does padding or packing take place?
Padding aligns structure members to "natural" address boundaries: say, int members would have offsets that are mod(4) == 0 on a 32-bit platform. Padding is on by default. It inserts gaps into your first structure: three bytes after a, so that b starts on a 4-byte boundary, and three bytes after c, so that the whole structure stays aligned when it is placed in an array.
Packing, on the other hand, prevents the compiler from doing padding. This has to be explicitly requested; under GCC it's __attribute__((__packed__)). Declaring the first structure with that attribute would produce a structure of size 6 on a 32-bit architecture.
A note though: unaligned memory access is slower on architectures that allow it (like x86 and amd64), and is explicitly prohibited on strict alignment architectures like SPARC.
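The layout described above can be inspected directly. The sketch below is an illustration, not the answer's original listing: it assumes GCC or Clang (for the packed attribute), and mystruct_A_packed and print_layouts are names introduced here.

```c
#include <stddef.h>
#include <stdio.h>

/* Default layout: the compiler may insert padding. */
struct mystruct_A {
    char a;
    int  b;
    char c;
};

/* Same members, packed. __attribute__((__packed__)) is a GCC/Clang
 * extension; mystruct_A_packed is a name made up for this sketch. */
struct __attribute__((__packed__)) mystruct_A_packed {
    char a;
    int  b;
    char c;
};

/* Hypothetical helper: prints each member's offset and the total size. */
void print_layouts(void)
{
    printf("padded: a@%zu b@%zu c@%zu size=%zu\n",
           offsetof(struct mystruct_A, a),
           offsetof(struct mystruct_A, b),
           offsetof(struct mystruct_A, c),
           sizeof(struct mystruct_A));
    printf("packed: a@%zu b@%zu c@%zu size=%zu\n",
           offsetof(struct mystruct_A_packed, a),
           offsetof(struct mystruct_A_packed, b),
           offsetof(struct mystruct_A_packed, c),
           sizeof(struct mystruct_A_packed));
}
```

Calling print_layouts() from main shows where the gaps are. On a platform where int is 4 bytes and 4-byte aligned, the padded offsets would be 0, 4 and 8 with size 12, and the packed ones 0, 1 and 5 with size 6.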
(The answers above explain the reason quite clearly, but are not entirely clear about the size of the padding, so I will add an answer based on what I learned from The Lost Art of Structure Packing. That article has evolved to cover not only C but also Go and Rust.)
Memory alignment (for struct)
Rules:
Before each individual member, padding is added as needed so that the member starts at an address divisible by its alignment requirement. E.g., on many systems, an int should start at an address divisible by 4 and a short at an address divisible by 2.
char and char[] are special: they can start at any memory address, so they need no padding before them.
For a struct, besides the alignment needed for each individual member, the size of the whole struct is padded at the end to a size divisible by the strictest alignment requirement of any of its members. E.g., on many systems, if the struct's largest member is an int the size is divisible by 4, and if it is a short, by 2.
Order of members:
The order of members can affect the size of a struct. E.g., stu_c and stu_d from the example have the same members in a different order, and the two structs end up with different sizes.
Address in memory (for struct)
Empty space:
The empty space between two structs can be used by a non-struct variable that fits into it. E.g., in test_struct_address(), the variable x resides between the adjacent structs g and h. Whether or not x is declared, h's address does not change; x simply reuses the empty space that g wasted. The case of y is similar.
Example (for a 64-bit system)
memory_align.c contains test_struct_padding() and test_struct_address(). In the execution result of test_struct_address(), the start address of each variable is g:d0 x:dc h:e0 y:e8.
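The effect of member order can be sketched with two small structs. These are not the answer's stu_c and stu_d, whose definitions are not reproduced here; order_loose and order_tight are hypothetical names, and the sizes mentioned in the note assume a common 64-bit ABI.

```c
#include <stddef.h>

/* Hypothetical struct: a large member sandwiched between small ones. */
struct order_loose {
    char   tag;    /* usually followed by padding so that value is aligned */
    double value;
    char   flag;   /* usually followed by tail padding */
};

/* Same members, largest first, small members grouped together. */
struct order_tight {
    double value;
    char   tag;
    char   flag;   /* tail padding is still added after this member */
};

/* Bytes that hold no member data: total size minus the member sizes. */
#define PADDING_BYTES(type) \
    (sizeof(type) - (sizeof(double) + 2 * sizeof(char)))
```

On a typical 64-bit system, where double is 8 bytes and 8-byte aligned, order_loose would occupy 24 bytes and order_tight 16, so PADDING_BYTES gives 14 and 6 respectively. Other ABIs may give different numbers.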
I know this question is old and most answers here explain padding really well, but while trying to understand it myself I found that having a "visual" image of what is happening helped.
The processor reads memory in "chunks" of a definite size (a word). Say the processor word is 8 bytes long. It looks at memory as a long row of 8-byte building blocks. Every time it needs some information from memory, it reaches for one of those blocks and fetches it.
It doesn't matter where a char (1 byte long) is placed, since it will always sit inside one of those blocks, requiring the CPU to process only 1 word.
When we deal with data larger than one byte, like a 4-byte int or an 8-byte double, the way it is aligned in memory affects how many words the CPU has to process. If 4-byte chunks are aligned so that they always fit inside a block (the memory address being a multiple of 4), only one word has to be processed. Otherwise a 4-byte chunk could have part of itself in one block and part in another, requiring the processor to process 2 words to read the data.
The same applies to an 8-byte double, except that it must now be at a memory address that is a multiple of 8 to guarantee it is always inside one block.
This considers a processor with an 8-byte word, but the concept applies to other word sizes.
Padding works by filling the gaps between data items so that they are aligned with those blocks, which improves performance when reading memory.
However, as stated in other answers, sometimes space matters more than performance. Maybe you are processing lots of data on a computer that doesn't have much RAM (swap space could be used, but it is MUCH slower). You could rearrange the variables in the program until the least padding is needed (as other answers illustrate well), but if that's not enough you could explicitly disable padding, which is what packing is.
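The block picture can be turned into a small calculation. This is a sketch under the same simplifying assumption as the answer (memory is fetched in fixed word-sized blocks); BLOCKS_SPANNED and IS_ALIGNED are hypothetical helper macros, written as macros so they also work in compile-time checks.

```c
#include <stddef.h>

/* Number of word-sized blocks touched by an object of `size` bytes
 * (size > 0) that starts at byte address `addr`. */
#define BLOCKS_SPANNED(addr, size, word) \
    ((((addr) + (size) - 1) / (word)) - ((addr) / (word)) + 1)

/* Non-zero when `addr` is a multiple of `align`. */
#define IS_ALIGNED(addr, align) (((addr) % (align)) == 0)
```

With an 8-byte word, BLOCKS_SPANNED(4, 4, 8) is 1 because a 4-byte int at address 4 fits in one block, while BLOCKS_SPANNED(6, 4, 8) is 2 because the same int at address 6 straddles two blocks.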
Structure packing suppresses structure padding. Padding is used when alignment matters most; packing is used when space matters most.
Some compilers provide a #pragma to suppress padding or to pack to n bytes. Some provide keywords to do this. The pragma used to modify structure padding generally takes a form such as #pragma pack(n), depending on the compiler.
For example, ARM provides the __packed keyword to suppress structure padding. Go through your compiler manual to learn more about this.
So a packed structure is a structure without padding.
Generally, packed structures are used
to save space
to format a data structure for transmission over a network using some protocol (this is not good practice, of course, because you need to deal with endianness)
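As an illustration of the pragma form, the sketch below uses #pragma pack(push, 1) and #pragma pack(pop), which GCC, Clang and MSVC accept; other compilers may differ. The header_padded and header_packed structs are hypothetical and only stand in for a protocol header.

```c
#include <stdint.h>

/* Hypothetical header with the compiler's default layout. */
struct header_padded {
    uint8_t  type;
    uint32_t length;
    uint16_t checksum;
};

#pragma pack(push, 1)   /* save the current setting, then pack to 1 byte */
struct header_packed {
    uint8_t  type;
    uint32_t length;    /* may now sit at an unaligned offset */
    uint16_t checksum;
};
#pragma pack(pop)       /* restore the previous packing setting */
```

The packed version occupies exactly 1 + 4 + 2 = 7 bytes. Packing only removes the gaps: length and checksum are still stored in the host's byte order, which is the endianness concern raised above.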
Padding and packing are just two aspects of the same thing: the alignment in force decides the boundary each member is rounded up to, and padding is the extra space added to reach that boundary.
In mystruct_A, assuming a default alignment of 4, int b must start on a multiple of 4 bytes. Since the size of char is 1, a is followed by 4 - 1 = 3 bytes of padding so that b is aligned, and c is followed by 3 bytes so that the size of the whole struct is a multiple of 4. No padding is required for int b itself, which is already 4 bytes. It works the same way for mystruct_B.
Variables are stored at addresses divisible by their alignment (generally their size). So padding and packing are not just for structs; all data has its own alignment requirement.
Structs are similar, but there are some differences. First, there are two kinds of padding: a) to start each member at a properly aligned address, some bytes are inserted between members; b) to start the next struct instance at a properly aligned address, some bytes are appended to each struct.
A struct's first member always starts at an address divisible by the struct's own alignment requirement, which is that of its most strictly aligned member (4 in this answer's example, the alignment of int32_t). This differs from ordinary variables, which can start at any address divisible by their own alignment. As you know, the address of a struct is the same as the address of its first member.
Trailing padding matters for arrays such as struct st arr[2];. To make arr[1] (that is, arr[1]'s first member) start at an address divisible by 4, 3 bytes are appended at the end of each struct.
This is what I learned from The Lost Art of Structure Packing.
NOTE: You can find a data type's alignment requirement with the _Alignof operator, and a member's offset inside a struct with the offsetof macro.
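A short sketch of how _Alignof and offsetof can be used for this kind of investigation. It assumes a C11 compiler; struct st_example and print_alignment_info are hypothetical names, since the answer's own struct st is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct: a 4-byte member followed by a single char. */
struct st_example {
    int32_t i;
    char    c;
};

/* Prints alignment requirements and the layout of st_example. */
void print_alignment_info(void)
{
    printf("_Alignof(char)              = %zu\n", _Alignof(char));
    printf("_Alignof(int32_t)           = %zu\n", _Alignof(int32_t));
    printf("_Alignof(struct st_example) = %zu\n", _Alignof(struct st_example));
    printf("offsetof(i) = %zu, offsetof(c) = %zu\n",
           offsetof(struct st_example, i),
           offsetof(struct st_example, c));
    printf("sizeof(struct st_example)   = %zu\n", sizeof(struct st_example));
}
```

Because sizeof(struct st_example) is always a multiple of _Alignof(struct st_example), every element of an array of these structs starts at a properly aligned address.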
Rules for padding:
1. Every member of the struct must be at an address divisible by its size. Padding is inserted between elements or at the end of the struct to make sure this rule is met. This is done for easier and more efficient bus access by the hardware.
2. Padding at the end of the struct is determined by the size of the largest member of the struct.
Why Rule 2:
Consider a struct that occupies 8 bytes [figure: struct layout]. If we were to create an array of 2 of these structs, no padding would be required at the end. Therefore, size of struct = 8 bytes.
Assume we were to create another struct, one that contains a long [figure: struct layout]. If we were to create an array of this struct, there are 2 possibilities for the number of padding bytes required at the end.
A. If we add 3 bytes at the end, aligning for int and not long, the second element starts at address 20, which is not a multiple of 8.
B. If we add 7 bytes at the end, aligning for long, the start address of the second element is a multiple of 8 (i.e. 24). The size of the struct = 24 bytes.
Therefore, the number of padding bytes required at the end can be calculated by requiring that the next element of an array of the struct start at an address that is a multiple of the size of the largest member. Here that address is 24 (3 * 8).
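The end-padding calculation can be written as a round-up. This is a sketch, not part of the original answer: ROUND_UP and TAIL_PADDING are hypothetical macros, and the 17 used in the note is the member footprint implied by the 24-byte example (24 minus 7 bytes of end padding).

```c
#include <stddef.h>

/* Round `size` up to the next multiple of `align` (align > 0). */
#define ROUND_UP(size, align) \
    ((((size) + (align) - 1) / (align)) * (align))

/* Padding bytes needed after `used` bytes of members and inner padding
 * so that the total size becomes a multiple of `align`. */
#define TAIL_PADDING(used, align) (ROUND_UP(used, align) - (used))
```

TAIL_PADDING(17, 8) is 7, giving the 24-byte struct of case B, while TAIL_PADDING(17, 4) is 3, giving the 20-byte size of case A, which would leave the second array element misaligned for an 8-byte member.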
Are these structures padded or packed?
They're padded.
The only possibility that initially springs to mind, where they could be packed, is if char and int were the same size, so that the minimum size of the char/int/char structure would allow for no padding, ditto for the int/char structure.
However, that would require both sizeof(int) and sizeof(char) to be four (to get the twelve and eight sizes). The whole theory falls apart since the standard guarantees that sizeof(char) is always one.
Were char and int the same width, the sizes would be one and one, not four and four. So, in order to then get a size of twelve, there would have to be padding after the final field.
When does padding or packing take place?
Whenever the compiler implementation wants it to. Compilers are free to insert padding between fields, and following the final field (but not before the first field).
This is usually done for performance, as some types perform better when they're aligned on specific boundaries. There are even some architectures that will refuse to function (i.e., crash) if you try to access unaligned data (yes, I'm looking at you, ARM).
You can generally control packing/padding (which are really opposite ends of the same spectrum) with implementation-specific features such as #pragma pack. Even if you cannot do that in your specific implementation, you can check your code at compile time to ensure it meets your requirement (using standard C features, not implementation-specific stuff), for example with a static assertion that compares sizeof the structure with the sum of its members' sizes.
Something like this will refuse to compile if there is any padding in those structures.
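A minimal sketch of such a compile-time check, assuming a C11 compiler for _Static_assert. The struct wire_record is hypothetical and chosen so that the check passes on mainstream platforms; the same assertion applied to mystruct_A would typically fail to compile, because that struct contains padding.

```c
#include <stdint.h>

/* Hypothetical record whose members are ordered so that no padding is
 * expected on mainstream ABIs: 4 + 2 + 2 = 8 bytes. */
struct wire_record {
    uint32_t id;
    uint16_t kind;
    uint16_t flags;
};

/* Compilation stops here if the compiler inserted any padding. */
_Static_assert(sizeof(struct wire_record) ==
                   sizeof(uint32_t) + 2 * sizeof(uint16_t),
               "struct wire_record contains padding");
```

The check costs nothing at run time and documents the layout assumption right next to the type.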
Structure packing is only done when you tell your compiler explicitly to pack the structure. Padding is what you're seeing. Your 32-bit system is padding each field to word alignment. If you had told your compiler to pack the structures, they'd be 6 and 5 bytes, respectively. Don't do that though. It's not portable and makes compilers generate much slower (and sometimes even buggy) code.
There are no buts about it! Anyone who wants to grasp the subject must do the following:
Data structure alignment is the way data is arranged and accessed in computer memory. It consists of two separate but related issues: data alignment and data structure padding. When a modern computer reads from or writes to a memory address, it will do this in word sized chunks (e.g. 4 byte chunks on a 32-bit system) or larger. Data alignment means putting the data at a memory address equal to some multiple of the word size, which increases the system’s performance due to the way the CPU handles memory. To align the data, it may be necessary to insert some meaningless bytes between the end of the last data structure and the start of the next, which is data structure padding.