C/C++: getting the size of a struct
Today, to my great surprise, I discovered that:
When the sizeof operator is applied to a class, struct, or union type, the result is the number of bytes in an object of that type, plus any padding added to align members on word boundaries. The result does not necessarily correspond to the size calculated by adding the storage requirements of the individual members.
I didn't know this, and I'm pretty sure it is breaking some of my old code. To read binary files I used to have structs like this one:
struct Header
{
    union {
        char identc[4];
        uint32 ident;
    };
    uint16 version;
};
and I read those 6 bytes directly with fread, driven by sizeof:
fread( &header, sizeof(header), 1, f );
But now sizeof(header) returns 8!
Is it possible that with older GCC versions sizeof(header) returned 6, or have I completely lost my mind?
Anyway, is there any other operator (or preprocessor directive, or anything else) that tells me how big the struct is, excluding padding?
Otherwise, what would be a clean way to read a raw-data struct from a file without writing too much code?
EDIT:
I know that this isn't the correct way to read/write binary data: I'd get different results depending on machine endianness and so on. Still, this method is the quickest one. I'm just trying to read some binary data to get at its content quickly, not to write a good application that I'm going to use in the future or release.
Comments (8)
What you want is the #pragma pack directive. It lets you set the packing to any value you want. Typically you set the packing value to 1 before your structure definition and then restore the default value after the definition.
Note that this does nothing to guarantee portability between systems.
See also: use-of-pragma-in-c and various other questions on SO
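As a rough sketch of what this answer describes, the snippet below wraps the asker's struct in `#pragma pack(push, 1)` / `#pragma pack(pop)`. The name `PackedHeader` and the helper `read_packed_header` are made up for illustration, and `uint32_t`/`uint16_t` from `<stdint.h>` stand in for the asker's `uint32`/`uint16` typedefs. `#pragma pack` is a compiler extension (GCC, Clang and MSVC accept this form), so treat this as an assumption about your toolchain rather than standard C.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Save the current packing and switch to 1-byte packing.
   ASSUMPTION: the compiler supports #pragma pack (GCC, Clang, MSVC do). */
#pragma pack(push, 1)
struct PackedHeader {            /* hypothetical name for this sketch */
    union {
        char     identc[4];
        uint32_t ident;
    };                           /* anonymous union: C11 or a compiler extension */
    uint16_t version;
};
#pragma pack(pop)                /* restore whatever packing was active before */

/* With 1-byte packing there is no trailing padding: 4 + 2 = 6 bytes. */
static_assert(sizeof(struct PackedHeader) == 6, "header must be 6 bytes");

/* Read one header straight from the file; returns 1 on success, 0 otherwise.
   The fields keep the byte order of the file, so the result is only
   meaningful when the file and the host agree on endianness. */
static int read_packed_header(FILE *f, struct PackedHeader *h)
{
    return fread(h, sizeof *h, 1, f) == 1;
}
```

The `static_assert` turns a silent layout change into a compile error, which is probably the cheapest safeguard if you keep reading structs this way.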
Yes, the code you presented isn't portable. Not only structure sizes but also byte orders might differ.
This is not the correct way to process binary files. Aside from alignment issues, it also has endianness issues. The proper way to read binary files is with an array of uint8_t (or unsigned char, it really doesn't matter) and your own functions to build an in-memory representation out of the data.
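A minimal sketch of that approach might look like the following. It assumes the file stores the 4 ident bytes followed by a 16-bit version in little-endian order. That byte order is an assumption, since the question does not say. The names `ParsedHeader`, `load_u16_le` and `read_header` are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { HEADER_FILE_SIZE = 6 };   /* 4-byte ident + 2-byte version, as stored on disk */

/* In-memory form; its padding no longer matters because it is never
   read from or written to the file as a block. */
struct ParsedHeader {            /* hypothetical name for this sketch */
    char     identc[4];
    uint16_t version;
};

/* Build a 16-bit value from two bytes, least significant byte first.
   ASSUMPTION: little-endian file; swap the indices for a big-endian one. */
static uint16_t load_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read exactly 6 bytes and decode them.
   Returns 1 on success, 0 on a short read or I/O error. */
static int read_header(FILE *f, struct ParsedHeader *out)
{
    uint8_t buf[HEADER_FILE_SIZE];

    assert(f != NULL && out != NULL);
    if (fread(buf, 1, sizeof buf, f) != sizeof buf)
        return 0;

    memcpy(out->identc, buf, sizeof out->identc);
    out->version = load_u16_le(buf + 4);
    return 1;
}
```

Because the decoding works on individual bytes, the result is the same regardless of the host's endianness or the compiler's struct layout.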
Most compilers provide a specific extension that allows you to control the packing of structs, which should let you get the layout you need. However, when you write the struct in binary, you should be able to just write it and read it back regardless of packing, because writing the struct also writes sizeof(struct) bytes. The only case where this causes trouble is if you want to read files created with previous versions. You also need to consider byte-order issues, etc.
Your question is compiler-specific, but generally if you build your structure so that each member lies on a boundary that is a multiple of its own size (four-byte elements at offsets divisible by four, etc.), you'll get the behavior you want. Also watch for cases like the one you presented, where padding comes at the end of a structure to align the first element of the next structure when they are laid out in an array.
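To illustrate the idea, here is a hypothetical variant of the header where the two bytes the compiler would otherwise add are spelled out as a `reserved` field. Note that this describes an 8-byte record, not the asker's existing 6-byte file format, so it only helps when you control the format. The checks hold on mainstream ABIs but are not guaranteed by the C standard, which is why they are written as compile-time assertions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: every member sits at an offset that is a
   multiple of its own size, and the tail is filled explicitly. */
struct AlignedHeader {
    uint32_t ident;      /* offset 0, a multiple of 4 */
    uint16_t version;    /* offset 4, a multiple of 2 */
    uint16_t reserved;   /* explicit filler where the compiler would pad */
};

/* If a compiler ever lays this out differently, fail the build
   instead of silently reading garbage. */
static_assert(offsetof(struct AlignedHeader, version) == 4,
              "version must follow ident directly");
static_assert(offsetof(struct AlignedHeader, reserved) == 6,
              "reserved must follow version directly");
static_assert(sizeof(struct AlignedHeader) ==
              sizeof(uint32_t) + 2 * sizeof(uint16_t),
              "struct must contain no hidden padding");
```

With this layout `sizeof` and the sum of the member sizes agree, so an `fread` driven by `sizeof` reads exactly the bytes the struct declares.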
It seems that you haven't actually asked a question, so I'm not sure why I am even trying to answer! But yes, packing is important and will change depending on compiler versions, flags, target architecture, pragmas, wind direction, phases of the moon and potentially many other things. Dumping binary to a file (or socket) is not a very good way of serializing anything.
This extra padding is necessary to get the members aligned properly when you create an array of these structures. Without it, the 2nd element of the array would have the ident member aligned on an address that's not a multiple of 4.
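The array argument can be checked with a small sketch like the one below. The helper names are invented, and `uint32_t`/`uint16_t` again stand in for the asker's typedefs. The actual numbers depend on the compiler and target, so the code prints them rather than assuming them. On common 32- and 64-bit ABIs you would typically see a size of 8 with 2 bytes of trailing padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct Header {
    union {
        char     identc[4];
        uint32_t ident;
    };                   /* anonymous union: C11 or a compiler extension */
    uint16_t version;
};

/* The language guarantees that the size is a multiple of the alignment,
   which is exactly what keeps every array element aligned. */
static_assert(sizeof(struct Header) % _Alignof(struct Header) == 0,
              "array elements stay aligned");

/* Byte offset of arr[1].ident from the start of a two-element array. */
static size_t second_ident_offset(void)
{
    struct Header arr[2];
    return (size_t)((const char *)&arr[1].ident - (const char *)&arr[0]);
}

/* Number of padding bytes after the last member. */
static size_t trailing_padding(void)
{
    return sizeof(struct Header)
         - (offsetof(struct Header, version) + sizeof(uint16_t));
}

/* Print the layout actually chosen by this compiler. */
static void print_layout(void)
{
    printf("sizeof(struct Header)  = %zu\n", sizeof(struct Header));
    printf("trailing padding       = %zu\n", trailing_padding());
    printf("offset of arr[1].ident = %zu\n", second_ident_offset());
}
```

If the struct were only 6 bytes, `arr[1].ident` would start at offset 6, which is not a multiple of 4. The trailing padding is what moves it to an aligned offset.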
It is probably too late to do anything about it: you have probably already written files with this structure, and changing the packing would make those files unreadable. But yes, having file data that depends on compiler settings isn't the greatest idea. Storing data in a human-readable format is common these days. The disk bytes and CPU cycles that a binary dump saves are rarely worth it.
Yes, the alignment problem. That is why internet protocol messages have aligned structs so that this problem can be avoided when sending data over the network.
What you can do is either fix your structs so that they are aligned properly, or have marshalling functions that you use when saving and retrieving data.
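For the second option, a pair of marshalling functions could be sketched roughly as follows. The function names, the simplified `Header` (without the union) and the choice of little-endian for the 16-bit field are all assumptions made for this example. The point is only that saving and loading go through the same explicit 6-byte encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { HEADER_WIRE_SIZE = 6 };   /* bytes on disk: 4 ident + 2 version */

struct Header {                  /* simplified in-memory form for this sketch */
    char     identc[4];
    uint16_t version;
};

/* Encode the struct into a fixed 6-byte buffer.
   ASSUMPTION: the version is stored least significant byte first. */
static void header_marshal(const struct Header *h, uint8_t out[HEADER_WIRE_SIZE])
{
    memcpy(out, h->identc, 4);
    out[4] = (uint8_t)(h->version & 0xFF);
    out[5] = (uint8_t)(h->version >> 8);
}

/* Decode a 6-byte buffer back into the struct. */
static void header_unmarshal(const uint8_t in[HEADER_WIRE_SIZE], struct Header *h)
{
    memcpy(h->identc, in, 4);
    h->version = (uint16_t)(in[4] | (in[5] << 8));
}

/* Write one header; returns 1 on success, 0 on failure. */
static int header_save(FILE *f, const struct Header *h)
{
    uint8_t buf[HEADER_WIRE_SIZE];

    assert(f != NULL && h != NULL);
    header_marshal(h, buf);
    return fwrite(buf, 1, sizeof buf, f) == sizeof buf;
}

/* Read one header; returns 1 on success, 0 on a short read. */
static int header_load(FILE *f, struct Header *h)
{
    uint8_t buf[HEADER_WIRE_SIZE];

    assert(f != NULL && h != NULL);
    if (fread(buf, 1, sizeof buf, f) != sizeof buf)
        return 0;
    header_unmarshal(buf, h);
    return 1;
}
```

Since `sizeof(struct Header)` never reaches the file, the compiler is free to pad the in-memory struct however it likes.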