C++ data member alignment and array packing

Posted on 2024-08-10 11:30:09

During a code review I've come across some code that defines a simple structure as follows:

class foo {
   unsigned char a;
   unsigned char b;
   unsigned char c;
};

Elsewhere, an array of these objects is defined:

foo listOfFoos[SOME_NUM];

Later, the structures are raw-copied into a buffer:

memcpy(pBuff,listOfFoos,3*SOME_NUM);

This code relies on the assumptions that: a.) The size of foo is 3, and no padding is applied, and b.) An array of these objects is packed with no padding between them.

I've tried it with GNU on two platforms (RedHat 64b, Solaris 9), and it worked on both.

Are the assumptions above valid? If not, under what conditions (e.g. change in OS/compiler) might they fail?

凹づ凸ル 2024-08-17 11:30:10

It all comes down to memory alignment. A typical 32-bit machine reads and writes memory in 4-byte words, and compilers lay out members so that each one sits at an address suited to its type. This structure is safe in practice because it holds nothing but chars, which have no alignment requirement, so the compiler has no reason to insert padding.

Now if the structure were like this:

class foo {
   unsigned char a;
   unsigned char b;
   unsigned char c;
   unsigned int i;
   unsigned int j;
};

Your coworker's logic would probably lead to

memcpy(pBuff,listOfFoos,11*SOME_NUM);

(3 chars = 3 bytes, 2 ints = 2*4 bytes, so 3 + 8)

Unfortunately, because of padding the structure actually takes up 12 bytes on such a machine. Three chars and an int cannot fit into one 4-byte word, so one byte of padding follows the chars and pushes the int into its own word. The more varied the data types become, the more of a problem this is.
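
To see the gap between the hand-counted size and the real one, a minimal sketch along these lines can help. The struct name Mixed and the helper functions are made up for illustration; the actual numbers are implementation-defined, so the comments only describe what is typical when int is 4 bytes with 4-byte alignment.

```cpp
#include <cstddef>   // std::size_t, offsetof

// Hypothetical struct mirroring the five-member example above.
struct Mixed {
    unsigned char a;
    unsigned char b;
    unsigned char c;
    unsigned int  i;
    unsigned int  j;
};

// Sum of the member sizes: the "magic number" someone would count by hand.
constexpr std::size_t naive_size() {
    return 3 * sizeof(unsigned char) + 2 * sizeof(unsigned int);
}

// Bytes of padding the compiler actually added (typically 1, not guaranteed).
constexpr std::size_t padding_bytes() {
    return sizeof(Mixed) - naive_size();
}

// Offset of the first int: typically 4 rather than 3, because of the padding byte.
inline std::size_t offset_of_i() {
    return offsetof(Mixed, i);
}
```

Printing sizeof(Mixed), naive_size() and offset_of_i() on the target platform shows directly whether a hard-coded byte count would have been wrong there.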

隐诗 2024-08-17 11:30:10

For situations where stuff like this is used and I can't avoid it, I try to make compilation break when the assumptions no longer hold. I use something like the following (or Boost.StaticAssert if the situation allows):

// Macro for a "static assert" (only useful on compile-time constant expressions).
// Named STATIC_ASSERT because static_assert is a keyword since C++11.
#define STATIC_ASSERT(exp)           STATIC_ASSERT_II(exp, __LINE__)
// Helper for STATIC_ASSERT (don't use directly): makes __LINE__ expand before pasting
#define STATIC_ASSERT_II(exp, line)  STATIC_ASSERT_III(exp, line)
// Helper for STATIC_ASSERT (don't use directly): 1/(exp) divides by zero, and so fails to compile, when exp is false
#define STATIC_ASSERT_III(exp, line) enum static_assertion##line{static_assert_line_##line = 1/(exp)}

STATIC_ASSERT(sizeof(foo) == 3);
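
With a C++11 or later compiler the same guard can be written with the built-in static_assert instead of a hand-rolled macro. The following is only a sketch: foo is declared as a struct here for brevity, the messages are illustrative, and raw_copy_bytes is a hypothetical helper, not something from the code under review.

```cpp
#include <cstddef>
#include <type_traits>

struct foo {
    unsigned char a;
    unsigned char b;
    unsigned char c;
};

// Compilation stops here if the compiler ever pads foo.
static_assert(sizeof(foo) == 3, "foo must be exactly 3 bytes: the raw copy depends on it");

// memcpy is only well-defined for trivially copyable types.
static_assert(std::is_trivially_copyable<foo>::value, "foo must be safe to memcpy");

// Hypothetical helper: byte count for copying `count` elements.
constexpr std::size_t raw_copy_bytes(std::size_t count) {
    return count * sizeof(foo);
}
```

The point of the design is that a layout change becomes a build failure next to the struct definition rather than silent corruption in the buffer.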

梦魇绽荼蘼 2024-08-17 11:30:10

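I think I'd play it safe and replace the magic number 3 with sizeof(foo).

My guess is that code optimised for future processor architectures may well introduce some form of padding.

And trying to track down that sort of bug is a real pain!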

反差帅 2024-08-17 11:30:10

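As others have said, using sizeof(foo) is a safer bet. Some compilers (especially esoteric ones in the embedded world) will add a 4-byte header to classes. Others can do funky memory-alignment tricks, depending on your compiler settings.

On a mainstream platform you're probably fine, but it's not guaranteed.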

全部不再 2024-08-17 11:30:10

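sizeof() can still be a problem when you pass the data between two computers. On one of them the code might be compiled with padding and on the other without, in which case sizeof() gives different results. If the array data is passed from one computer to the other, it will be misinterpreted, because the array elements won't be found where they are expected.
One solution is to make sure #pragma pack(1) is used wherever possible, but that may not be enough for arrays. The best approach is to anticipate the problem and pad each array element to a multiple of 8 bytes.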
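
A different way to sidestep the cross-machine problem, not the #pragma pack(1) approach suggested above, is to never copy the struct's memory at all and instead write each member into the buffer explicitly. A rough sketch, where pack_foos and unpack_foos are hypothetical names and the wire format is assumed to be three bytes per element in the order a, b, c:

```cpp
#include <cstddef>
#include <vector>

struct foo {
    unsigned char a;
    unsigned char b;
    unsigned char c;
};

// Hypothetical helper: writes each element as exactly three bytes (a, b, c),
// regardless of any padding the compiler put inside or after foo.
std::vector<unsigned char> pack_foos(const foo* items, std::size_t count) {
    std::vector<unsigned char> out;
    out.reserve(count * 3);
    for (std::size_t k = 0; k < count; ++k) {
        out.push_back(items[k].a);
        out.push_back(items[k].b);
        out.push_back(items[k].c);
    }
    return out;
}

// Hypothetical inverse: rebuilds elements from a 3-bytes-per-element buffer.
// Any trailing bytes that do not form a whole element are ignored.
std::vector<foo> unpack_foos(const std::vector<unsigned char>& bytes) {
    std::vector<foo> items(bytes.size() / 3);
    for (std::size_t k = 0; k < items.size(); ++k) {
        items[k].a = bytes[3 * k];
        items[k].b = bytes[3 * k + 1];
        items[k].c = bytes[3 * k + 2];
    }
    return items;
}
```

Both sides then agree on the byte layout no matter how their compilers pad foo. For members wider than one byte the byte order would also have to be fixed explicitly, which this sketch does not cover.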

神魇的王 2024-08-17 11:30:09

It would definitely be safer to do:

sizeof(foo) * SOME_NUM

丑疤怪 2024-08-17 11:30:09

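An array of objects is required to be contiguous, so there is never padding between the objects, although padding can be added at the end of an object (which produces almost the same effect).

Given that you're working with chars, the assumptions are probably right more often than not, but the C++ standard certainly doesn't guarantee it. A different compiler, or even just a change in the flags passed to your current compiler, could result in padding being inserted between the members of the struct, after its last member, or both.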
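
The contiguity rule can be checked directly: the distance in bytes between neighbouring elements is always sizeof(foo), whatever that happens to be. A small sketch, where element_stride is a made-up helper name:

```cpp
#include <cstddef>

struct foo {
    unsigned char a;
    unsigned char b;
    unsigned char c;
};

// Byte distance between two consecutive elements of a built-in array.
template <typename T, std::size_t N>
std::size_t element_stride(const T (&arr)[N]) {
    static_assert(N >= 2, "need at least two elements to measure a stride");
    const unsigned char* first  = reinterpret_cast<const unsigned char*>(&arr[0]);
    const unsigned char* second = reinterpret_cast<const unsigned char*>(&arr[1]);
    return static_cast<std::size_t>(second - first);
}

// The array itself adds nothing: any padding lives inside sizeof(foo).
static_assert(sizeof(foo[10]) == 10 * sizeof(foo), "arrays are contiguous");
```

So assumption b.) from the question always holds in terms of sizeof(foo); only assumption a.), that sizeof(foo) is 3, depends on the compiler.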

南烟 2024-08-17 11:30:09

If you copy your array like this you should use

memcpy(pBuff,listOfFoos,sizeof(listOfFoos));

This will always work as long as listOfFoos is a real array (not a pointer to one) and pBuff was allocated with at least the same size.
This way you make no assumptions about padding and alignment at all.

Most compilers align a struct or class to the alignment required by its most strictly aligned member. In your case of chars that means no alignment requirement and no padding, but if you add a short, for example, your class would typically be 6 bytes large, with one byte of padding added between the last char and the short.
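
One caveat with sizeof(listOfFoos) is that it silently yields the size of a pointer once the array has decayed to one, for example inside a function that takes a foo* parameter. A sketch of a wrapper that only accepts true arrays and also checks the destination size; copy_foos is a hypothetical name, not part of the code in question:

```cpp
#include <cstddef>
#include <cstring>

struct foo {
    unsigned char a;
    unsigned char b;
    unsigned char c;
};

// Copies a whole array into dst. N is deduced from the array type, so passing
// a plain pointer does not compile and the byte count cannot go stale.
// Returns the number of bytes copied, or 0 if dst is too small.
template <std::size_t N>
std::size_t copy_foos(void* dst, std::size_t dst_bytes, const foo (&src)[N]) {
    if (dst_bytes < sizeof(src)) {
        return 0;                          // destination too small: copy nothing
    }
    std::memcpy(dst, src, sizeof(src));    // sizeof(src) == N * sizeof(foo)
    return sizeof(src);
}
```

A call such as copy_foos(pBuff, buffSize, listOfFoos) would then replace the raw memcpy, with buffSize standing in for however the buffer's capacity is tracked.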

沒落の蓅哖 2024-08-17 11:30:09

I think the reason this works is that all of the fields in the structure are chars, which have an alignment of 1. If at least one field has an alignment other than 1, the alignment of the structure/class will not be 1 either (the resulting layout depends on the order and alignment of the fields).

Let's look at an example:

#include <stdio.h>
#include <stddef.h>

typedef struct {
    unsigned char a;
    unsigned char b;
    unsigned char c;
} Foo;
typedef struct {
    unsigned short i;
    unsigned char  a;
    unsigned char  b;
    unsigned char  c;
} Bar;
typedef struct { Foo F[5]; } F_B;
typedef struct { Bar B[5]; } B_F;


/* Alignment of t, measured as the offset of a t placed right after a single char */
#define ALIGNMENT_OF(t) offsetof( struct { char x; t test; }, test )

int main(void) {
    /* sizeof and offsetof both yield size_t, so print them with %zu */
    printf("Foo:: Size: %zu; Alignment: %zu\n", sizeof(Foo), ALIGNMENT_OF(Foo));
    printf("Bar:: Size: %zu; Alignment: %zu\n", sizeof(Bar), ALIGNMENT_OF(Bar));
    printf("F_B:: Size: %zu; Alignment: %zu\n", sizeof(F_B), ALIGNMENT_OF(F_B));
    printf("B_F:: Size: %zu; Alignment: %zu\n", sizeof(B_F), ALIGNMENT_OF(B_F));
    return 0;
}

When executed, the result is:

Foo:: Size: 3; Alignment: 1
Bar:: Size: 6; Alignment: 2
F_B:: Size: 15; Alignment: 1
B_F:: Size: 30; Alignment: 2

You can see that Bar and B_F have alignment 2, so that the field i is properly aligned. You can also see that the size of Bar is 6 and not 5. Similarly, the size of B_F (5 of Bar) is 30 and not 25.

So if you hard-code the size instead of using sizeof(...), you will run into a problem here.

Hope this helps.
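
The jump from 5 to 6 bytes follows from rounding the size up to a multiple of the struct's alignment, so that every element of an array stays aligned. As a rough C++11 sketch of that arithmetic, using alignof instead of the offsetof trick (round_up is a made-up helper, and it only models trailing padding, not padding between members):

```cpp
#include <cstddef>

struct Foo { unsigned char a; unsigned char b; unsigned char c; };
struct Bar { unsigned short i; unsigned char a; unsigned char b; unsigned char c; };

// Rounds raw_size up to the next multiple of alignment.
constexpr std::size_t round_up(std::size_t raw_size, std::size_t alignment) {
    return (raw_size + alignment - 1) / alignment * alignment;
}

// Every object size is a whole multiple of its alignment.
static_assert(sizeof(Foo) % alignof(Foo) == 0, "size is a multiple of alignment");
static_assert(sizeof(Bar) % alignof(Bar) == 0, "size is a multiple of alignment");
```

With a 2-byte short, round_up(5, 2) gives 6, which matches the size of Bar shown above, and round_up(3, 1) gives 3 for Foo.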
