Managing objects in a C++ buffer, considering alignment and memory layout assumptions

Posted on 2024-07-11 11:54:59

I am storing objects in a buffer. Now I know that I cannot make assumptions about the memory layout of the object.

If I know the overall size of the object, is it acceptable to create a pointer to this memory and call functions on it?

e.g. say I have the following class:

[int,int,int,int,char,padding*3bytes,unsigned short int*]

1) If I know this class to be of size 24, and I know the address where it starts in memory, then although it is not safe to assume the memory layout, is it acceptable to cast this address to a pointer and call functions on the object that access these members? (Does C++ know by some magic the correct position of a member?)

2) If this is not safe/OK, is there any other way, other than using a constructor that takes all of the arguments and pulling each argument out of the buffer one at a time?

Edit: Changed title to make it more appropriate to what I am asking.

过气美图社 2024-07-18 11:54:59

You can create a constructor that takes all the members and assigns them, then use placement new.

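#include <new>   // required for placement new

class Foo
{
    int a; int b; int c; int d; char e; unsigned short int* f;
public:
    Foo(int A, int B, int C, int D, char E, unsigned short int* F)
        : a(A), b(B), c(C), d(D), e(E), f(F) {}
};

...
char* buf = new char[sizeof(Foo)];            // pre-allocated buffer, suitably aligned by new[]
Foo* foo = new (buf) Foo(a, b, c, d, e, f);   // construct in place from the values a..f
...
foo->~Foo();                                  // destroy explicitly before releasing the buffer
delete[] buf;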

This has the advantage that even the vtable pointer will be set up correctly, because the constructor actually runs. Note, however, that if you are using this for serialization, the unsigned short int pointer is not going to point at anything useful when you deserialize it, unless you are very careful to convert pointers into offsets and then back again.

Non-virtual member functions are statically linked: a call is simply a direct call to the function, with this passed as the first parameter before the explicit parameters.

Member variables are referenced using an offset from the this pointer. If an object is laid out like this (assuming a 4-byte vtable pointer and 4-byte ints):

0: vtable
4: a
8: b
12: c
etc...

then a will be accessed by dereferencing the address this + 4 bytes.

秉烛思 2024-07-18 11:54:59

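Basically, what you are proposing is to read in a bunch of (hopefully not random) bytes, cast them to a known object type, and then call a class method on that object. It might actually work, because those bytes will end up behind the "this" pointer in that class method. But you are taking a real risk that things are not where the compiled code expects them to be. And unlike Java or C#, there is no real "runtime" to catch this kind of problem, so at best you will get a core dump, and at worst you will get corrupted memory.

It sounds like you want a C++ version of Java's serialization/deserialization. There is probably a library out there that does that.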

抠脚大汉 2024-07-18 11:54:59

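Non-virtual function calls are linked directly, just like calls to C functions. The object (this) pointer is passed as the first argument. No knowledge of the object layout is required to call the function.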

怀中猫帐中妖 2024-07-18 11:54:59

It sounds like you're not storing the objects themselves in a buffer, but rather the data from which they're comprised.

If this data is in memory in the order the fields are defined within your class (with proper padding for the platform) and your type is a POD, then you can memcpy the data from the buffer into an object of your type (or possibly cast the buffer pointer, but beware: there are platform-specific gotchas, such as alignment, with casts to pointers of different types).

If your class is not a POD, then the in-memory layout of fields is not guaranteed, and you shouldn't rely on any observed ordering, as it is allowed to change on each recompile.

You can, however, initialize a non-POD with data from a POD.

As far as the addresses where non-virtual functions are located: they are statically linked at compile time to some location within your code segment that is the same for every instance of your type. Note that there is no "runtime" involved. When you write code like this:

class Foo{
   int a;
   int b;

public:
   void DoSomething(int x);
};

void Foo::DoSomething(int x){a = x * 2; b = x + a;}

int main(){
    Foo f;
    f.DoSomething(42);
    return 0;
}

the compiler generates code that does something like this:

function main:
1. allocate 8 bytes on stack for object "f"
2. call default initializer for class "Foo" (does nothing in this case)
3. push argument value 42 onto stack
4. push pointer to object "f" onto stack
5. make call to function Foo_i_DoSomething@4 (actual name is usually more complex)
6. load return value 0 into accumulator register
7. return to caller
function Foo_i_DoSomething@4 (located elsewhere in the code segment)
1. load "x" value from stack (pushed on by caller)
2. multiply by 2
3. load "this" pointer from stack (pushed on by caller)
4. calculate offset of field "a" within a Foo object
5. add calculated offset to this pointer, loaded in step 3
6. store product, calculated in step 2, to offset calculated in step 5
7. load "x" value from stack, again
8. load "this" pointer from stack, again
9. calculate offset of field "a" within a Foo object, again
10. add calculated offset to this pointer, loaded in step 8
11. load "a" value stored at the address calculated in step 10
12. add "a" value, loaded in step 11, to "x" value loaded in step 7
13. load "this" pointer from stack, again
14. calculate offset of field "b" within a Foo object
15. add calculated offset to this pointer, loaded in step 13
16. store sum, calculated in step 12, to address calculated in step 15
17. return to caller

In other words, it would be more or less the same code as if you had written this (specifics, such as the name of the DoSomething function and the method of passing the this pointer, are up to the compiler):

class Foo{
    int a;
    int b;

    friend void Foo_DoSomething(Foo *f, int x);
};

void Foo_DoSomething(Foo *f, int x){
    f->a = x * 2;
    f->b = x + f->a;
}

int main(){
    Foo f;
    Foo_DoSomething(&f, 42);
    return 0;
}

此生挚爱伱 2024-07-18 11:54:59

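In that case, an object of POD type has already been created (whether or not you call new; allocating the required storage is sufficient), and you can access its members, including calling functions on that object. But this only works if you know precisely the alignment required by T, the size of T (the buffer must not be smaller than that), and the alignment of all members of T. Even for a POD type, the compiler is allowed to put padding bytes between members if it wants to. For non-POD types, you may have the same luck if your type has no virtual functions or base classes and no user-defined constructor (of course), and if the same holds for its base classes and all of its non-static members.
For all other types, all bets are off. You have to read the values out with a POD first, and then initialize the non-POD type with that data.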

淡淡離愁欲言轉身 2024-07-18 11:54:59

I am storing objects in a buffer. ... If I know the overall size of the object, is it acceptable to create a pointer to this memory and call functions on it?

This is acceptable to the extent that using casts is acceptable:

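#include <cstring>
#include <iostream>

namespace {
    class A {
        int i;
        int j;
    public:
        int value()
        {
            return i + j;
        }
    };
}

int main()
{
    // The buffer must be at least sizeof(A) bytes and aligned for A.
    alignas(A) unsigned char buffer[sizeof(A)];
    const int values[] = { 1, 2 };
    static_assert(sizeof(A) == sizeof values, "A is expected to hold exactly two ints");
    std::memcpy(buffer, values, sizeof values);   // fill the bytes for i and j

    // This works on mainstream compilers, but the standard does not guarantee it.
    std::cout << reinterpret_cast<A*>(buffer)->value() << '\n';
}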

Casting an object to something like raw memory and back again is actually pretty common, especially in the C world. If you're using a class hierarchy, though, it would make more sense to use pointer to member functions.

say I have the following class: ...
if I know this class to be of size 24 and I know the address of where it starts in memory ...

This is where things get difficult. The size of an object includes the size of its data members (and any data members from any base classes), plus any padding, plus any vtable pointers or other implementation-dependent information, minus anything saved by certain size optimizations (such as the empty base class optimization). If the resulting number is 0 bytes, the object is still required to take up at least one byte in memory. These things are a combination of language issues and common requirements that most CPUs have regarding memory accesses. Trying to get things to work properly can be a real pain.

If you just allocate an object and cast to and from raw memory you can ignore these issues. But if you copy an object's internals to a buffer of some sort, then they rear their head pretty quickly. The code above relies on a few general rules about alignment (i.e., I happen to know that class A will have the same alignment restrictions as ints, and thus the array can be safely cast to an A; but I couldn't necessarily guarantee the same if I were casting parts of the array to A's and parts to other classes with other data members).

Oh, and when copying objects you need to make sure you're properly handling pointers.

You may also be interested in things like Google's Protocol Buffers or Facebook's Thrift.

Yes these issues are difficult. And, yes, some programming languages sweep them under the rug. But there's an awful lot of stuff getting swept under the rug:

In Sun's HotSpot JVM, object storage is aligned to the nearest 64-bit boundary. On top of this, every object has a 2-word header in memory. The JVM's word size is usually the platform's native pointer size. An object consisting of only a 32-bit int and a 64-bit double (96 bits of data) will require two words for the object header, one word for the int, and two words for the double. That's 5 words: 160 bits with 32-bit words. Because of the alignment, this object will occupy 192 bits of memory.

This is because Sun is relying on a relatively simple tactic for memory alignment issues (on an imaginary processor, a char may be allowed to exist at any memory location, an int at any location that is divisible by 4, and a double may need to be allocated only on memory locations that are divisible by 32 -- but the most restrictive alignment requirement also satisfies every other alignment requirement, so Sun is aligning everything according to the most restrictive location).

Another tactic for memory alignment can reclaim some of that space.
