Access cost of dynamically created objects with dynamically allocated members
I'm building an application which will have dynamically allocated objects of type A, each with a dynamically allocated member (v), similar to the class below:
class A {
int a;
int b;
int* v;
};
where:
- The memory for v will be allocated in the constructor.
- v will be allocated once when an object of type A is created and will never need to be resized.
- The size of v will vary across all instances of A.
The application will potentially have a huge number of such objects and will mostly need to stream a large number of them through the CPU, but only needs to perform very simple computations on the member variables.
- Could having v dynamically allocated mean that an instance of A and its member v are not located together in memory?
- What tools and techniques can be used to test if this fragmentation is a performance bottleneck?
- If such fragmentation is a performance issue, are there any techniques that could allow A and v to be allocated in a contiguous region of memory?
- Or are there any techniques to aid memory access, such as a pre-fetching scheme? For example, fetch an object of type A and operate on the other member variables whilst pre-fetching v.
- If the size of v, or an acceptable maximum size, could be known at compile time, would replacing v with a fixed-size array like int v[max_length] lead to better performance?
The target platforms are standard desktop machines with x86/AMD64 processors, Windows or Linux OSes and compiled using either GCC or MSVC compilers.
3 Answers
If you have a good reason to care about performance...
If they are both allocated with 'new', then it is likely that they will be near one another. However, the current state of memory can drastically affect this outcome; it depends significantly on what you've been doing with memory. If you just allocate a thousand of these things one after another, then the later ones will almost certainly be "nearly contiguous".
If the A instance is on the stack, it is highly unlikely that its 'v' will be nearby.
Allocate space for both, then placement-new them into that space. It's dirty, but it should typically work:
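A minimal sketch of that approach: over-allocate one block, placement-new the A at the front, and point v just past it. The `create`/`destroy` helper names are my own invention, not part of the answer:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

class A {
public:
    int a;
    int b;
    int* v;

    // Allocate one block big enough for the A header plus n ints,
    // placement-new the A at the front, and aim v just past it, so
    // the header and the array occupy one contiguous region.
    static A* create(std::size_t n) {
        void* block = std::malloc(sizeof(A) + n * sizeof(int));
        A* obj = new (block) A();
        obj->v = reinterpret_cast<int*>(obj + 1);  // array begins right after the header
        return obj;
    }

    // Destroy and free in one step; the array goes with the block.
    static void destroy(A* obj) {
        obj->~A();
        std::free(obj);
    }
};
```

One `malloc` and one `free` per object, and following the pointer from the header to `v` stays within the object's own cache lines for small n.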
Prefetching is compiler- and platform-specific, but many compilers have intrinsics available to do it. Mind you, it won't help a lot if you're going to try to access that data right away; for prefetching to be of any value, you often need to issue it hundreds of cycles before you want the data. That said, it can be a huge boost to speed. The intrinsic would look something like __pf(my_a->v); (in practice, _mm_prefetch on x86, or GCC's __builtin_prefetch).
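A sketch of the pattern the question asks about (prefetch the next object's v while working on the current one), using the real x86 intrinsic `_mm_prefetch`. The `len` member and the prefetch distance `DIST` are my own assumptions:

```cpp
#include <cstddef>
#include <xmmintrin.h>  // _mm_prefetch: provided by both GCC and MSVC on x86

struct A {
    int a;
    int b;
    int* v;
    std::size_t len;  // assumption: element count stored alongside v
};

// Sum a + b + all of v for every object, prefetching the v array that
// will be needed DIST iterations from now. DIST is a tuning knob; the
// useful value depends on how long each iteration takes.
long long process_all(const A* objs, std::size_t count) {
    constexpr std::size_t DIST = 8;
    long long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (i + DIST < count)
            _mm_prefetch(reinterpret_cast<const char*>(objs[i + DIST].v),
                         _MM_HINT_T0);
        total += objs[i].a + objs[i].b;
        for (std::size_t j = 0; j < objs[i].len; ++j)
            total += objs[i].v[j];
    }
    return total;
}
```

The prefetch is a hint, not a load: it cannot fault, and if the data is already cached it costs almost nothing, which is why issuing it speculatively ahead of time is safe.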
Maybe. If the fixed size buffer is usually close to the size you'll need, then it could be a huge boost in speed. It will always be faster to access one A instance in this way, but if the buffer is unnecessarily gigantic and largely unused, you'll lose the opportunity for more objects to fit into the cache. I.e. it's better to have more smaller objects in the cache than it is to have a lot of unused data filling the cache up.
The specifics depend on your design and performance goals. For an interesting discussion about this, with a "real-world" problem on a specific bit of hardware with a specific compiler, see The Pitfalls of Object Oriented Programming (that's a Google Docs link for a PDF; the PDF itself can be found here).
Yes, that is likely.
Cachegrind (Valgrind's cache profiler) and Shark.
Yes, you could allocate them together, but you should probably see if it's an issue first. You could use arena allocation, for example, or write your own allocators.
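A minimal bump-pointer arena along the lines this answer suggests (a sketch, not production code: no growth policy and no per-object free; the whole arena is released at once):

```cpp
#include <cstddef>
#include <cstdlib>

// All objects and their v arrays are carved from one big block, so
// objects allocated together stay together in memory.
class Arena {
    char* base_;
    std::size_t used_ = 0;
    std::size_t cap_;
public:
    explicit Arena(std::size_t cap)
        : base_(static_cast<char*>(std::malloc(cap))), cap_(cap) {}
    ~Arena() { std::free(base_); }

    // align must be a power of two.
    void* alloc(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        std::size_t p = (used_ + align - 1) & ~(align - 1);  // round up
        if (p + n > cap_) return nullptr;  // sketch: no growth policy
        used_ = p + n;
        return base_ + p;
    }
};

struct A { int a; int b; int* v; };

// Build an A whose v comes from the same arena, immediately after it.
A* make_a(Arena& arena, std::size_t vlen) {
    A* obj = static_cast<A*>(arena.alloc(sizeof(A), alignof(A)));
    obj->v = static_cast<int*>(arena.alloc(vlen * sizeof(int), alignof(int)));
    return obj;
}
```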
Yes, you could do this. The best thing to do would be to allocate regions of memory used together near each other.
It might or might not. It would at least make v local to the rest of the struct members.
If you need to stream a large number of these through the CPU and do very little calculation on each one, as you say, why are we doing all this memory allocation?
Could you just have one copy of the structure, and one (big) buffer of v, read your data into it (in binary, for speed), do your very little calculation, and move on to the next one?
The program should spend almost 100% of its time in I/O.
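That streaming loop could look like the following sketch, assuming a hypothetical binary record format (`a`, `b`, a length, then that many ints); the buffer only ever grows, so allocation happens at most a handful of times over the whole run:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// One record struct and one reusable buffer are recycled for every
// record, so there is no per-record allocation at all.
struct Record {
    std::int32_t a = 0, b = 0;
    std::vector<std::int32_t> v;  // grown once to the largest record, then reused
};

// Reads the next record into r; returns false at end of file.
bool read_record(std::FILE* f, Record& r) {
    std::int32_t header[3];  // a, b, len
    if (std::fread(header, sizeof header, 1, f) != 1) return false;
    r.a = header[0];
    r.b = header[1];
    std::size_t len = static_cast<std::size_t>(header[2]);
    if (r.v.size() < len) r.v.resize(len);  // only ever grows
    return std::fread(r.v.data(), sizeof(std::int32_t), len, f) == len;
}
```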
If you pause it several times while it's running, you should see it almost every time in the process of calling a system routine like FileRead. Some profilers might give you this information, except they tend to be allergic to I/O time.