VisualStudio2008 上的 std::vector 似乎未实现最佳实现 - 复制构造函数调用过多

发布于 2024-07-14 21:54:01 字数 1523 浏览 4 评论 0原文

我一直在将流行的 XmlRpc 库的 STL 实现与主要避免 STL 的实现进行比较。 STL 实现要慢得多 - 我从 47 秒降至 4.5 秒。 我已经诊断出一些原因:部分原因是 std::string 被误用(例如,作者应该尽可能使用“const std::string&” - 不要仅仅使用 std::string 就好像它们是 Java 字符串),但这也是因为每次向量超出其边界时都会不断地调用复制构造函数,这种情况非常频繁。 复制构造函数非常慢,因为它们对树(XmlRpc 值)进行深层复制。

StackOverflow 上的其他人告诉我,std::vector 实现通常会在每次超出缓冲区大小时将其大小加倍。 VisualStudio 2008 上的情况似乎并非如此:向 std::vector 添加 50 个项目需要调用 177 次复制构造函数。 每次加倍应该调用复制构造函数 64 次。 如果您非常关心保持较低的内存使用量,那么每次增加 50% 就应该调用复制构造函数 121 次。 那么177从哪里来呢?

我的问题是:(a) 为什么复制构造函数被如此频繁地调用? (b) 如果您只是将对象从一个位置移动到另一个位置,是否有任何方法可以避免使用复制构造函数? (在这种情况下,实际上大多数情况下, memcpy() 就足够了 - 这会产生很大的差异)。

(注意:我了解 vector::reserve(),我只是有点失望,因为应用程序程序员需要实现加倍技巧,因为这样的东西已经是任何好的 STL 实现的一部分。)

我的测试程序:

#include <string>
#include <iostream>
#include <vector>



using namespace std;


int constructorCalls;
int assignmentCalls;
int copyCalls;


class C {
    int n;

public:
    C(int _n) { n = _n; constructorCalls++; }
    C(const C& orig) { copyCalls++; n = orig.n; }
    void operator=(const C &orig) { assignmentCalls++; n = orig.n; }
};



int main(int argc, char* argv[])
{
    std::vector<C> A;

    //A.reserve(50);
    for (int i=0; i < 50; i++)
        A.push_back(i);
    cout << "constructor calls = " << constructorCalls << "\n";
    cout << "assignment calls = " << assignmentCalls << "\n";
    cout << "copy calls = " << copyCalls << "\n";
    return 0;
}

I've been comparing a STL implementation of a popular XmlRpc library with an implementation that mostly avoids STL. The STL implementation is much slower - I got 47s down to 4.5s. I've diagnosed some of the reasons: it's partly due to std::string being mis-used (e.g. the author should have used "const std::string&" wherever possible - don't just use std::string's as if they were Java strings), but it's also because copy constructors were being constantly called each time the vector outgrew its bounds, which was exceedingly often. The copy constructors were very slow because they did deep-copies of trees (of XmlRpc values).

I was told by someone else on StackOverflow that std::vector implementations typically double the size of the buffer each time they outgrow. This does not seem to be the case on VisualStudio 2008: to add 50 items to a std::vector took 177 calls of the copy constructor. Doubling each time should call the copy constructor 64 times. If you were very concerned about keeping memory usage low, then increasing by 50% each time should call the copy constructor 121 times. So where does the 177 come from?

My question is: (a) why is the copy constructor called so often? (b) is there any way to avoid using the copy constructor if you're just moving an object from one location to another? (In this case and indeed most cases a memcpy() would have sufficed - and this makes a BIG difference).

(NB: I know about vector::reserve(), I'm just a bit disappointed that application programmers would need to implement the doubling trick when something like this is already part of any good STL implementation.)

My test program:

#include <string>
#include <iostream>
#include <vector>



using namespace std;


int constructorCalls;
int assignmentCalls;
int copyCalls;


class C {
    int n;

public:
    C(int _n) { n = _n; constructorCalls++; }
    C(const C& orig) { copyCalls++; n = orig.n; }
    void operator=(const C &orig) { assignmentCalls++; n = orig.n; }
};



int main(int argc, char* argv[])
{
    std::vector<C> A;

    //A.reserve(50);
    for (int i=0; i < 50; i++)
        A.push_back(i);
    cout << "constructor calls = " << constructorCalls << "\n";
    cout << "assignment calls = " << assignmentCalls << "\n";
    cout << "copy calls = " << copyCalls << "\n";
    return 0;
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

妖妓 2024-07-21 21:54:01

不要忘记计算将临时 C 对象 push_back 放入向量所需的复制构造函数调用次数。 每次迭代将至少调用C 的复制构造函数一次。

如果添加更多打印代码,发生的情况会更清楚:

std::vector<C> A;
std::vector<C>::size_type prevCapacity = A.capacity();

for (int i=0; i < 50; i++) {
    A.push_back(i);
    if(prevCapacity != A.capacity()) {
       cout << "capacity " << prevCapacity << " -> " << A.capacity() << "\n";
    }
    prevCapacity = A.capacity();
}

输出如下:

capacity 0 -> 1
capacity 1 -> 2
capacity 2 -> 3
capacity 3 -> 4
capacity 4 -> 6
capacity 6 -> 9
capacity 9 -> 13
capacity 13 -> 19
capacity 19 -> 28
capacity 28 -> 42
capacity 42 -> 63

所以是的,容量每次增加 50%,这占了 127 份:

1 + 2 + 3 + 4 + 6 + 9 + 13 + 19 + 28 + 42 = 127

从 50 份中添加 50 份额外的份数调用 push_back 并且您有 177:

127 + 50 = 177

Don't forget to count the copy constructor calls needed to push_back a temporary C object into the vector. Each iteration will call C's copy constructor at least once.

If you add more printing code, it's a bit clearer what is going on:

std::vector<C> A;
std::vector<C>::size_type prevCapacity = A.capacity();

for (int i=0; i < 50; i++) {
    A.push_back(i);
    if(prevCapacity != A.capacity()) {
       cout << "capacity " << prevCapacity << " -> " << A.capacity() << "\n";
    }
    prevCapacity = A.capacity();
}

This has the following output:

capacity 0 -> 1
capacity 1 -> 2
capacity 2 -> 3
capacity 3 -> 4
capacity 4 -> 6
capacity 6 -> 9
capacity 9 -> 13
capacity 13 -> 19
capacity 19 -> 28
capacity 28 -> 42
capacity 42 -> 63

So yes, the capacity increases by 50% each time, and this accounts for 127 of the copies:

1 + 2 + 3 + 4 + 6 + 9 + 13 + 19 + 28 + 42 = 127

Add the 50 additional copies from 50 calls to push_back and you have 177:

127 + 50 = 177
鱼忆七猫命九 2024-07-21 21:54:01

我的问题是:

(a) 为什么复制构造函数被如此频繁地调用?

因为当调整向量大小时,您需要将旧缓冲区中的所有元素复制到新缓冲区中。 这是因为向量保证对象存储在连续的内存位置中。

(b) 如果您只是移动,有什么方法可以避免使用复制构造函数
一个物体从一个位置到另一个位置?

不,没有办法避免使用复制构造函数。
这是因为该对象有多个需要正确初始化的成员。
如果您使用memcpy,您如何知道该对象已正确初始化为该对象!

例如。 如果对象包含智能指针。 你不能只 memcpy 一个智能指针。 它需要做额外的工作来跟踪所有权。 否则,当原始对象超出范围时,内存将被删除,并且新对象将具有悬空指针。 同样的原则适用于所有具有构造函数(复制构造函数)的对象,构造函数实际上执行所需的工作。

停止复制内容的方法就是预留空间。
这使得向量为其将存储的所有对象分配足够的空间。 因此它不需要不断重新分配主缓冲区。 它只是将对象复制到向量中。

每次加倍应该调用复制构造函数 64 次。
如果您非常关心保持较低的内存使用量,那么增加
50%每次应该调用复制构造函数121次。
那么177从哪里来呢?

向量分配的大小 = 1:
添加元素 1:(不重新分配)但将元素 1 复制到向量中。
添加元素 2:重新分配缓冲区(大小 2):复制元素 1。 将元素 2 复制到向量中。
添加元素 3:重新分配缓冲区(大小 4):复制元素 1-2。 将元素 3 复制到向量中。
添加元素 4:将元素 4 复制到向量中
添加元素 5:重新分配缓冲区(大小 8):复制元素 1-4。 将元素 5 复制到向量中。
添加元素 6:将元素 6 复制到向量中
添加元素 7:将元素 7 复制到向量中
添加元素 8:将元素 8 复制到向量中
添加元素 9:重新分配缓冲区(大小 16):复制元素 1-8。 将元素 9 复制到向量中。
添加元素 10:将元素 10 复制到向量
等等。

前 10 个元素需要 25 个复制结构。
如果您先使用保留,则只需构建 10 次副本即可。

My question is:

(a) why is the copy constructor called so often?

Because when the vector is re-sized you need to copy all the elements from the old buffer into the new buffer. This is because the vector guarantees that the objects are stored in consecutive memory locations.

(b) is there any way to avoid using the copy constructor if you're just moving
an object from one location to another?

No there is no way to avoid the use of the copy constructor.
This because the object has several members that need to be initialized correctly.
If you used memcpy how do you know the object has been initialized correctly for the object!

For example. IF the object contained a smart pointer. You can't just memcpy a smart pointer. It needs to do extra work to track ownership. Otherwise when the original goes out of scope the memory is deleted and the new object has a dangling pointer. The same principle applies to all objects that have a constructor (copy constructor) the constructor actually does required work.

The way to stop the copy of the content is too reserve the space.
This makes vector allocate enough space for all the objects it will store. Thus it does not need to keep reallocating the main buffer. It just copies the objects into the vector.

Doubling each time should call the copy constructor 64 times.
If you were very concerned about keeping memory usage low, then increasing by
50% each time should call the copy constructor 121 times.
So where does the 177 come from?

Vector allocated size = 1:
Add element 1: (no reallocation) But copies element 1 into vector.
Add element 2: Reallocate buffer (size 2): Copy element 1 across. Copy element 2 into vector.
Add element 3: Reallocate buffer (size 4): Copy element 1-2 across. Copy element 3 into vector.
Add element 4: Copy element 4 into vector
Add element 5: Reallocate buffer (size 8): Copy element 1-4 across. Copy element 5 into vector.
Add element 6: Copy element 6 into vector
Add element 7: Copy element 7 into vector
Add element 8: Copy element 8 into vector
Add element 9: Reallocate buffer (size 16): Copy element 1-8 across. Copy element 9 into vector.
Add element 10: Copy element 10 into vector
etc.

First 10 elements took 25 copy constructions.
If you had used reserve first it would have only taken 10 copy constructions.

泪意 2024-07-21 21:54:01

STL确实容易导致这种事情。 该规范不允许 memcpy'ing,因为这并非在所有情况下都有效。 有一个文档描述了 EASTL, EA 进行了一系列更改,使其更适合其目的,其中确实有一种方法来声明类型对于 memcpy 是安全的。 不幸的是,据我所知,它不是开源的,所以我们无法使用它。

IIRC Dinkumware STL(VS 中的那个)每次将向量增长 50%。

然而,在向量上执行一系列push_back 是一种常见的低效率现象。 您可以使用保留来缓解它(如果您显着高估,则可能会浪费内存)或使用不同的容器 - 双端队列对于这样的一系列插入表现更好,但在随机访问中稍微慢一些,这可能/可能对你来说不是一个好的权衡。

或者,您可以考虑存储指针而不是值,如果您存储较大的元素,这将使调整大小变得更便宜。 如果您要存储大型对象,这总是会获胜,因为您不必复制它们 - 至少在插入时您总是会为每个项目保存一份副本。

The STL does tend to cause this sort of thing. The spec doesn't allow memcpy'ing because that doesn't work in all cases. There's a document describing EASTL, a bunch of alterations made by EA to make it more suitable for their purposes, which does have a method of declaring that a type is safe to memcpy. Unfortunately it's not open source AFAIK so we can't play with it.

IIRC Dinkumware STL (the one in VS) grows vectors by 50% each time.

However, doing a series of push_back's on a vector is a common inefficiency. You can either use reserve to alleviate it (at the cost of possibly wasting memory if you overestimate significantly) or use a different container - deque performs better for a series of insertions like that but is a little slower in random access, which may/may not be a good tradeoff for you.

Or you could look at storing pointers instead of values which will make the resizing much cheaper if you're storing large elements. If you're storing large objects this will always win because you don't have to copy them ever - you'll always save that one copy for each item on insertion at least.

浴红衣 2024-07-21 21:54:01

如果我没记错的话,C++0x 可能具有移动语义(除了复制语义之外),也就是说,如果您确实愿意,您可以实现更有效的复制构造函数。

除非复制构造函数很复杂,否则它通常非常有效 - 毕竟,您应该做的不仅仅是复制对象,而且现在复制内存非常快。

If I recall correctly, C++0x may have move semantics (in addition to copy semantics), that said, you can implement a more efficient copy constructor if you really want to.

Unless the copy constructor is complex, it is normally very efficient - after all, you are supposed to be doing little more than merely copying the object, and copying memory is very fast these days.

绻影浮沉 2024-07-21 21:54:01

看起来 C++0x 的补充会有所帮助; 请参阅RvalueSTL 升级

It looks like additions to C++0x will help here; see Rvalue and STL upgrades.

oО清风挽发oО 2024-07-21 21:54:01

为了避免这个问题,为什么不使用指针向量而不是对象向量呢? 然后在销毁向量时删除每个元素。

换句话说,std::vector 而不是 std::vector。 Memcpy 指针非常快。

To circumvent this issue, why not use a vector of pointers instead of a vector of objects? Then delete each element when destructing the vector.

In other words, std::vector<C*> instead of std::vector<C>. Memcpy'ing pointers is very fast.

π浅易 2024-07-21 21:54:01

请注意,请小心向向量添加指针,作为最小化复制成本的一种方式,因为

  1. 向量中指针的不良数据局部性使得具有连续对象的非指针版本在向量运行时围绕指针版本运行。实际上使用过
  2. 堆分配比堆栈分配慢。

您是否更经常使用向量或向其中添加内容?

Just a note, be careful of adding pointers to the vector as a way of minimizing copying costs, since

  1. The bad data locality of the pointers in the vector makes the non-pointer version with consecutive objects run circles around the pointer version when the vector is actually used.
  2. Heap allocation is slower than stack allocation.

Do you more often use the vector or add stuff to it?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文