std::vector 比普通数组慢很多吗?
我一直认为 std::vector 是“作为数组实现”是普遍的看法,等等等等。今天我下去测试了一下,好像不是这样:
这是一些测试结果:
UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds
大约慢了3-4倍!并不能真正证明“向量可能会慢几纳秒”的评论。
我使用的代码:
#include <cstdlib>
#include <vector>
#include <iostream>
#include <string>
#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>
class TestTimer
{
public:
TestTimer(const std::string & name) : name(name),
start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
{
}
~TestTimer()
{
using namespace std;
using namespace boost;
posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
posix_time::time_duration d = now - start;
cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
" seconds" << endl;
}
private:
std::string name;
boost::posix_time::ptime start;
};
struct Pixel
{
Pixel()
{
}
Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
{
}
unsigned char r, g, b;
};
void UseVector()
{
TestTimer t("UseVector");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels;
pixels.resize(dimension * dimension);
for(int i = 0; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
}
}
void UseVectorPushBack()
{
TestTimer t("UseVectorPushBack");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels;
pixels.reserve(dimension * dimension);
for(int i = 0; i < dimension * dimension; ++i)
pixels.push_back(Pixel(255, 0, 0));
}
}
void UseArray()
{
TestTimer t("UseArray");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);
for(int i = 0 ; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
free(pixels);
}
}
int main()
{
TestTimer t1("The whole thing");
UseArray();
UseVector();
UseVectorPushBack();
return 0;
}
我做错了吗?或者我刚刚打破了这个性能神话?
我在 Visual Studio 2005 中使用发布模式。
在 Visual C++ 中,#define _SECURE_SCL 0 将
UseVector
减少一半(将其缩短至 4 秒)。在我看来,这确实是巨大的。
I've always thought it's the general wisdom that std::vector
is "implemented as an array," blah blah blah. Today I went down and tested it, and it seems to be not so:
Here's some test results:
UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds
That's about 3 - 4 times slower! Doesn't really justify for the "vector
may be slower for a few nanosecs" comments.
And the code I used:
#include <cstdlib>
#include <vector>
#include <iostream>
#include <string>
#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>
class TestTimer
{
public:
TestTimer(const std::string & name) : name(name),
start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
{
}
~TestTimer()
{
using namespace std;
using namespace boost;
posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
posix_time::time_duration d = now - start;
cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
" seconds" << endl;
}
private:
std::string name;
boost::posix_time::ptime start;
};
struct Pixel
{
Pixel()
{
}
Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
{
}
unsigned char r, g, b;
};
void UseVector()
{
TestTimer t("UseVector");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels;
pixels.resize(dimension * dimension);
for(int i = 0; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
}
}
void UseVectorPushBack()
{
TestTimer t("UseVectorPushBack");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels;
pixels.reserve(dimension * dimension);
for(int i = 0; i < dimension * dimension; ++i)
pixels.push_back(Pixel(255, 0, 0));
}
}
void UseArray()
{
TestTimer t("UseArray");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);
for(int i = 0 ; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
free(pixels);
}
}
int main()
{
TestTimer t1("The whole thing");
UseArray();
UseVector();
UseVectorPushBack();
return 0;
}
Am I doing it wrong or something? Or have I just busted this performance myth?
I'm using Release mode in Visual Studio 2005.
In Visual C++, #define _SECURE_SCL 0
reduces UseVector
by half (bringing it down to 4 seconds). This is really huge, IMO.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(22)
使用以下内容:
,因此数组的速度是向量的两倍。
但是在更详细地查看代码之后,这是预期的;当你遇到向量两次而数组只运行一次时。注意:当您
resize()
向量时,您不仅分配内存,而且还运行向量并调用每个成员的构造函数。稍微重新安排代码,以便向量仅初始化每个对象一次:
现在再次执行相同的计时:
向量现在的性能仅比数组稍差。在我看来,这种差异是微不足道的,可能是由一大堆与测试无关的事情引起的。
我还要考虑到您没有在
UseArrray()
方法中正确初始化/销毁 Pixel 对象,因为构造函数/析构函数都没有被调用(对于这个简单的类来说这可能不是问题,但是任何稍微复杂的东西(即使用指针或带有指针的成员)都会导致问题。Using the following:
So array is twice as quick as vector.
But after looking at the code in more detail this is expected; as you run across the vector twice and the array only once. Note: when you
resize()
the vector you are not only allocating the memory but also running through the vector and calling the constructor on each member.Re-Arranging the code slightly so that the vector only initializes each object once:
Now doing the same timing again:
The vector now performance only slightly worse than the array. IMO this difference is insignificant and could be caused by a whole bunch of things not associated with the test.
I would also take into account that you are not correctly initializing/Destroying the Pixel object in the
UseArrray()
method as neither constructor/destructor is not called (this may not be an issue for this simple class but anything slightly more complex (ie with pointers or members with pointers) will cause problems.很好的问题。我来到这里希望找到一些简单的修复方法来加快矢量测试的速度。结果并不像我预期的那样!
优化有帮助,但这还不够。通过优化,我仍然发现 UseArray 和 UseVector 之间有 2 倍的性能差异。有趣的是,在没有优化的情况下,UseVector 明显比 UseVectorPushBack 慢。
想法 #1 - 使用 new[] 而不是 malloc
我尝试将 UseArray 中的
malloc()
更改为new[]
,以便构造对象。并从单独的字段分配更改为分配 Pixel 实例。哦,并将内部循环变量重命名为j
。令人惊讶的是(对我来说),这些变化都没有产生任何影响。甚至不更改默认构造所有像素的
new[]
。看起来 gcc 可以在使用 new[] 时优化默认构造函数调用,但在使用 vector 时则不能。想法#2 - 删除重复的operator[]调用
我还尝试摆脱三重
operator[]
查找并缓存对pixels[j]
的引用。这实际上减慢了 UseVector 的速度!哎呀。想法 #3 - 删除构造函数
完全删除构造函数怎么样?那么也许 gcc 可以在创建向量时优化所有对象的构造。如果我们将 Pixel 更改为:
结果:速度提高约 10%。还是比数组慢。嗯。
想法 #4 - 使用迭代器而不是循环索引
使用
vector::iterator
而不是循环索引怎么样?结果:
不,没有什么不同。至少速度不慢。我认为这会具有类似于 #2 的性能,其中我使用了 Pixel& 参考。
结论
即使某些聪明的 cookie 弄清楚了如何使向量循环与数组循环一样快,但这并不能很好地说明
std::vector
的默认行为。编译器足够聪明,可以优化所有 C++ 特性并使 STL 容器与原始数组一样快。最重要的是,在使用 std::vector 时,编译器无法优化无操作默认构造函数调用。如果您使用普通的
new[]
它可以很好地优化它们。但对于std::vector
则不然。即使您可以重写代码以消除与这里的口头禅相反的构造函数调用:“编译器比您更聪明。STL 与普通 C 一样快。别担心。”Great question. I came in here expecting to find some simple fix that would speed the vector tests right up. That didn't work out quite like I expected!
Optimization helps, but it's not enough. With optimization on I'm still seeing a 2X performance difference between UseArray and UseVector. Interestingly, UseVector was significantly slower than UseVectorPushBack without optimization.
Idea #1 - Use new[] instead of malloc
I tried changing
malloc()
tonew[]
in UseArray so the objects would get constructed. And changing from individual field assignment to assigning a Pixel instance. Oh, and renaming the inner loop variable toj
.Surprisingly (to me), none of those changes made any difference whatsoever. Not even the change to
new[]
which will default construct all of the Pixels. It seems that gcc can optimize out the default constructor calls when usingnew[]
, but not when usingvector
.Idea #2 - Remove repeated operator[] calls
I also attempted to get rid of the triple
operator[]
lookup and cache the reference topixels[j]
. That actually slowed UseVector down! Oops.Idea #3 - Remove constructors
What about removing the constructors entirely? Then perhaps gcc can optimize out the construction of all of the objects when the vectors are created. What happens if we change Pixel to:
Result: about 10% faster. Still slower than an array. Hm.
Idea #4 - Use iterator instead of loop index
How about using a
vector<Pixel>::iterator
instead of a loop index?Result:
Nope, no different. At least it's not slower. I thought this would have performance similar to #2 where I used a
Pixel&
reference.Conclusion
Even if some smart cookie figures out how to make the vector loop as fast as the array one, this does not speak well of the default behavior of
std::vector
. So much for the compiler being smart enough to optimize out all the C++ness and make STL containers as fast as raw arrays.The bottom line is that the compiler is unable to optimize away the no-op default constructor calls when using
std::vector
. If you use plainnew[]
it optimizes them away just fine. But not withstd::vector
. Even if you can rewrite your code to eliminate the constructor calls that flies in face of the mantra around here: "The compiler is smarter than you. The STL is just as fast as plain C. Don't worry about it."这是一个古老但流行的问题。
此时,许多程序员将使用 C++11 工作。在 C++11 中,OP 的编写代码对于
UseArray
或UseVector
运行速度同样快。根本问题是,当您的
Pixel
结构未初始化时,std::vector::resize( size_t, T const&=T() )
采用默认值构造Pixel
并复制它。编译器没有注意到它被要求复制未初始化的数据,因此它实际上执行了复制。在 C++11 中,
std::vector::resize
有两个重载。第一个是std::vector::resize(size_t)
,另一个是std::vector::resize(size_t, T const&)
>。这意味着当您在没有第二个参数的情况下调用resize
时,它只是默认构造,并且编译器足够聪明,可以意识到默认构造不执行任何操作,因此它会跳过缓冲区的传递。(添加两个重载是为了处理可移动、可构造和不可复制类型——处理未初始化数据时的性能改进是一个额外的好处)。
push_back
解决方案还进行了 fencepost 检查,这会减慢速度,因此它仍然比malloc
版本慢。现场示例(我还用
chrono::high_resolution_clock
替换了计时器)。请注意,如果您有一个通常需要初始化的结构,但您想在增加缓冲区后处理它,则可以使用自定义
std::vector
分配器来完成此操作。如果您想将其移动到更正常的std::vector
中,我相信仔细使用allocator_traits
并覆盖==
可能会导致关闭,但我不确定。This is an old but popular question.
At this point, many programmers will be working in C++11. And in C++11 the OP's code as written runs equally fast for
UseArray
orUseVector
.The fundamental problem was that while your
Pixel
structure was uninitialized,std::vector<T>::resize( size_t, T const&=T() )
takes a default constructedPixel
and copies it. The compiler did not notice it was being asked to copy uninitialized data, so it actually performed the copy.In C++11,
std::vector<T>::resize
has two overloads. The first isstd::vector<T>::resize(size_t)
, the other isstd::vector<T>::resize(size_t, T const&)
. This means when you invokeresize
without a second argument, it simply default constructs, and the compiler is smart enough to realize that default construction does nothing, so it skips the pass over the buffer.(The two overloads where added to handle movable, constructable and non-copyable types -- the performance improvement when working on uninitialized data is a bonus).
The
push_back
solution also does fencepost checking, which slows it down, so it remains slower than themalloc
version.live example (I also replaced the timer with
chrono::high_resolution_clock
).Note that if you have a structure that usually requires initialization, but you want to handle it after growing your buffer, you can do this with a custom
std::vector
allocator. If you want to then move it into a more normalstd::vector
, I believe careful use ofallocator_traits
and overriding of==
might pull that off, but am unsure.公平地说,您无法将 C++ 实现与 C 实现进行比较,因为我将其称为 malloc 版本。 malloc 不创建对象 - 它只分配原始内存。然后将内存视为对象而不调用构造函数是糟糕的 C++(可能无效 - 我将把它留给语言律师)。
也就是说,简单地将 malloc 更改为
new Pixel[dimensions*dimensions]
并 free 为delete [] Pixel
与您拥有的 Pixel 的简单实现没有太大区别。以下是我的机器(E6600,64 位)上的结果:但稍作更改,表格变为:
Pixel.h
Pixel.cc
main.cc
以这种方式编译:
我们得到非常不同的结果:
使用非内联构造函数Pixel、std::vector 现在胜过原始数组。
看来通过 std::vector 和 std:allocator 进行分配的复杂性太大,无法像简单的 new Pixel[n] 那样有效地进行优化。然而,我们可以看到,问题只是在于分配,而不是向量访问,通过调整几个测试函数,通过将其移出循环来创建向量/数组一次:
我们
现在得到这些结果:
我们可以学到什么由此可见, std::vector 与访问原始数组相当,但如果需要多次创建和删除向量/数组,则在元素的构造函数中创建复杂对象将比创建简单数组更耗时不是内联的。我认为这并不令人惊讶。
To be fair, you cannot compare a C++ implementation to a C implementation, as I would call your malloc version. malloc does not create objects - it only allocates raw memory. That you then treat that memory as objects without calling the constructor is poor C++ (possibly invalid - I'll leave that to the language lawyers).
That said, simply changing the malloc to
new Pixel[dimensions*dimensions]
and free todelete [] pixels
does not make much difference with the simple implementation of Pixel that you have. Here's the results on my box (E6600, 64-bit):But with a slight change, the tables turn:
Pixel.h
Pixel.cc
main.cc
Compiled this way:
we get very different results:
With a non-inlined constructor for Pixel, std::vector now beats a raw array.
It would appear that the complexity of allocation through std::vector and std:allocator is too much to be optimised as effectively as a simple
new Pixel[n]
. However, we can see that the problem is simply with the allocation not the vector access by tweaking a couple of the test functions to create the vector/array once by moving it outside the loop:and
We get these results now:
What we can learn from this is that std::vector is comparable to a raw array for access, but if you need to create and delete the vector/array many times, creating a complex object will be more time consuming that creating a simple array when the element's constructor is not inlined. I don't think that this is very surprising.
当我第一次看到你的代码时,这很难说是一个公平的比较;我绝对认为你不是在比较苹果与苹果。所以我想,让我们在所有测试中调用构造函数和析构函数;然后进行比较。
我的想法是,通过这种设置,它们应该完全相同。事实证明,我错了。
那么为什么会出现 30% 的性能损失呢? STL 在标头中包含所有内容,因此编译器应该能够理解所需的所有内容。
我的想法是,循环如何将所有值初始化为默认构造函数。所以我进行了一个测试:
结果正如我所怀疑的那样:
这显然是速度减慢的根源,事实上向量使用复制构造函数来初始化默认构造对象中的元素。
这意味着,在向量的构造过程中会发生以下伪操作顺序:
由于编译器创建的隐式复制构造函数,该伪操作顺序被扩展为以下内容:
因此默认的 Pixel 仍然是 un - 已初始化,而其余部分则使用默认
Pixel
的未初始化值进行初始化。与
New[]
/Delete[]
的替代情况相比:它们都保留为未初始化的值,并且没有对序列进行两次迭代。
有了这些信息,我们如何测试它呢?让我们尝试重写隐式复制构造函数。
结果如何?
总之,如果您经常创建数百个向量:重新考虑您的算法。
无论如何,STL 实现并不会因为某些未知原因而变慢,它只是完全按照您的要求进行操作;希望你了解得更多。
It was hardly a fair comparison when I first looked at your code; I definitely thought you weren't comparing apples with apples. So I thought, let's get constructors and destructors being called on all tests; and then compare.
My thoughts were, that with this setup, they should be exactly the same. It turns out, I was wrong.
So why did this 30% performance loss even occur? The STL has everything in headers, so it should have been possible for the compiler to understand everything that was required.
My thoughts were that it is in how the loop initialises all values to the default constructor. So I performed a test:
The results were as I suspected:
This is clearly the source of the slowdown, the fact that the vector uses the copy constructor to initialise the elements from a default constructed object.
This means, that the following pseudo-operation order is happening during construction of the vector:
Which, due to the implicit copy constructor made by the compiler, is expanded to the following:
So the default
Pixel
remains un-initialised, while the rest are initialised with the defaultPixel
's un-initialised values.Compared to the alternative situation with
New[]
/Delete[]
:They are all left to their un-initialised values, and without the double iteration over the sequence.
Armed with this information, how can we test it? Let's try over-writing the implicit copy constructor.
And the results?
So in summary, if you're making hundreds of vectors very often: re-think your algorithm.
In any case, the STL implementation isn't slower for some unknown reason, it just does exactly what you ask; hoping you know better.
尝试一下:
我获得了与数组几乎完全相同的性能。
vector
的特点是它是一种比数组更通用的工具。这意味着您必须考虑如何使用它。它可以以多种不同的方式使用,提供数组甚至没有的功能。如果你“错误”地使用它来达到你的目的,你会产生大量的开销,但如果你正确地使用它,它通常基本上是一个零开销的数据结构。在这种情况下,问题在于您单独初始化了向量(导致所有元素调用其默认构造函数),然后用正确的值单独覆盖每个元素。对于编译器来说,这比对数组执行相同的操作要困难得多。这就是为什么向量提供了一个构造函数,可以让您准确地执行此操作:用值X
初始化N
元素。当你使用它时,向量和数组一样快。
所以不,你还没有打破性能神话。但你已经证明,只有当你以最佳方式使用向量时,这才是正确的,这也是一个很好的观点。 :)
从好的方面来看,这确实是最简单的用法,但结果却是最快的。如果您将我的代码片段(一行)与 John Kugelman 的答案进行对比,其中包含大量的调整和优化,但仍然不能完全消除性能差异,那么很明显
vector
相当不错毕竟设计得很巧妙。您不必费力就能获得与阵列相同的速度。相反,您必须使用最简单的解决方案。Try with this:
I get almost exactly the same performance as with array.
The thing about
vector
is that it's a much more general tool than an array. And that means you have to consider how you use it. It can be used in a lot of different ways, providing functionality that an array doesn't even have. And if you use it "wrong" for your purpose, you incur a lot of overhead, but if you use it correctly, it is usually basically a zero-overhead data structure. In this case, the problem is that you separately initialized the vector (causing all elements to have their default ctor called), and then overwriting each element individually with the correct value. That is much harder for the compiler to optimize away than when you do the same thing with an array. Which is why the vector provides a constructor which lets you do exactly that: initializeN
elements with valueX
.And when you use that, the vector is just as fast as an array.
So no, you haven't busted the performance myth. But you have shown that it's only true if you use the vector optimally, which is a pretty good point too. :)
On the bright side, it's really the simplest usage that turns out to be fastest. If you contrast my code snippet (a single line) with John Kugelman's answer, containing heaps and heaps of tweaks and optimizations, which still don't quite eliminate the performance difference, it's pretty clear that
vector
is pretty cleverly designed after all. You don't have to jump through hoops to get speed equal to an array. On the contrary, you have to use the simplest possible solution.尝试禁用检查的迭代器并在发布模式下构建。您不应该看到太大的性能差异。
Try disabling checked iterators and building in release mode. You shouldn't see much of a performance difference.
GNU 的 STL(和其他),给定
vector(n)
,默认构造一个原型对象T()
- 编译器将优化掉空构造函数 - 但随后现在为该对象保留的内存地址中发生的任何垃圾的副本都会由 STL 的 __uninitialized_fill_n_aux 获取,它会循环填充该对象的副本作为向量中的默认值。所以,“我的”STL不是循环构造,而是构造然后循环/复制。这是违反直觉的,但我应该记得,因为我在最近的一个 stackoverflow 问题上评论了这一点:构造/复制对于引用计数对象等来说可以更有效。所以:
or
is - 在许多 STL 实现上 - 类似于:
问题在于,当前一代的编译器优化器似乎无法从 temp 是未初始化的垃圾的角度出发,并且无法优化循环和默认复制构造函数调用。您可以可信地争辩说,编译器绝对不应该对此进行优化,因为编写上述代码的程序员有一个合理的期望,即所有对象在循环后都将相同,即使是垃圾(关于“相同”/operator== 的通常警告) memcmp/operator= 等适用)。不能期望编译器对 std::vector<> 的更大上下文有任何额外的了解。或稍后使用的数据表明此优化是安全的。
这可以与更明显的直接实现形成对比:
我们可以期望编译器对其进行优化。
为了更明确地说明向量行为这方面的理由,请考虑:
显然,如果我们创建 10000 个独立对象与 10000 个引用相同数据的对象,这是一个主要区别。有一个合理的论点认为,保护普通 C++ 用户免于意外地执行如此昂贵的操作的优势超过了难以优化的复制构造在现实世界中所付出的非常小的成本。
原始答案(供参考/理解评论):
没有机会。向量与数组一样快,至少如果您明智地保留空间的话。 ...
GNU's STL (and others), given
vector<T>(n)
, default constructs a prototypal objectT()
- the compiler will optimise away the empty constructor - but then a copy of whatever garbage happened to be in the memory addresses now reserved for the object is taken by the STL's__uninitialized_fill_n_aux
, which loops populating copies of that object as the default values in the vector. So, "my" STL is not looping constructing, but constructing then loop/copying. It's counter intuitive, but I should have remembered as I commented on a recent stackoverflow question about this very point: the construct/copy can be more efficient for reference counted objects etc..So:
or
is - on many STL implementations - something like:
The issue being that the current generation of compiler optimisers don't seem to work from the insight that temp is uninitialised garbage, and fail to optimise out the loop and default copy constructor invocations. You could credibly argue that compilers absolutely shouldn't optimise this away, as a programmer writing the above has a reasonable expectation that all the objects will be identical after the loop, even if garbage (usual caveats about 'identical'/operator== vs memcmp/operator= etc apply). The compiler can't be expected to have any extra insight into the larger context of std::vector<> or the later usage of the data that would suggest this optimisation safe.
This can be contrasted with the more obvious, direct implementation:
Which we can expect a compiler to optimise out.
To be a bit more explicit about the justification for this aspect of vector's behaviour, consider:
Clearly it's a major difference if we make 10000 independent objects versus 10000 referencing the same data. There's a reasonable argument that the advantage of protecting casual C++ users from accidentally doing something so expensive outweights the very small real-world cost of hard-to-optimise copy construction.
ORIGINAL ANSWER (for reference / making sense of the comments):
No chance. vector is as fast as an array, at least if you reserve space sensibly. ...
马丁·约克的回答让我烦恼,因为它似乎就像试图掩盖初始化问题一样。但他认为冗余默认构造是性能问题的根源是正确的。
[编辑:Martin 的回答不再建议更改默认构造函数。]
对于眼前的问题,您当然可以调用
vector
的 2 参数版本相反:如果您想用常量值进行初始化(这是常见情况),那么这很有效。但更普遍的问题是:如何使用比常量值更复杂的东西有效地初始化?
为此,您可以使用
back_insert_iterator
,它是一个迭代器适配器。这是一个带有int
向量的示例,尽管总体思路对于Pixel
也适用:或者您可以使用
copy()
或transform()
而不是generate_n()
。缺点是构造初始值的逻辑需要移到一个单独的类中,这比直接放置它不太方便(尽管 C++1x 中的 lambda 使这变得更好)。另外,我预计这仍然不会像基于
malloc()
的非 STL 版本那么快,但我预计它会接近,因为它只为每个元素执行一次构造。Martin York's answer bothers me because it seems like an attempt to brush the initialisation problem under the carpet. But he is right to identify redundant default construction as the source of performance problems.
[EDIT: Martin's answer no longer suggests changing the default constructor.]
For the immediate problem at hand, you could certainly call the 2-parameter version of the
vector<Pixel>
ctor instead:That works if you want to initialise with a constant value, which is a common case. But the more general problem is: How can you efficiently initialise with something more complicated than a constant value?
For this you can use a
back_insert_iterator
, which is an iterator adaptor. Here's an example with a vector ofint
s, although the general idea works just as well forPixel
s:Alternatively you could use
copy()
ortransform()
instead ofgenerate_n()
.The downside is that the logic to construct the initial values needs to be moved into a separate class, which is less convenient than having it in-place (although lambdas in C++1x make this much nicer). Also I expect this will still not be as fast as a
malloc()
-based non-STL version, but I expect it will be close, since it only does one construction for each element.矢量还调用 Pixel 构造函数。
每个都会导致您正在计时的近一百万次 ctor 运行。
编辑:然后是外部 1...1000 循环,因此请进行 10 亿次 ctor 调用!
编辑2:看看 UseArray 案例的反汇编会很有趣。优化器可以优化整个事情,因为它除了烧毁 CPU 之外没有任何效果。
The vector ones are additionally calling Pixel constructors.
Each is causing almost a million ctor runs that you're timing.
edit: then there's the outer 1...1000 loop, so make that a billion ctor calls!
edit 2: it'd be interesting to see the disassembly for the UseArray case. An optimizer could optimize the whole thing away, since it has no effect other than burning CPU.
以下是向量中的
push_back
方法的工作原理:调用
push_back
X 个项目后:重复。如果您不
预留
空间,它肯定会变慢。更重要的是,如果复制该项目的成本很高,那么像这样的“push_back”就会把你活活吃掉。至于向量与数组的问题,我必须同意其他人的观点。在发布版本中运行,打开优化,并添加更多标志,这样微软的友好人员就不会#@%$^ 为你做这些。
还有一件事,如果您不需要调整大小,请使用 Boost.Array。
Here's how the
push_back
method in vector works:After calling
push_back
X items:Repeat. If you're not
reserving
space its definitely going to be slower. More than that, if it's expensive to copy the item then 'push_back' like that is going to eat you alive.As to the
vector
versus array thing, I'm going to have to agree with the other people. Run in release, turn optimizations on, and put in a few more flags so that the friendly people at Microsoft don't #@%$^ it up for ya.One more thing, if you don't need to resize, use Boost.Array.
一些分析器数据(像素与 32 位对齐):
Blah
In
allocator
:vector
:array
大部分开销都在复制构造函数中。例如,
它具有与数组相同的性能。
Some profiler data (pixel is aligned to 32 bits):
Blah
In
allocator
:vector
:array
Most of the overhead is in the copy constructor. For example,
It has the same performance as an array.
我的笔记本电脑是 Lenova G770(4 GB RAM)。
操作系统是Windows 7 64位(带笔记本电脑的)
编译器是MinGW 4.6.1 。
IDE 是 Code::Blocks。
我测试了第一篇文章的源代码。
结果
O2 优化
UseArray 在 2.841 秒内完成
UseVector 在 2.548 秒内
完成 UseVectorPushBack 在 11.95 秒内
完成 整个事情在 17.342 秒内完成
系统暂停
O3 优化
UseArray 在 1.452 秒内完成
UseVector 在 2.514 秒内完成
完成
UseVectorPushBack 在 12.967 秒内完成整个事情在 12.967 秒内 16.937秒
看起来在O3优化下向量的性能更差。
如果把循环改成
O2和O3下数组和向量的速度几乎是一样的。
My laptop is Lenova G770 (4 GB RAM).
The OS is Windows 7 64-bit (the one with laptop)
Compiler is MinGW 4.6.1.
The IDE is Code::Blocks.
I test the source codes of the first post.
The results
O2 optimization
UseArray completed in 2.841 seconds
UseVector completed in 2.548 seconds
UseVectorPushBack completed in 11.95 seconds
The whole thing completed in 17.342 seconds
system pause
O3 optimization
UseArray completed in 1.452 seconds
UseVector completed in 2.514 seconds
UseVectorPushBack completed in 12.967 seconds
The whole thing completed in 16.937 seconds
It looks like the performance of vector is worse under O3 optimization.
If you change the loop to
The speed of array and vector under O2 and O3 are almost the same.
一个更好的基准(我认为......),由于优化,编译器可以更改代码,因为分配的向量/数组的结果不会在任何地方使用。
结果:
编译器:
CPU:
代码:
A better benchmark (I think...), compiler due to optimizations can change code, becouse results of allocated vectors/arrays are not used anywhere.
Results:
Compiler:
CPU:
And the code:
我做了一些我长期以来想做的广泛测试。不妨分享一下这个。
这是我的双启动机器 i7-3770、16GB RAM、x86_64,运行 Windows 8.1 和 Ubuntu 16.04。更多信息和结论,请参见下面的评论。测试了 MSVS 2017 和 g++(在 Windows 和 Linux 上)。
测试程序
结果
注释
我的结论和评论
std::array
连续之间的时间变化具有显着的稳定性运行,而其他尤其是 std:: 数据结构相比变化很大结论
当然,这是优化构建的代码。既然问题是关于 std::vector 的,那么是的,它是!很多!比普通数组慢(优化/未优化)。但是当您进行基准测试时,您自然希望生成优化的代码。
对我来说,这场秀的明星是
std::array
。I did some extensive tests that I wanted to for a while now. Might as well share this.
This is my dual boot machine i7-3770, 16GB Ram, x86_64, on Windows 8.1 and on Ubuntu 16.04. More information and conclusions, remarks below. Tested both MSVS 2017 and g++ (both on Windows and on Linux).
Test Program
Results
Notes
std::sort()
too (you can see it commented out) but removed them later because there were no significant relative differences.My Conclusions and Remarks
std::array
's time variations between consecutive runs, while others especially std:: data structs varied wildly in comparisonstd::array
and c-style arrays faster on Windows without optimizationVerdict
Of course this is code for an optimized build. And since the question was about
std::vector
then yes it is !much! slower than plain arrays (optimized/unoptimized). But when you're doing a benchmark, you naturally want to produce optimized code.The star of the show for me though has been
std::array
.通过正确的选项,向量和数组可以生成相同的asm。在这些情况下,它们当然具有相同的速度,因为无论哪种方式您都会获得相同的可执行文件。
With the right options, vectors and arrays can generate identical asm. In these cases, they are of course the same speed, because you get the same executable file either way.
顺便说一句,在使用向量的类中,您的观察速度也会减慢,这也发生在像 int 这样的标准类型上。这是一个多线程代码:
代码的行为表明向量的实例化是代码最长的部分。一旦你突破了这个瓶颈。其余代码运行得非常快。无论运行多少个线程,都是如此。
顺便说一下,忽略绝对疯狂的包含数量。我一直在使用这段代码来测试项目的内容,因此包含的数量不断增长。
By the way the slow down your seeing in classes using vector also occurs with standard types like int. Heres a multithreaded code:
The behavior from the code shows the instantiation of vector is the longest part of the code. Once you get through that bottle neck. The rest of the code runs extremely fast. This is true no matter how many threads you are running on.
By the way ignore the absolutely insane number of includes. I have been using this code to test things for a project so the number of includes keep growing.
我只想提一下,向量(和 smart_ptr)只是添加在原始数组(和原始指针)之上的薄层。
实际上,连续内存中向量的访问时间比数组更快。
以下代码显示了初始化和访问向量和数组的结果。
输出是:
所以如果你使用得当,速度几乎是一样的。
(正如其他人提到的使用reserve()或resize())。
I just want to mention that vector (and smart_ptr) is just a thin layer add on top of raw arrays (and raw pointers).
And actually the access time of an vector in continuous memory is faster than array.
The following code shows the result of initialize and access vector and array.
The output is:
So the speed will be almost the same if you use it properly.
(as others mentioned using reserve() or resize()).
嗯,因为 vector::resize() 比普通内存分配(通过 malloc)执行更多处理。
尝试在复制构造函数中放置一个断点(定义它以便可以断点!),这会增加额外的处理时间。
Well, because vector::resize() does much more processing than plain memory allocation (by malloc).
Try to put a breakpoint in your copy constructor (define it so that you can breakpoint!) and there goes the additional processing time.
我不得不说我不是 C++ 专家。但要添加一些实验结果:
编译:
gcc-6.2.0/bin/g++ -O3 -std=c++14 vector.cpp
machine:
OS:
Output:
这里唯一让我感到奇怪的是“UseFillConstructor”性能与“UseConstructor”相比。
代码:
因此,提供的额外“值”会大大降低性能,我认为这是由于多次调用复制构造函数所致。但是...
编译:
输出:
所以在这种情况下,gcc 优化非常重要,但当提供默认值时,它对您没有多大帮助。这,其实是抵扣我的学费的。希望它可以帮助新程序员选择哪种向量初始化格式。
I Have to say I'm not an expert in C++. But to add some experiments results:
compile:
gcc-6.2.0/bin/g++ -O3 -std=c++14 vector.cpp
machine:
OS:
Output:
Here the only thing I feel strange is that "UseFillConstructor" performance compared with "UseConstructor".
The code:
So the additional "value" provided slows down performance quite a lot, which I think is due to multiple call to copy constructor. But...
Compile:
Output:
So in this case, gcc optimization is very important but it can't help you much when a value is provided as default. This, is against my tuition actually. Hopefully it helps new programmer when choose which vector initialization format.
它似乎取决于编译器标志。这是一个基准代码:
不同的优化标志给出不同的答案:
您的确切结果会有所不同,但这在我的机器上非常典型。
It seems to depend on the compiler flags. Here is a benchmark code:
Different optimization flags give different answers:
Your exact results will vary but this is quite typical on my machine.
根据我的经验,有时,只是有时,
vector
可能比int[]
慢许多倍。需要记住的一件事是向量的向量与 int[][] 非常不同。因为元素在内存中可能不连续。这意味着您可以调整主向量内部不同向量的大小,但 CPU 可能无法像int[][]
那样缓存元素。In my experience, sometimes, just sometimes,
vector<int>
can be many times slower thanint[]
. One thing to keep mind is that vectors of vectors are very unlikeint[][]
. As the elements are probably not contiguous in memory. This means you can resize different vectors inside of the main one, but CPU might not be able to cache elements as well as in the case ofint[][]
.