Amortized analysis of std::vector insertion
How do we do the analysis of insertion at the back (push_back) in a std::vector? Its amortized time is O(1) per insertion. In particular, in a Channel 9 video by Stephan T. Lavavej, and in this one (17:42 onwards), he says that for optimal performance Microsoft's implementation of this method increases the capacity of the vector by a factor of around 1.5.
How is this constant determined?
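As a quick empirical check, the growth factor an implementation actually uses can be read off by printing capacity() whenever it changes during a run of push_backs; the output of this sketch depends entirely on the standard library it is built against (roughly 1.5 on MSVC per the question, often 2 elsewhere):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Print the capacity each time push_back reallocates; the ratio between
// successive capacities shows the growth factor the library actually uses.
int main() {
    std::vector<int> v;
    std::size_t last = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {
            std::printf("size %zu -> capacity %zu\n", v.size(), v.capacity());
            last = v.capacity();
        }
    }
}
```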
Assuming you mean push_back and not insertion, I believe that the important part is multiplying by some constant (as opposed to grabbing N more elements each time); as long as you do this you'll get amortized constant time. Changing the factor changes the average-case and worst-case performance.

Concretely: if your constant factor is too large, you'll have good average-case performance but bad worst-case performance, especially as the arrays get big. For instance, imagine doubling (2x) a 10000-element vector just because the 10001st element was pushed. EDIT: As Michael Burr indirectly pointed out, the real cost here is probably that you'll grow your memory much larger than you need it to be. I would add that there are cache issues that affect speed if your factor is too large. Suffice it to say that there are real costs (memory and computation) if you grow much larger than you need.

However, if your constant factor is too small, say 1.1x, then you're going to have good worst-case performance but bad average performance, because you're going to incur the cost of reallocating too many times.

Also, see Jon Skeet's answer to a similar question from before. (Thanks @Bo Persson)

A little more about the analysis: say you have n items that you are pushing back, and a multiplication factor of M. Then the number of reallocations will be roughly log base M of n (log_M(n)), and the i-th reallocation will cost proportional to M^i (M to the i-th power). The total time of all the push_backs will then be M^1 + M^2 + ... + M^(log_M(n)). The number of push_backs is n, and so you get this series (which is a geometric series, and reduces to roughly (nM)/(M-1) in the limit) divided by n. This is roughly a constant, M/(M-1).

For large values of M you will overshoot a lot and allocate much more than you reasonably need quite often (which I mentioned above). For small values of M (close to 1) this constant M/(M-1) becomes large. This factor directly affects the average time.
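To make that M/(M-1) figure concrete, here is a small simulation (not from the answer itself; the factors and n below are arbitrary choices) that counts the element copies performed by the reallocations for a few growth factors. The measured copies/n oscillates depending on where n falls relative to a reallocation boundary, but it stays on the order of M/(M-1):

```cpp
#include <cmath>
#include <cstdio>

// Simulate push_back with a growth factor M and count how many element
// copies the reallocations perform in total.  By the geometric-series
// argument above, copies / n should be on the order of M / (M - 1).
int main() {
    const double factors[] = {1.1, 1.5, 2.0, 3.0};
    const long long n = 10000000;  // number of push_backs (arbitrary)
    for (double M : factors) {
        long long capacity = 1, size = 0, copies = 0;
        for (long long i = 0; i < n; ++i) {
            if (size == capacity) {
                copies += size;  // move every existing element to the new buffer
                capacity = static_cast<long long>(std::ceil(capacity * M));
            }
            ++size;
        }
        std::printf("M = %.1f  copies/n = %.2f  M/(M-1) = %.2f\n",
                    M, double(copies) / double(n), M / (M - 1.0));
    }
}
```

For M close to 1 (say 1.1) that constant is about 11 copies per element pushed, while for M = 2 it is only 2, which is exactly the average-case side of the trade-off described above.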
You can do the math to try to figure out how this kind of thing works.

A popular method for this kind of asymptotic analysis is the banker's method. What you do is mark up all your operations with an extra cost, "saving" it for later to pay for an expensive operation later on.

Let's make some dumb assumptions to simplify the math: writing an element into the array costs 1 (and the same for moving an element between arrays), and when the array fills up we allocate a new one K times larger and move all N current elements over. And our algorithm looks like:
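A minimal sketch of such an algorithm, assuming unit cost for every element written or moved; the DynArray type and its member names are illustrative, not from the original answer:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dynamic array used in the analysis: every element written
// costs 1, whether it is a fresh insertion or a move into a bigger buffer.
// When the buffer is full, a new one roughly K times larger is allocated
// and all N current elements are moved over.
struct DynArray {
    std::vector<int> buf;   // stands in for a raw allocation
    std::size_t size = 0;
    double K;

    explicit DynArray(double growth) : buf(1), K(growth) {}

    void push_back(int x) {
        if (size == buf.size()) {
            std::vector<int> bigger(static_cast<std::size_t>(buf.size() * K) + 1);
            for (std::size_t i = 0; i < size; ++i)
                bigger[i] = buf[i];         // N moves, each costing 1
            buf.swap(bigger);
        }
        buf[size++] = x;                    // the insertion itself, costing 1 (+ d markup)
    }
};

int main() {
    DynArray a(1.5);                        // K = 1.5, as in the question
    for (int i = 0; i < 100; ++i) a.push_back(i);
}
```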
Obviously, the "worst case" happens when we have to move the elements to the new array. Let's try to amortize this by adding a constant markup of d to the insertion cost, bringing it to a total of (1 + d) per operation.

Just after an array has been resized, we have (1/K) of it filled up and no money saved.
By the time we fill the array up, we can be sure to have at least d * (1 - 1/K) * N saved up. Since this money must be able to pay for all N elements being moved, we can figure out a relation between K and d:

d * (1 - 1/K) * N >= N, which gives d >= K / (K - 1)

A helpful table:

K = 1.1  ->  d = 11
K = 1.5  ->  d = 3
K = 2    ->  d = 2
K = 3    ->  d = 1.5
K = 4    ->  d = 1.33...
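As a sanity check on that relation, a short sketch (with K = 1.5 chosen arbitrarily) can charge every push the markup d = K/(K-1), spend one credit per moved element, and confirm the balance never goes negative:

```cpp
#include <cassert>
#include <cmath>

// Banker's-method check: every push saves a markup of d = K/(K-1) credits,
// and each reallocation spends one credit per element moved.  If the
// relation above is right, the balance never dips below zero.
int main() {
    const double K = 1.5;            // growth factor (arbitrary choice)
    const double d = K / (K - 1.0);  // markup derived from the relation above
    double bank = 0.0;
    long long size = 0, capacity = 1;
    for (long long i = 0; i < 1000000; ++i) {
        if (size == capacity) {
            bank -= size;            // pay one credit per moved element
            capacity = static_cast<long long>(std::ceil(capacity * K));
        }
        ++size;
        bank += d;                   // save the markup charged on this push
        assert(bank >= 0.0);
    }
}
```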
So from this you can get a rough mathematician's idea of how the time/memory trade-off works for this problem. There are some caveats, of course: I didn't go over shrinking the array when it has fewer elements, this only covers the worst case where no elements are ever removed, and the time costs of allocating extra memory weren't accounted for.
They most likely ran a bunch of experimental tests to figure this out in the end, making most of what I wrote irrelevant, though.
Uhm, the analysis is really simple when you're familiar with number systems, such as our usual decimal one.
For simplicity, then, assume that each time the current capacity is reached, a new 10x as large buffer is allocated.
If the original buffer has size 1, then the first reallocation copies 1 element, the second (where the buffer now has size 10) copies 10 elements, and so on. So with five reallocations, say, you have 1+10+100+1000+10000 = 11111 element copies performed. Multiply that by 9 and you get 99999; now add 1 and you have 100000 = 10^5. Or in other words, working backwards, the number of element copies performed to support those 5 reallocations is (10^5-1)/9.
The buffer size after 5 reallocations, i.e. 5 multiplications by 10, is 10^5, which is roughly a factor of 9 larger than the number of element copy operations. That means the time spent on copying is roughly linear in the resulting buffer size.
With base 2 instead of 10 you get (2^5-1)/1 = 2^5-1.
And so on for other bases (or factors to increase buffer size by).
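For completeness, a few lines of code (b and r are just illustrative parameters) that sum the copies from r reallocations with growth base b and compare against the closed form (b^r - 1)/(b - 1) and the final buffer size b^r:

```cpp
#include <cstdio>

// Total element copies performed by r reallocations with growth base b:
// 1 + b + b^2 + ... + b^(r-1), which the answer rewrites as (b^r - 1)/(b - 1).
int main() {
    const long long b = 10, r = 5;
    long long copies = 0, buffer = 1;   // buffer size before each reallocation
    for (long long i = 0; i < r; ++i) {
        copies += buffer;               // the i-th reallocation copies 'buffer' elements
        buffer *= b;                    // new buffer is b times larger
    }
    long long pow_br = buffer;          // b^r, the final buffer size
    std::printf("copies = %lld, (b^r - 1)/(b - 1) = %lld, final size = %lld\n",
                copies, (pow_br - 1) / (b - 1), pow_br);
    // With b = 10, r = 5: copies = 11111, (10^5 - 1)/9 = 11111, final size = 100000.
}
```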
Cheers & hth.