std::vector 的容量如何自动增长?费率是多少?

发布于 2024-10-20 23:44:47 字数 1851 浏览 4 评论 0原文

我一直在读这本书: C++ Primer,第三版 作者:Stanley B. Lippman、Josée Lajoie,在第 6.3 条向量如何自行增长给出的程序中发现了 1 个错误,该程序错过了“ <”在 cout 中:

#include <vector>
#include <iostream>

using namespace std;

int main() {
    vector<int> ivec;
    cout < "ivec: size: " < ivec.size() < " capacity: "  < ivec.capacity() < endl;
    
    for (int ix = 0; ix < 24; ++ix) {
        ivec.push_back(ix);
        cout < "ivec: size: " < ivec.size()
        < " capacity: "  < ivec.capacity() < endl;
    }    
}

在那篇文章的后面:

“在 Rogue Wave 实施下,尺寸和容量 定义后的 ivec 均为 0。插入第一个元素时, 然而,ivec 的容量为 256,大小为 1。"

但是,在更正和运行代码时,我得到以下输出:


ivec: size: 0 capacity: 0
ivec[0]=0 ivec: size: 1 capacity: 1
ivec[1]=1 ivec: size: 2 capacity: 2
ivec[2]=2 ivec: size: 3 capacity: 4
ivec[3]=3 ivec: size: 4 capacity: 4
ivec[4]=4 ivec: size: 5 capacity: 8
ivec[5]=5 ivec: size: 6 capacity: 8
ivec[6]=6 ivec: size: 7 capacity: 8
ivec[7]=7 ivec: size: 8 capacity: 8
ivec[8]=8 ivec: size: 9 capacity: 16
ivec[9]=9 ivec: size: 10 capacity: 16
ivec[10]=10 ivec: size: 11 capacity: 16
ivec[11]=11 ivec: size: 12 capacity: 16
ivec[12]=12 ivec: size: 13 capacity: 16
ivec[13]=13 ivec: size: 14 capacity: 16
ivec[14]=14 ivec: size: 15 capacity: 16
ivec[15]=15 ivec: size: 16 capacity: 16
ivec[16]=16 ivec: size: 17 capacity: 32
ivec[17]=17 ivec: size: 18 capacity: 32
ivec[18]=18 ivec: size: 19 capacity: 32
ivec[19]=19 ivec: size: 20 capacity: 32
ivec[20]=20 ivec: size: 21 capacity: 32
ivec[21]=21 ivec: size: 22 capacity: 32
ivec[22]=22 ivec: size: 23 capacity: 32
ivec[23]=23 ivec: size: 24 capacity: 32

容量是否随着公式 2^N 增加,其中 N< /code> 是初始容量吗?请解释一下。

I had been going through the Book:
C++ Primer, Third Edition By Stanley B. Lippman, Josée Lajoie, found 1 mistake in the program given under the Article 6.3 How a vector Grows Itself, this program missed a "<" in the couts:

#include <vector>
#include <iostream>

using namespace std;

int main() {
    vector<int> ivec;
    cout < "ivec: size: " < ivec.size() < " capacity: "  < ivec.capacity() < endl;
    
    for (int ix = 0; ix < 24; ++ix) {
        ivec.push_back(ix);
        cout < "ivec: size: " < ivec.size()
        < " capacity: "  < ivec.capacity() < endl;
    }    
}

Later within that article:

"Under the Rogue Wave implementation, both the size and the capacity
of ivec after its definition are 0. On inserting the first element,
however, ivec's capacity is 256 and its size is 1."

But, on correcting and running the code i get the following output:


ivec: size: 0 capacity: 0
ivec[0]=0 ivec: size: 1 capacity: 1
ivec[1]=1 ivec: size: 2 capacity: 2
ivec[2]=2 ivec: size: 3 capacity: 4
ivec[3]=3 ivec: size: 4 capacity: 4
ivec[4]=4 ivec: size: 5 capacity: 8
ivec[5]=5 ivec: size: 6 capacity: 8
ivec[6]=6 ivec: size: 7 capacity: 8
ivec[7]=7 ivec: size: 8 capacity: 8
ivec[8]=8 ivec: size: 9 capacity: 16
ivec[9]=9 ivec: size: 10 capacity: 16
ivec[10]=10 ivec: size: 11 capacity: 16
ivec[11]=11 ivec: size: 12 capacity: 16
ivec[12]=12 ivec: size: 13 capacity: 16
ivec[13]=13 ivec: size: 14 capacity: 16
ivec[14]=14 ivec: size: 15 capacity: 16
ivec[15]=15 ivec: size: 16 capacity: 16
ivec[16]=16 ivec: size: 17 capacity: 32
ivec[17]=17 ivec: size: 18 capacity: 32
ivec[18]=18 ivec: size: 19 capacity: 32
ivec[19]=19 ivec: size: 20 capacity: 32
ivec[20]=20 ivec: size: 21 capacity: 32
ivec[21]=21 ivec: size: 22 capacity: 32
ivec[22]=22 ivec: size: 23 capacity: 32
ivec[23]=23 ivec: size: 24 capacity: 32

Is the capacity increasing with the formula 2^N where N is the initial capacity? Please explain.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

幸福%小乖 2024-10-27 23:44:47

标准要求向量容量增长的速率是指数级的(恕我直言,这是过度规范的)。标准指定这一点是为了满足 push_back 操作的摊销恒定时间要求。 摊销恒定时间意味着什么以及指数增长如何实现这一点很有趣。

每次向量容量增加时,都需要复制元素。如果您在向量的生命周期内“摊销”此成本,那么事实证明,如果您以指数因子增加容量,则最终会得到摊销的恒定成本。

这可能看起来有点奇怪,所以让我向您解释一下这是如何工作的...

  • size: 1capacity 1 - 没有复制任何元素,每个元素的副本成本为 0。
  • size: 2capacity 2 - 当向量的容量如果增加到 2,则必须复制第一个元素。每个元素的平均副本数为 0.5
  • 大小:3 容量 4 - 当向量的容量增加到 4 时,必须复制前两个元素。每个元素的平均副本为 (2 + 1 + 0) / 3 = 1。
  • 大小:4 容量 4 - 每个元素的平均副本为 (2 + 1 + 0 + 0) / 4 = 3 / 4 = 0.75。
  • 大小:5 容量 8 - 每个元素的平均副本数为 (3 + 2 + 1 + 1 + 0) / 5 = 7 / 5 = 1.4
  • ...
  • 大小:8 容量 8 - 每个元素的平均副本数为 (3 + 2 + 1 + 1 + 0 + 0 + 0 + 0) / 8 = 7 / 8 = 0.875
  • 大小:9 容量 16 - 每个元素的平均副本为 (4 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 0) / 9 = 15 / 9 = 1.67
  • ...
  • size 16 容量 16 - 每个元素的平均副本数为 15 / 16 = 0.938
  • size 17 容量 32 - 每个元素的平均副本数为 31 / 17 = 1.82

正如你所看到的,每次容量跳跃时,副本数量增加了数组先前的大小。但由于在容量再次跳跃之前数组的大小必须加倍,因此每个元素的副本数始终小于 2。

如果将容量增加 1.5 * N 而不是 2 * N,则最终会得到一个非常大的值。类似的效果,除了每个元素的副本上限会更高(我认为是 3)。

我怀疑实现会选择 1.5 而不是 2,这既是为了节省一点空间,也是因为 1.5 更接近 黄金比例。我有一个直觉(目前没有任何硬数据支持),符合黄金比例的增长率(因为它与斐波那契数列的关系)将被证明是现实世界负载最有效的增长率最大限度地减少使用的额外空间和时间。

The rate at which the capacity of a vector grows is required by the standard to be exponential (which, IMHO, is over-specification). The standard specifies this in order to meet the amortized constant time requirement for the push_back operation. What amortized constant time means and how exponential growth achieves this is interesting.

Every time a vector's capacity is grown the elements need to be copied. If you 'amortize' this cost out over the lifetime of the vector, it turns out that if you increase the capacity by an exponential factor you end up with an amortized constant cost.

This probably seems a bit odd, so let me explain to you how this works...

  • size: 1 capacity 1 - No elements have been copied, the cost per element for copies is 0.
  • size: 2 capacity 2 - When the vector's capacity was increased to 2, the first element had to be copied. Average copies per element is 0.5
  • size: 3 capacity 4 - When the vector's capacity was increased to 4, the first two elements had to be copied. Average copies per element is (2 + 1 + 0) / 3 = 1.
  • size: 4 capacity 4 - Average copies per element is (2 + 1 + 0 + 0) / 4 = 3 / 4 = 0.75.
  • size: 5 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0) / 5 = 7 / 5 = 1.4
  • ...
  • size: 8 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0 + 0 + 0 + 0) / 8 = 7 / 8 = 0.875
  • size: 9 capacity 16 - Average copies per element is (4 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 0) / 9 = 15 / 9 = 1.67
  • ...
  • size 16 capacity 16 - Average copies per element is 15 / 16 = 0.938
  • size 17 capacity 32 - Average copies per element is 31 / 17 = 1.82

As you can see, every time the capacity jumps, the number of copies goes up by the previous size of the array. But because the array has to double in size before the capacity jumps again, the number of copies per element always stays less than 2.

If you increased the capacity by 1.5 * N instead of by 2 * N, you would end up with a very similar effect, except the upper bound on the copies per element would be higher (I think it would be 3).

I suspect an implementation would choose 1.5 over 2 both to save a bit of space, but also because 1.5 is closer to the golden ratio. I have an intuition (that is currently not backed up by any hard data) that a growth rate in line with the golden ratio (because of its relationship to the fibonacci sequence) will prove to be the most efficient growth rate for real-world loads in terms of minimizing both extra space used and time.

一张白纸 2024-10-27 23:44:47

为了能够在 std::vector 末尾提供摊销常数时间插入,实现必须将向量的大小(需要时)增加一个因子 < code>K>1 (*),这样当尝试附加到已满的大小为 N 的向量时,向量将增长为 K*N >。

不同的实现使用不同的常量K来提供不同的好处,特别是大多数实现都采用K = 2K = 1.5。较高的 K 会使其速度更快,因为它需要的增长更少,但同时也会对内存产生更大的影响。例如,在 gcc K = 2 中,而在 VS (Dinkumware) K = 1.5 中。

(*) 如果向量以恒定量增长,则 push_back 的复杂度将变为线性,而不是摊销常量。例如,如果向量在需要时增长 10 个元素,则增长成本(将所有元素复制到新的内存地址)将为 O( N / 10 ) (每 10 个元素,移动所有元素)或O(N)

To be able to provide amortized constant time insertions at the end of the std::vector, the implementation must grow the size of the vector (when needed) by a factor K>1 (*), such that when trying to append to a vector of size N that is full, the vector grows to be K*N.

Different implementations use different constants K that provide different benefits, in particular most implementations go for either K = 2 or K = 1.5. A higher K will make it faster as it will require less grows, but it will at the same time have a greater memory impact. As an example, in gcc K = 2, while in VS (Dinkumware) K = 1.5.

(*) If the vector grew by a constant quantity, then the complexity of push_back would become linear instead of amortized constant. For example, if the vector grew by 10 elements when needed, the cost of growing (copy of all element to the new memory address) would be O( N / 10 ) (every 10 elements, move everything) or O( N ).

裸钻 2024-10-27 23:44:47

只是为了在 vector::push_back 上添加一些时间复杂度的数学证明,假设向量的大小为 n,我们在这里关心的是发生的副本数量,所以到目前为止,说y,请注意每次增长向量时都会发生复制。

增长因子 K

  y = K^1 + K^2 + K^3 ... K^log(K, n)
K*y =     + K^2 + K^3 ... K^log(K, n) + K*K^log(K, n)

K*y-y = K*K^log(K, n) - K
y = K(n-1)/(K-1) = (K/(K-1))(n-1)

T(n) = y/n = (K/(K-1)) * (n-1)/n < K/(K-1) = O(1)

K/(K-1) 是一个常数,看看最常见的情况:

  • K=2, T( n) = 2/(2-1) = 2
  • K=1.5, T(n) = 1.5/(1.5-1) = 3

实际上在不同的实现中选择 K 为 1.5 或 2 是有原因的,请参见此< a href="https://www.desmos.com/calculator/brtkj6myfk" rel="nofollow noreferrer">图表:当 T(n) 达到最小值时K 约为 2,使用较大的 K 并没有多大好处,但代价是分配更多内存

按恒定数量增长 C

y = C + 2*C + 3*C + 4*C +  ... (n/C) * C
  = C(1+2+3+...+n/C), say m = n/C
  = C*(m*(m-1)/2)
  = n(m-1)/2

T(n) = y/n = (n(m-1)/2)/n = (m-1)/2 = n/2C - 1/2 = O(n)

正如我们所看到的,它是班轮

Just to add some mathematic proof on the time complexity on vector::push_back, say the size of vector is n, what we care about here is the number of copies happened so far, say y, notice the copy happens every time you grow the vector.

Grow by factor of K

  y = K^1 + K^2 + K^3 ... K^log(K, n)
K*y =     + K^2 + K^3 ... K^log(K, n) + K*K^log(K, n)

K*y-y = K*K^log(K, n) - K
y = K(n-1)/(K-1) = (K/(K-1))(n-1)

T(n) = y/n = (K/(K-1)) * (n-1)/n < K/(K-1) = O(1)

K/(K-1) is a constant, and see the most common cases:

  • K=2, T(n) = 2/(2-1) = 2
  • K=1.5, T(n) = 1.5/(1.5-1) = 3

and actually there is a reason of choosing K as 1.5 or 2 in different implementations, see this graph: as T(n) reaching the minimum when K is around 2, there is not much benefit on using a larger K, at the cost of allocating more memory

Grow by constant quantity of C

y = C + 2*C + 3*C + 4*C +  ... (n/C) * C
  = C(1+2+3+...+n/C), say m = n/C
  = C*(m*(m-1)/2)
  = n(m-1)/2

T(n) = y/n = (n(m-1)/2)/n = (m-1)/2 = n/2C - 1/2 = O(n)

As we could see it is liner

℉絮湮 2024-10-27 23:44:47

vector 的容量完全依赖于实现,没有人能知道它是如何增长的。

The capacity of the vector is completely implementation-dependent, no one can tell how it's growing..

你怎么敢 2024-10-27 23:44:47

您正在使用“Rogue Wave”实施吗?

容量如何增长取决于实施。你的使用 2^N。

Are you using the "Rogue Wave" implementation?

How capacity grows is up to the implementation. Yours use 2^N.

烟火散人牵绊 2024-10-27 23:44:47

是的,每次超出时容量都会增加一倍。这是依赖于实现的。

Yes, the capacity doubles each time it is exceeded. This is implementation dependent.

淡忘如思 2024-10-27 23:44:47

在推回元素之前,向量检查大小是否大于其容量,如下所示,

我将用保留函数解释它:


void    push_back(const value_type &val)   //push_back actual prototype
{
   if (size_type < 10)
        reserve(size_type + 1);
   else if (size_type > (_capacity / 4 * 3))
        reserve(_capacity + (this->_capacity / 4));
   //then the vector get filled with value
}

size_type :向量大小。

_capacity:向量_capacity。

before pushing back an element the vector check if the size is greater than it's capacity like bellow

i will explain it with reserve function :


void    push_back(const value_type &val)   //push_back actual prototype
{
   if (size_type < 10)
        reserve(size_type + 1);
   else if (size_type > (_capacity / 4 * 3))
        reserve(_capacity + (this->_capacity / 4));
   //then the vector get filled with value
}

size_type : the vector size.

_capacity : the vector _capacity.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文