为什么erase()函数如此昂贵?
考虑一个 2d 向量 vector <向量
并且假设它的内容如下:
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
所以这里N的大小是4,即N.size() = 4
现在,考虑下面的代码:
int i = 0;
while(N != empty()){
N.erase(i);
++i;
}
我计算了时间这段代码单独具有不同大小的 N,结果如下:
N 的大小为 1000 执行时间:0.230000s
N的大小为10000 执行时间:22.900000s
N的大小为20000 执行时间:91.760000s
N的大小为30000 执行时间:206.620000s
N的大小为47895 执行时间:526.540000s
我的问题是为什么这个函数如此昂贵?如果是这样,那么许多程序中的条件擦除语句可能会因为这个函数而永远花费下去。当我在 std::map
中使用擦除函数时也是同样的情况。这个功能有什么替代方案吗?其他库(例如 Boost)是否提供任何功能?
请不要说我可以整体执行 N.erase()
因为我只是想分析这个函数。
Consider a 2d vector vector < vector <int> > N
and lets say its contents are as follows:
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
So the size of N here is 4 i.e. N.size() = 4
Now, consider the following code :
int i = 0;
while(N != empty()){
N.erase(i);
++i;
}
I calculated the time just for this piece of code alone with various sizes for N and following are the results:
The size of N is 1000
Execution Time: 0.230000s
The size of N is 10000
Execution Time: 22.900000s
The size of N is 20000
Execution Time: 91.760000s
The size of N is 30000
Execution Time: 206.620000s
The size of N is 47895
Execution Time: 526.540000s
My question is why is this function so expensive ? If it is so then conditional erase statements in many programs could take forever just because of this function. It is the same case when I use erase function in std::map
too. Is there any alternative for this function. Does other libraries like Boost offer any?
Please do not say I could do N.erase()
as a whole because I'm just trying to analyze this function.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
向量是一个数组,当您向其中添加元素时,它会自动增长。因此,向量中的元素在内存中是连续的。这允许对元素进行恒定的时间访问。因为它们是从末尾开始增长的,所以它们也需要摊销常数时间来添加或删除到末尾。
现在,当你从中间移除时会发生什么?嗯,这意味着擦除元素后存在的任何内容都必须向后移动一个位置。这是非常昂贵的。
如果您想在中间进行大量插入/删除操作,请使用链表,例如 std::deque 的 std::list。
A vector is an array that grows automatically as you add elements to it. As such, elements in a vector a contiguous in memory. This allows constant time access to an element. Because they grow from the end, they also take amortized constant time to add or remove to/from the end.
Now, what happens when you remove in the middle? Well, it means whatever exists after the erased element must be shifted back one position. This is very expensive.
If you want to do lots of insertion/removal in the middle, use a linked list such as std::list of std::deque.
正如 Oli 所说,从向量的第一个元素中删除意味着必须向下复制其后面的元素,以便数组能够按预期运行。
这就是为什么链表用于从列表中的随机位置删除元素的情况 - 它更快(在较大的列表上),因为没有复制,仅重置一些节点指针。
As Oli said, erasing from the first element of a vector means the elements following it have to be copied down in order for the array to behave as desired.
This is why linked lists are used for situations in which elements will be removed from random locations in the list - it is quicker (on larger lists) because there is no copying, only resetting some node pointers.
考虑一下当删除向量的第一个元素时会发生什么。向量的其余部分必须向下“移动”一个索引,这涉及到复制它。尝试从另一端擦除,看看这是否会产生影响(我怀疑它会......)
Consider what happens when you delete the first element of a vector. The rest of the vector must be "moved" down by one index, which involves copying it. Try erasing from the other end, and see if that makes a difference (I suspect it will...)
因为你的算法是O(n^2)。每次调用
erase
都会强制vector
将已擦除元素之后的所有元素移回原位。因此,在具有 4 元素向量的循环中,第一个循环导致 3 个元素移动,第二次迭代导致 1 个元素移动,之后您将出现未定义的行为。如果有 8 个元素,则第一次迭代将移动 7 个元素,下一次迭代将移动 5 个元素,下一次迭代将移动 3 个元素,最后的枚举将移动 1 个元素。 (同样,你有未定义的行为)
当你遇到这样的情况时,通常你应该使用标准算法(即
std::remove
、std::remove_if
),因为它们运行一次容器并将典型的 O(n^2) 算法转换为 O(n) 算法。有关详细信息,请参阅 Scott Meyers 的“有效 STL”第 43 项:优先使用算法调用而不是显式循环。Because your algorithm is O(n^2). Each call to
erase
forces thevector
to move all elements after the erased element back. So in your loop with the 4 element vector, the first loop causes 3 elements to be shifted, the second iteration causes 1 element to be shifted, and after that you have undefined behavior.If you had 8 elements, the first iteration would move 7 elements, the next would move 5 elements, the next would move 3 elements, and the final enumeration would move 1 element. (And again you have undefined behavior)
When you encounter situations like this, generally you should use the standard algorithms (i.e.
std::remove
,std::remove_if
) instead, as they run through the container once and turn typical O(n^2) algorithms into O(n) algorithms. For more information see Scott Meyers' "Effective STL" Item 43: Prefer Algorithm Calls to Explicit Loops.std::vector 在内部只是一个元素数组。如果删除中间的一个元素,则该元素后面的所有元素都必须下移。这可能非常昂贵 - 如果元素有一个可以完成大量工作的自定义
operator=
,则成本会更高!如果您需要
erase()
速度更快,您应该使用std::list
- 这将使用双向链表结构,允许从中间快速擦除(但是,其他操作会稍微慢一些)。如果您只需要快速从列表的开始中删除,请使用std::deque
- 这会创建一个数组的链接列表,并提供std::vector
的大部分速度优势,同时仍然允许快速擦除仅开始或结束。此外,请注意,您的循环使问题变得更糟 - 您首先扫描所有等于零的元素并删除它们。扫描需要 O(n) 时间,擦除也需要 O(n) 时间。然后重复 1,依此类推 - 总体时间为 O(n^2)。如果您需要删除多个值,您应该使用迭代器并使用
erase()
。或者,如果您使用向量
,您会发现复制到新向量中的速度会更快。至于
std::map
(和std::set
) - 这根本不是问题。std::map
既能够随机删除元素,也能够随机搜索元素,时间为O(lg n)
- 这对于大多数用途来说是相当合理的。即使你的天真循环也不应该太糟糕;手动迭代并一次性删除您想要删除的所有内容在某种程度上更有效,但远没有达到std::list
和朋友的程度。A std::vector is, internally, just an array of elements. If you delete an element in the middle, all the elements after it have to be shifted down. This can be very expensive - even more so if the elements have a custom
operator=
that does a lot of work!If you need
erase()
to be fast, you should use astd::list
- this will use a doubly linked list structure that allows fast erasure from the middle (however, other operations get somewhat slower). If you just need to remove from the start of the list quickly, usestd::deque
- this creates a linked list of arrays, and offers most of the speed advantages ofstd::vector
while still allowing fast erasures from the beginning or end only.Furthermore, note that your loop there makes the problem worse - you first scan through all elements equal to zero and erase them. The scan takes O(n) time, the erasure also O(n) time. You then repeat for 1, and so on - overall, O(n^2) time. If you need to erase multiple values, you should take an iterator and go through the
std::list
yourself, using the iterator variant oferase()
. Or if you use avector
, you'll find it can be faster to copy into a new vector.As for
std::map
(andstd::set
) - this isn't a problem at all.std::map
is capable of both removing elements at random, as well as searching for elements at random, withO(lg n)
time - which is quite reasonable for most uses. Even your naive loop there shouldn't be too bad; manually iterating through and removing everything you want to remove in one pass is somewhat more efficient, but not nearly to the extent that it is withstd::list
and friends.vector.erase 会将 i 之后的所有元素向前推进 1。这是一个 O(n) 操作。
此外,您是按值而不是按引用传递向量。
您的代码也不会删除整个向量。
例如:
我=0
擦除N[0]
N = {{2, 2, 2, 2}, {3, 3, 3, 3}, {4, 4, 4, 4}}
i = 1
擦除N[1]
N = {{2, 2, 2, 2}, {4, 4, 4, 4}}
i = 2
擦除 N[2] 没有任何反应,因为最大索引是 N[1]
最后,我认为这是 vector.erase() 的正确语法。您需要将迭代器传递到开始位置以删除所需的元素。
试试这个:
您还可以将其与从末尾擦除进行比较(它应该明显更快,特别是在使用值而不是引用时):
vector.erase will advance all elements after i forward by 1. This is an O(n) operation.
Additionally, you're passing vectors by value rather than by reference.
Your code also doesn't erase the entire vector.
For example:
i = 0
erase N[0]
N = {{2, 2, 2, 2}, {3, 3, 3, 3}, {4, 4, 4, 4}}
i = 1
erase N[1]
N = {{2, 2, 2, 2}, {4, 4, 4, 4}}
i = 2
erase N[2] nothing happens because the maximum index is N[1]
Lastly, I don' think that's the correct syntax for vector.erase(). You need to pass in an iterator to the begin location to erase the element you want.
Try this:
You can also compare this to erasing from the end (it should be significantly faster, especially when using values rather than references):