STL iterator invalidation when calling erase
The STL standard defines that when an erase occurs on containers such as std::deque, std::list, etc., iterators are invalidated.
My question is as follows: given a list of integers stored in a std::deque and a pair of indices indicating a range of elements in the std::deque, what is the correct way to delete all even elements in that range?
So far I have the following; however, the problem here is that the assumed end is invalidated after an erase:
    #include <cstddef>
    #include <deque>
    #include <utility>   // std::pair

    int main()
    {
        std::deque<int> deq;
        for (int i = 0; i < 100; deq.push_back(i++));

        // range, 11th to 51st element
        std::pair<std::size_t,std::size_t> r(10,50);

        std::deque<int>::iterator it  = deq.begin() + r.first;
        std::deque<int>::iterator end = deq.begin() + r.second;

        while (it != end)
        {
            if (*it % 2 == 0)
            {
                // Problem: erasing mid-deque invalidates every iterator, including 'end'.
                it = deq.erase(it);
            }
            else
                ++it;
        }

        return 0;
    }
Examining how std::remove_if is implemented, it seems there is a very costly copy/shift-down process going on.
Is there a more efficient way of achieving the above without all the copies/shifts?
In general, is deleting/erasing an element more expensive than swapping it with the next nth value in the sequence (where n is the number of elements deleted/removed so far)?
Note: Answers should assume the sequence size is quite large (over a million elements) and that on average 1/3 of the elements would be up for erasure.
Comments (4)
I'd use the erase-remove idiom. I think the linked Wikipedia article even shows what you're doing: removing odd elements.
The copying that remove_if does is no more costly than what happens when you delete elements from the middle of the container. It might even be more efficient.
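A minimal sketch of the erase-remove idiom applied to an index range like the one in the question, assuming C++11 or later for the lambda. The helper name erase_even_in_range is hypothetical and not part of any answer:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper: erases the even values in the index range [first, last)
// and returns how many elements were removed.
std::size_t erase_even_in_range(std::deque<int>& deq, std::size_t first, std::size_t last)
{
    assert(first <= last && last <= deq.size());
    auto range_begin = deq.begin() + static_cast<std::ptrdiff_t>(first);
    auto range_end   = deq.begin() + static_cast<std::ptrdiff_t>(last);

    // remove_if compacts the kept elements to the front of the range and
    // returns the end of the kept part; no iterator is invalidated yet.
    auto new_end = std::remove_if(range_begin, range_end,
                                  [](int v) { return v % 2 == 0; });

    const auto removed = static_cast<std::size_t>(range_end - new_end);

    // One range erase closes the gap, so the elements outside the range
    // are shifted once instead of once per removed element.
    deq.erase(new_end, range_end);
    return removed;
}
```

Because both iterators are computed before anything is erased, and the only erase is the final range erase, the invalidated end iterator from the question never comes into play.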
Calling .erase() also results in "a very costly copy/shift down process going on". When you erase an element from the middle of the container, every other element after that point must be shifted down one spot into the available space. If you erase multiple elements, you incur that cost for every erased element. Some of the non-erased elements will move several spots, but are forced to move one spot at a time instead of all at once. That is very inefficient.
The standard library algorithms std::remove and std::remove_if optimize this work. They use a clever trick to ensure that every moved element is only moved once. This is much, much faster than what you are doing yourself, contrary to your intuition.
The pseudocode is like this:
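A minimal C++ sketch of that single-pass scheme, reconstructed from the answer's description of read_location and write_location rather than copied from it. The names remove_if_sketch and drop_even are hypothetical, and the lambda assumes C++11 or later:

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Hypothetical stand-in for std::remove_if, written to expose the two cursors.
template <typename Iter, typename Pred>
Iter remove_if_sketch(Iter first, Iter last, Pred should_remove)
{
    Iter write_location = first;
    for (Iter read_location = first; read_location != last; ++read_location)
    {
        if (!should_remove(*read_location))
        {
            // A kept element is moved at most once, straight to its final slot.
            if (write_location != read_location)
                *write_location = std::move(*read_location);
            ++write_location;
        }
    }
    // Everything from write_location up to last is leftover and can be erased.
    return write_location;
}

// Usage: drops the even values from the whole deque.
void drop_even(std::deque<int>& deq)
{
    auto new_end = remove_if_sketch(deq.begin(), deq.end(),
                                    [](int v) { return v % 2 == 0; });
    assert(new_end <= deq.end());  // the write cursor never passes the read cursor
    deq.erase(new_end, deq.end());
}
```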
As you can see, every element in the original sequence is considered exactly once, and if it needs to be kept, it gets copied exactly once, to the current write_location. It will never be looked at again, because the write_location can never run in front of the read_location.
Remember that deque, like vector, stores its elements in positional order (typically in fixed-size contiguous blocks rather than one contiguous array), so removing elements mid-container necessarily means copying neighbouring elements over the hole. You just want to make sure you're doing one iteration and copying each not-to-be-deleted object directly to its final position, rather than moving all objects one by one during each delete. remove_if is efficient and appropriate in this regard; your erase loop is not: it does massive amounts of unnecessary copying.
FWIW - alternatives:
What to choose depends on the nature, relative frequency, and performance requirements of specific operations (e.g. it may be that you can afford slow removals if they're done at non-critical times, but need fastest-possible iteration - whatever it is, make sure you understand your needs and the implications of the various data structures).
One alternative that hasn't been mentioned is to create a new deque, copy the elements that you want to keep into it, and swap it with the old deque.
I'm not sure if you have enough memory to create a copy, but it usually is faster and easier to make a copy instead of trying to erase elements in place from a large collection. If you still see memory thrashing, then figure out how many elements you are going to keep by calling std::count_if and reserve that many (note that std::deque has no reserve(), so this step only applies if the copy goes into a std::vector). This way you would have a single memory allocation.
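A minimal sketch of the copy-and-swap approach, adapted to the question's index range and assuming C++11 or later for std::copy_if and the lambda. The helper name rebuild_without_even is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <iterator>

// Hypothetical helper: rebuilds the deque without the even values in [first, last).
void rebuild_without_even(std::deque<int>& deq, std::size_t first, std::size_t last)
{
    assert(first <= last && last <= deq.size());
    auto range_begin = deq.begin() + static_cast<std::ptrdiff_t>(first);
    auto range_end   = deq.begin() + static_cast<std::ptrdiff_t>(last);

    std::deque<int> kept;
    // Elements before the range are kept unconditionally.
    std::copy(deq.begin(), range_begin, std::back_inserter(kept));
    // Inside the range, only odd values survive.
    std::copy_if(range_begin, range_end, std::back_inserter(kept),
                 [](int v) { return v % 2 != 0; });
    // Elements after the range are kept unconditionally.
    std::copy(range_end, deq.end(), std::back_inserter(kept));

    // swap exchanges the two containers' contents in constant time;
    // the old storage is released when 'kept' goes out of scope.
    deq.swap(kept);
}
```

This trades extra memory for simplicity: every kept element is copied exactly once, and the original deque is never erased from, so no iterator into it is invalidated along the way.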