STL iterator invalidation when calling erase

Posted 2024-10-05 18:31:40

The C++ standard specifies that erasing from containers such as std::deque and std::list invalidates iterators (for std::list, only iterators to the erased elements; for std::deque, erasing anywhere other than the two ends invalidates every iterator, including the past-the-end iterator).

My question is as follows: given a list of integers stored in a std::deque, and a pair of indices indicating a range of elements in that std::deque, what is the correct way to delete all even elements in the range?

So far I have the following; however, the problem is that the assumed end is invalidated after an erase:

#include <cstddef>
#include <deque>
#include <utility>

int main()
{
   std::deque<int> deq;
   for (int i = 0; i < 100; deq.push_back(i++));

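   // range: indices [10, 50), i.e. the 11th through 50th elements
   std::pair<std::size_t,std::size_t> r(10,50);

   std::deque<int>::iterator it = deq.begin() + r.first;
   // Problem under discussion: the first erase below invalidates `end`.
   std::deque<int>::iterator end = deq.begin() + r.second;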

   while (it != end)
   {
      if (*it % 2 == 0)
      {
         it = deq.erase(it);
      }
      else
        ++it;
   }

   return 0;
}

Examining how std::remove_if is implemented, it seems there is a very costly copy/shift down process going on.

  • Is there a more efficient way of achieving the above without all the copying/shifting?

  • In general, is deleting/erasing an element more expensive than swapping it with the next nth value in the sequence (where n is the number of elements deleted/removed so far)?

Note: Answers should assume the sequence is quite large (over 1 million elements) and that on average 1/3 of the elements would be up for erasure.

Comments (4)

慕巷 2024-10-12 18:31:40

I'd use the erase-remove idiom. I think the Wikipedia article on it even shows nearly what you're doing: removing odd elements (you would only need to invert the predicate to remove even ones).

The copying that remove_if does is no more costly than what happens when you delete elements from the middle of the container. It might even be more efficient.
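As a minimal sketch of how the idiom could be applied to just the index range from the question: the helper name `erase_even_in_range` and its signature are assumptions made for illustration, not something from the original post. It uses std::remove_if on the sub-range and then a single range erase.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Hypothetical helper (name and signature are illustrative assumptions):
// removes the even values found at positions [first, last) of deq.
void erase_even_in_range(std::deque<int>& deq, std::size_t first, std::size_t last)
{
    auto range_begin = deq.begin() + static_cast<std::ptrdiff_t>(first);
    auto range_end   = deq.begin() + static_cast<std::ptrdiff_t>(last);

    // remove_if only assigns elements; it does not change the container size,
    // so range_begin and range_end stay valid while it runs. It returns an
    // iterator to the start of the leftover tail inside the range.
    auto new_end = std::remove_if(range_begin, range_end,
                                  [](int x) { return x % 2 == 0; });

    // One range erase drops the leftover tail in a single operation.
    deq.erase(new_end, range_end);
}
```

With the deque from the question (values 0 to 99), calling `erase_even_in_range(deq, 10, 50)` should leave 80 elements, with no iterator being used after it has been invalidated.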

撩人痒 2024-10-12 18:31:40

Calling .erase() also results in "a very costly copy/shift down process going on." When you erase an element from the middle of the container, every other element after that point must be shifted down one spot into the available space. If you erase multiple elements, you incur that cost for every erased element. Some of the non-erased elements will move several spots, but are forced to move one spot at a time instead of all at once. That is very inefficient.

The standard library algorithms std::remove and std::remove_if optimize this work. They use a clever trick to ensure that every moved element is only moved once. This is much, much faster than what you are doing yourself, contrary to your intuition.

The pseudocode is like this:

read_location <- beginning of range.
write_location <- beginning of range.
while read_location != end of range:
    if the element at read_location should be kept in the container:
        copy the element at the read_location to the write_location.
        increment the write_location.
    increment the read_location.

As you can see, every element in the original sequence is considered exactly once, and if it needs to be kept, it gets copied exactly once, to the current write_location. It will never be looked at again, because the write_location can never run in front of the read_location.
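The read/write loop described above could be written out by hand roughly as follows. This is only a sketch to make the mechanics concrete: `compact_keep` and `erase_even_by_compaction` are hypothetical names, and real std::remove_if implementations may differ in detail.

```cpp
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical hand-written version of the read/write loop (illustration only).
// Moves every element for which keep(element) is true to the front of
// [first, last) and returns the new logical end of the kept elements.
template <typename Iter, typename Pred>
Iter compact_keep(Iter first, Iter last, Pred keep)
{
    Iter write = first;
    for (Iter read = first; read != last; ++read) {
        if (keep(*read)) {
            if (write != read) {
                *write = std::move(*read);  // each kept element moves at most once
            }
            ++write;
        }
    }
    return write;
}

// Hypothetical wrapper applying the loop to the index range from the question.
void erase_even_by_compaction(std::deque<int>& deq, std::size_t first, std::size_t last)
{
    auto b = deq.begin() + static_cast<std::ptrdiff_t>(first);
    auto e = deq.begin() + static_cast<std::ptrdiff_t>(last);
    auto tail = compact_keep(b, e, [](int x) { return x % 2 != 0; });
    deq.erase(tail, e);  // single erase of the leftover tail
}
```

In practice there is little reason to prefer this over std::remove_if; it is shown only because it mirrors the pseudocode line by line.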

裸钻 2024-10-12 18:31:40

Remember that deque stores its elements in array-like storage (typically a sequence of fixed-size blocks rather than one contiguous array like vector), so removing elements mid-container necessarily means copying neighbouring elements over the hole. You just want to make sure you're doing one iteration and copying each not-to-be-deleted object directly to its final position, rather than moving all objects one by one during each delete. remove_if is efficient and appropriate in this regard; your erase loop is not: it does massive amounts of unnecessary copying.

FWIW - alternatives:

  • add a "deleted" state to your objects and mark them deleted in place, but then every time you operate on the container you'll need to check yourself
  • use a list, which is implemented using pointers to previous and next elements, such that removing a list element alters the adjacent pointers to bypass that element: no copying, efficient iteration, just no random access, more small (i.e. inefficient) heap allocations and pointer overheads

What to choose depends on the nature, relative frequency, and performance requirements of specific operations (e.g. it may be that you can afford slow removals if they're done at non-critical times, but need fastest-possible iteration - whatever it is, make sure you understand your needs and the implications of the various data structures).
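For the list alternative, a sketch of the question's loop on a std::list might look like the following. The function name `erase_even_in_list_range` is an assumption for illustration. Because std::list::erase invalidates only iterators to the erased element, a stored end iterator stays valid here, at the cost of a linear walk to reach the start of the range.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical helper (illustrative name): removes even values at positions
// [first, last) of a std::list.
void erase_even_in_list_range(std::list<int>& lst, std::size_t first, std::size_t last)
{
    // No random access: reaching the range takes a linear walk.
    auto it  = std::next(lst.begin(), static_cast<std::ptrdiff_t>(first));
    auto end = std::next(it, static_cast<std::ptrdiff_t>(last - first));

    while (it != end) {
        if (*it % 2 == 0) {
            it = lst.erase(it);  // only the erased node's iterator is invalidated
        } else {
            ++it;
        }
    }
}
```

Whether this is faster overall than the deque version depends on the trade-offs listed above, so it is worth measuring with realistic data.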

夏末的微笑 2024-10-12 18:31:40

One alternative that hasn't been mentioned is to create a new deque, copy the elements that you want to keep into it, and swap it with the old deque.

#include <algorithm>
#include <cstddef>
#include <deque>
#include <iterator>
#include <utility>

void filter(std::deque<int>& in, std::pair<std::size_t,std::size_t> range) {
    std::deque<int> out;  // std::deque has no reserve(), so nothing is pre-allocated
    std::deque<int>::const_iterator first = in.cbegin();
    std::deque<int>::const_iterator curr = first + range.first;
    std::deque<int>::const_iterator last = first + range.second;
    // Copy everything before the range unchanged.
    std::copy(first, curr, std::back_inserter(out));
    // Inside the range, keep only the odd values.
    while (curr != last) {
        if (*curr & 1) {
            out.push_back(*curr);
        }
        ++curr;
    }
    // Copy everything after the range unchanged (cend() matches const_iterator).
    std::copy(last, in.cend(), std::back_inserter(out));
    in.swap(out);
}

I'm not sure if you have enough memory to create a copy, but it is usually faster and easier to build a filtered copy than to erase elements in place from a large collection. Note that std::deque has no reserve() member, so the storage cannot be requested up front. If you still see memory thrashing, one option is to build the result in a std::vector instead: figure out how many elements you are going to keep by calling std::count_if and reserve that many. This way you would have a single memory allocation.
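The count-then-reserve idea could be sketched with std::vector, which provides reserve() (std::deque does not). The function name `filtered_copy` and the choice of std::vector are assumptions made for illustration, not part of the original answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical sketch: returns a copy of `in` with the even values at
// positions [first, last) left out, using one up-front allocation.
std::vector<int> filtered_copy(const std::vector<int>& in,
                               std::size_t first, std::size_t last)
{
    auto range_begin = in.begin() + static_cast<std::ptrdiff_t>(first);
    auto range_end   = in.begin() + static_cast<std::ptrdiff_t>(last);
    auto is_odd = [](int x) { return x % 2 != 0; };

    // First pass: count how many elements inside the range survive.
    const std::size_t kept_in_range =
        static_cast<std::size_t>(std::count_if(range_begin, range_end, is_odd));

    std::vector<int> out;
    out.reserve(in.size() - (last - first) + kept_in_range);  // single allocation

    // Second pass: copy the prefix, the kept part of the range, then the suffix.
    std::copy(in.begin(), range_begin, std::back_inserter(out));
    std::copy_if(range_begin, range_end, std::back_inserter(out), is_odd);
    std::copy(range_end, in.end(), std::back_inserter(out));
    return out;
}
```

The extra counting pass trades one more read over the range for avoiding repeated reallocation; whether that pays off is best checked by measurement.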
