What is the most efficient way to erase/remove multiple std::vector elements while retaining their original order?

Posted on 2024-10-01 03:26:11


I have a std::vector<int> and a second container holding iterators or indexes into this vector (no keys; I want constant-time access to the elements) for deletion purposes.
Let's assume I have a vector of 1000 elements and want to erase 200 of them. After the deletion, the remaining elements must be in the same order as before.

One more thing I missed in the first version of my question: the values are unique. They are identities.

How would you do that in a safe (regarding the STL rules) and efficient manner (the decision for a vector is final)?

Possibilities or methods I thought about:

  • the erase-remove idiom (http://en.wikipedia.org/wiki/Erase-remove_idiom): originally meant for deleting elements that fulfill a condition (which includes a linear search), but I think with ranges of size 1 this method could be used with already given iterators and a dummy condition. Question: is the original order of the elements kept, and is it more performant than the last method?
  • loop over the indexes and erase the elements with vector.erase(vector.begin()+index+offset), while keeping the already removed indexes in a container for calculating the offset. This offset could be determined for every removal with std::lower_bound on the container of already removed elements. The problem: a lot of binary searches to get the offsets, and a lot of move operations because of deletions at random locations.
  • At the moment I'm doing the following: get all the iterators for the elements to remove, sort them in descending order of their position in the vector, and loop over them for the final deletion with vector.erase. This way I'm not invalidating any iterator that is still needed, and there are no vector rearrange operations except for the deletion itself. The problem: a lot of sorting.
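
For reference, the last bullet (the current approach) might look roughly like this. It is a minimal sketch that assumes the positions are held as unique, in-range std::size_t indexes rather than iterators; erase_descending is a hypothetical name. Each vector::erase shifts every later element down by one, so removing m of n elements this way can cost on the order of n*m moves in addition to the O(m log m) sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper sketching the "sort descending, then erase" approach.
// Assumes every index is a unique, valid position in `vec`.
template <typename T>
void erase_descending(std::vector<T>& vec, std::vector<std::size_t> indexes)
{
    // Largest position first: erasing at a later position never moves
    // the elements at positions that are still waiting to be erased.
    std::sort(indexes.begin(), indexes.end(), std::greater<std::size_t>());
    for (std::size_t idx : indexes) {
        assert(idx < vec.size() && "index out of range or duplicated");
        // Each erase shifts all later elements down by one slot.
        vec.erase(vec.begin() + static_cast<std::ptrdiff_t>(idx));
    }
}
```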

So, how would you tackle this? Any new ideas? Any recommendations?

Thanks for your input.

Sascha

Edit / Update / Own results: I implemented the erase-remove idiom, which was also mentioned by KennyTM, with a predicate based on a lookup in a boost::dynamic_bitset, and it's insanely fast. Furthermore, I tried PigBen's move-and-truncate method (also mentioned by Steve Jessop), which also accesses the bitset in its while loop. Both seem to be equally fast with my kind of data: I deleted 100 of 1000 elements (unsigned ints), repeated these 100 deletes 1M times, and saw no significant difference. Because I think the STL-based erase-remove idiom is somewhat more "natural", I'm choosing this method (an argument also made by KennyTM).

Comments (7)

书间行客 2024-10-08 03:26:12

In <algorithm> there is a remove_if function which squeezes all values not removed to the front maintaining the order. This works if those 200 elements can be purely determined by the values, not index.

This is essentially the erase-remove idiom you have linked to. remove_if is guaranteed to perform O(N) comparisons (and at most O(N) copies), which would be more efficient than sorting (O(N log N)), although your last option doesn't actually require sorting if the indices are determined from values (just scan in the reverse direction while copying).

Nevertheless, using remove_if (if you can) is better than the other 2 options because the implementation has already been written for you, so there's less chance of a logic error, and it conveys better what (not how) to do.
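
A minimal sketch of the erase-remove idiom with a lookup-based predicate, in the spirit of the asker's update (which used boost::dynamic_bitset). Assumptions: the elements are unique unsigned ints no larger than a known max_value, and std::vector<bool> stands in for the bitset so the example has no Boost dependency; erase_marked_values is a hypothetical name. Because the predicate only looks at the value it is given, it does not depend on the order in which remove_if calls it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of the erase-remove idiom with a lookup-table predicate.
// Assumption: every element (and every value to remove) is an unsigned
// int <= max_value, so it can index directly into the mask.
// std::vector<bool> stands in for boost::dynamic_bitset here.
void erase_marked_values(std::vector<unsigned>& vec,
                         const std::vector<unsigned>& values_to_remove,
                         unsigned max_value)
{
    std::vector<bool> marked(max_value + 1, false);
    for (unsigned v : values_to_remove) {
        assert(v <= max_value && "value outside the lookup table");
        marked[v] = true;
    }
    // remove_if moves the kept elements forward in their original order
    // and returns the new logical end; erase then drops the leftover tail.
    vec.erase(std::remove_if(vec.begin(), vec.end(),
                             [&marked](unsigned v) { return marked[v]; }),
              vec.end());
}
```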

骑趴 2024-10-08 03:26:12

How about looping through the vector and, for each element that needs to be removed, copying the next element that doesn't need to be removed into that position? Then, when you get to the end, truncate it.

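// needs_to_be_removed(i) is a placeholder: true if position i should go.
std::size_t last = 0;
for (std::size_t i = 0; i < vec.size(); ++i, ++last)
{
   // Skip the elements marked for removal; check the bound first so
   // needs_to_be_removed is never called with an out-of-range index.
   while (i < vec.size() && needs_to_be_removed(i))
      ++i;
   if (i >= vec.size()) break;

   vec[last] = vec[i];   // copy the kept element down into the gap
}

vec.resize(last);   // truncate the leftover tail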
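
A self-contained sketch of the same move-and-truncate idea, restructured slightly: a hypothetical remove_mask (std::vector<bool>) replaces the needs_to_be_removed(i) placeholder, elements are moved rather than copied, and the tail is dropped with erase so that T need not be default-constructible. Every kept element is moved at most once, so the pass is O(n).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Move-and-truncate: one forward pass, each kept element moved at most once.
// `remove_mask` is a hypothetical stand-in for needs_to_be_removed(i):
// remove_mask[i] == true means position i should be dropped.
template <typename T>
void move_and_truncate(std::vector<T>& vec, const std::vector<bool>& remove_mask)
{
    assert(remove_mask.size() == vec.size());
    std::size_t last = 0;  // next write position
    for (std::size_t i = 0; i < vec.size(); ++i) {
        if (remove_mask[i])
            continue;  // skip elements marked for removal
        if (last != i)
            vec[last] = std::move(vec[i]);  // shift the kept element down
        ++last;
    }
    // Drop the tail; erase (unlike resize) needs no default constructor.
    vec.erase(vec.begin() + static_cast<std::ptrdiff_t>(last), vec.end());
}
```
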
挽梦忆笙歌 2024-10-08 03:26:12

First thing is, don't call erase more times than you have to, because for a vector it shuffles all the later elements down, giving the whole operation an Ω(n*m) worst case run time (n the size of the vector, m the size of the list of indexes to remove).

I think the first thing I'd try would be similar to your current code:

  • sort the indexes
  • create a new vector of size n - m
  • iterate over the original vector, copying indexes[0] elements, skipping an element, then copying indexes[1] - indexes[0] - 1 elements, skipping an element, and so on.
  • swap the original vector with the new one.
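
The four steps above could be sketched as follows. This assumes unique, in-range std::size_t indexes; copy_without_indexes is a hypothetical name, and instead of presizing it uses the reserve-plus-back_inserter variant that this answer also mentions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Sketch of "sort the indexes, copy the runs between them into a new
// vector, then swap". Assumes unique, in-range indexes;
// copy_without_indexes is a hypothetical name.
template <typename T>
void copy_without_indexes(std::vector<T>& vec, std::vector<std::size_t> indexes)
{
    assert(indexes.size() <= vec.size());
    auto at = [&vec](std::size_t i) {
        return vec.begin() + static_cast<std::ptrdiff_t>(i);
    };

    std::sort(indexes.begin(), indexes.end());         // 1. sort the indexes
    std::vector<T> result;
    result.reserve(vec.size() - indexes.size());       // 2. room for n - m elements

    std::size_t run_start = 0;
    for (std::size_t idx : indexes) {                  // 3. copy a run, skip one
        assert(idx >= run_start && idx < vec.size() && "duplicate or out-of-range index");
        std::copy(at(run_start), at(idx), std::back_inserter(result));
        run_start = idx + 1;                           //    element idx is skipped
    }
    std::copy(at(run_start), vec.end(), std::back_inserter(result));

    vec.swap(result);                                  // 4. swap in the new vector
}
```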

You might be able to do the third step with remove_copy_if and a predicate which contains state (counting how many items it has copied and how far it is through the sorted list of indexes), but for extremely tedious and obscure reasons this isn't guaranteed to work (algorithm predicates with mutable state are problematic, it seems to be the consensus that the standard doesn't guarantee that the same copy of the predicate is used throughout the algorithm). So I really don't advise trying it, but it might help to bear in mind that what you're writing basically is a modified version of remove_copy_if.

You could avoid the second step using a back_inserter rather than presizing the vector, although you'd presumably still reserve the space in advance.

[Edit: come to think of it, why am I copying anything? Rather than implementing a modified remove_copy_if, implement a modified remove_if, and just copy to an earlier point in the vector. Then erase/resize at the end. I wouldn't worry about the O(m log m) sort of the indexes until proven to be a problem, because it's unlikely to be significantly slower than the Ω(m) operation to read all the values to be removed, and store them in some kind of container. Then, using this container in the predicate to remove_if may or may not be O(1). Sorting might turn out faster for plausible values of m.]
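
The in-place variant from the edit might look like this: instead of a stateful predicate handed to remove_if (which the answer warns is not guaranteed to work), a hand-written loop walks the vector once and consults the sorted index list directly, then erases the tail. A sketch under the assumption of unique, in-range std::size_t indexes; erase_sorted_indexes is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// In-place variant: walk the vector once, consult the sorted index list
// instead of a stateful predicate, move kept elements forward, then erase.
// Assumes unique, in-range indexes; erase_sorted_indexes is a hypothetical name.
template <typename T>
void erase_sorted_indexes(std::vector<T>& vec, std::vector<std::size_t> indexes)
{
    std::sort(indexes.begin(), indexes.end());  // the O(m log m) step
    std::size_t next = 0;   // position in `indexes`
    std::size_t write = 0;  // next slot for a kept element
    for (std::size_t read = 0; read < vec.size(); ++read) {
        if (next < indexes.size() && indexes[next] == read) {
            ++next;  // this element is removed: skip it
            continue;
        }
        if (write != read)
            vec[write] = std::move(vec[read]);  // avoid self-move
        ++write;
    }
    // Every index is consumed exactly once only if they were unique and in range.
    assert(next == indexes.size() && "duplicate or out-of-range index");
    vec.erase(vec.begin() + static_cast<std::ptrdiff_t>(write), vec.end());
}
```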

夏九 2024-10-08 03:26:12

You can copy all elements of the vector whose index is not in your second container to a list, and then back to a vector. Even with your algorithm of going from the end of the vector to the front, there's a lot of work going on behind the scenes in your vector.

Make your second container a map so it keeps the indices sorted for you automatically.

edit:

To respond to the comment

The cost of maintaining a map is worst case the same as maintaining another structure (list or vector) and then sorting it. If you're already doing that, you might as well keep it as a map. It doesn't make sense to complain about the overhead of a map vs. the overhead of sorting a list.

As for the performance of my suggested algorithm, if m is the number of elements to be deleted, and n is the total number of elements then it results in O(n - m).

Of course, this is mostly just humoring your attempt to optimize with a vector.

1 - You shouldn't be using a vector if you want to do random-access deletes. That's not what they're good at; use a list if at all possible. And since you seem to be much more interested in relative order than in absolute indexes, I am wondering why a vector is needed at all. If you gave the entire problem, there's probably a common solution that lets you use the most efficient data structure to solve it.

2 - Instead of maintaining a second data structure, mark elements that need to be deleted directly in their container. A trivial way: instead of a container<T>, use a container<std::pair<T, char>> and use the char to keep track of the element status.

If you do 1 and 2, you remove all copying completely and get a much more efficient implementation.
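
A sketch of suggestion 2 applied to a vector (which the asker wants to keep): each element carries a char status flag, marking is O(1), and a single erase-remove pass purges the marked elements in order. The alias Flagged and the functions mark_for_deletion and purge_marked are hypothetical names chosen for illustration, and 1 meaning "delete" is an assumed convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Each element carries its own status flag, as the answer suggests.
// Assumed convention: second == 0 means "keep", 1 means "marked for deletion".
// Flagged, mark_for_deletion and purge_marked are hypothetical names.
template <typename T>
using Flagged = std::vector<std::pair<T, char>>;

template <typename T>
void mark_for_deletion(Flagged<T>& items, std::size_t index)
{
    assert(index < items.size());
    items[index].second = 1;  // O(1): no second container, no sorting
}

template <typename T>
void purge_marked(Flagged<T>& items)
{
    // One order-preserving erase-remove pass driven by the stored flags.
    items.erase(std::remove_if(items.begin(), items.end(),
                               [](const std::pair<T, char>& p) { return p.second != 0; }),
                items.end());
}
```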

彼岸花似海 2024-10-08 03:26:12

Elements of what? Maybe I'm taking your post too seriously, but if you have a vector of 1000 elements, why not mark the ones that are no longer valid and do away with erasing in the first place? Obviously I'm assuming here that your elements don't demand a lot of memory.

I only bring this up because you seem to be concerned with speed. If the suggestions already given don't do the trick maybe this idea is worth a thought! In essence speed things up by not doing the operation in the first place.
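
To illustrate the "mark instead of erase" idea, here is a minimal sketch that keeps the elements in place and records validity in a parallel flag vector, so invalidating is O(1) and the positions of the other elements never change. The class TombstoneVector and its members are hypothetical; the trade-off is that every traversal has to skip the dead entries, and their memory is never reclaimed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of "mark instead of erase": elements stay where they are and a
// parallel flag vector records which ones are still valid.
// TombstoneVector and its members are hypothetical names.
template <typename T>
class TombstoneVector {
public:
    void push_back(const T& value)
    {
        data_.push_back(value);
        valid_.push_back(true);
        ++live_;
    }

    // O(1) "erase": nothing moves, indexes of other elements stay stable.
    void invalidate(std::size_t i)
    {
        assert(i < data_.size());
        if (valid_[i]) {
            valid_[i] = false;
            --live_;
        }
    }

    std::size_t live_count() const { return live_; }

    // Visit the still-valid elements in their original order.
    template <typename F>
    void for_each_valid(F f) const
    {
        for (std::size_t i = 0; i < data_.size(); ++i)
            if (valid_[i])
                f(data_[i]);
    }

private:
    std::vector<T> data_;
    std::vector<bool> valid_;
    std::size_t live_ = 0;
};
```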

爱*していゐ 2024-10-08 03:26:12

If you have a (e.g. unordered) set of indices that you want to erase, you can use this:

#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

template <typename Type>
void erase_indices(
        const std::unordered_set<size_t>& indices_to_erase,
        std::vector<Type>& vec) {
    std::vector<bool> erase_index(vec.size(), false);
    for (const size_t i: indices_to_erase) {
        erase_index[i] = true;
    }
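    // The lambda advances a shared iterator over the mask, so this relies on
    // remove_if applying the predicate exactly once per element, front to back.
    std::vector<bool>::const_iterator it_to_erase = erase_index.cbegin();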
    typename std::vector<Type>::iterator it_erase_from = std::remove_if(
        vec.begin(), vec.end(),
        [&it_to_erase](const Type&) -> bool {
          return *it_to_erase++ == true;
        }
    );
    vec.erase(it_erase_from, vec.end());
}

It is the fastest solution that came to my mind. You need C++11, though. Usage example to erase elements at index 2 and 5:

constexpr size_t num = 10u;
std::vector<int> vec(num);
std::iota(vec.begin(), vec.end(), 0);

std::unordered_set<size_t> indices_to_erase;
indices_to_erase.insert(2u);
indices_to_erase.insert(5u);

erase_indices(indices_to_erase, vec);

Before:

0 1 2 3 4 5 6 7 8 9

After:

0 1 3 4 6 7 8 9

Edit:
If you want to be more flexible regarding the type of container that holds the indices to erase:

#include <algorithm>
#include <cstddef>
#include <type_traits>
#include <vector>

template <typename Type, typename Container>
void erase_indices(
        const Container& indices_to_erase,
        std::vector<Type>& vec) {
    typedef typename Container::value_type IndexType;
    static_assert(std::is_same<IndexType, std::size_t>::value,
        "Indices to be erased have to be of type std::size_t");
    std::vector<bool> erase_index(vec.size(), false);
    for (const IndexType idx_erase: indices_to_erase) {
        erase_index[idx_erase] = true;
    }
    // As above: relies on remove_if applying the predicate once per element, in order.
    std::vector<bool>::const_iterator it_to_erase = erase_index.cbegin();
    typename std::vector<Type>::iterator it_erase_from = std::remove_if(
        vec.begin(), vec.end(),
        [&it_to_erase](const Type&) -> bool {
          return *it_to_erase++ == true;
        }
    );
    vec.erase(it_erase_from, vec.end());
}

Now you can use any kind of container from the Containers Library to provide the indices to be erased as long as the value_type of that container is std::size_t. Usage remains the same.

爱冒险 2024-10-08 03:26:12

I've written a function based on Benjamin Lindley's answer https://stackoverflow.com/a/4115582/2835054.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

template <typename elementType, typename indexType>
void remove_multiple_elements_from_vector(std::vector<elementType> &vector,
                                          std::vector<indexType> &indexes)
{
    // 1. indexType is any integer type.
    // 2. elementType is any movable type.
    // 3. Indexes should be unique.
    // 4. Every index must be a valid position in the vector
    //    (smaller than vector.size()).
    // 5. Indexes must be in ascending order
    //    (they are sorted inside the function).
    std::sort(indexes.begin(), indexes.end());
    std::size_t currentIndexInIndexesVector = 0;
    std::size_t last = 0;
    for (std::size_t i = 0; i < vector.size(); ++i, ++last)
    {
       // Skip the elements whose positions come next in the sorted index
       // list; the size check prevents reading past the end of indexes.
       while (currentIndexInIndexesVector < indexes.size() &&
              static_cast<std::size_t>(indexes[currentIndexInIndexesVector]) == i)
       {
          ++i;
          ++currentIndexInIndexesVector;
       }
       if (i >= vector.size()) break;

       if (last != i)
          vector[last] = std::move(vector[i]);   // shift the kept element down
    }

    // Drop the leftover tail (erase, unlike resize, needs no default constructor).
    vector.erase(vector.begin() + static_cast<std::ptrdiff_t>(last), vector.end());
}


int main()
{
    std::vector<int> vector = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::vector<int> indexes = {0, 10, 5};

    for (auto &vectorElement : vector)
    {
        std::cout << vectorElement << " ";
    }
    std::cout << "\n";

    remove_multiple_elements_from_vector<int, int>(vector, indexes);

    for (auto &vectorElement : vector)
    {
        std::cout << vectorElement << " ";
    }
    std::cout << "\n";
}