Most efficient way to erase/remove multiple std::vector elements while retaining the original order?

Posted on 2024-10-01 03:26:11


I have a std::vector<int> and a second container holding iterators or indexes into this vector (no keys; I want constant-time access to the elements) for deletion purposes.
Let's assume I have a vector of 1000 elements and want to erase 200 of them. The order of the non-removed elements should be the same after the deletion as before.

One more thing I missed in the first version of my question: the values are unique. They are identities.

How would you do that in a safe (regarding the STL rules) and efficient manner (the decision for a vector is final)?

Possibilities or methods I thought about:

1. The erase-remove idiom (http://en.wikipedia.org/wiki/Erase-remove_idiom): originally meant for deleting elements that fulfill a condition (which involves a linear search), but I think with ranges of size 1 it could be used with already given iterators and a dummy condition. Question: is the original order of the elements kept, and is it more performant than the last method?
2. Loop over the indexes and erase the elements with vector.erase(vector.begin()+index+offset), while keeping the already removed indexes in a container for calculating the offset. This offset could be determined for every remove iteration with std::lower_bound in the container of already removed elements. The problem: a lot of binary searches to get the offset, and a lot of move operations because of random-location deletion.
3. At the moment I'm doing the following: get all the iterators for the elements to remove, sort them in descending order according to their location in the vector, and loop over them for the final deletion with vector.erase. This way I'm not invalidating any iterator, and there are no vector rearrange operations except for the deletion itself. The problem: a lot of sorting.
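A minimal sketch of approach 3, assuming std::vector<unsigned> elements and a hypothetical function name erase_by_iterators_descending. It shows why erasing from back to front keeps the collected iterators valid, and why each erase still pays for shifting the tail:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using Vec = std::vector<unsigned>;

// Approach 3: sort the collected iterators so the one closest to the end
// comes first, then erase them one by one. vector::erase only invalidates
// iterators at or after the erased position, so the remaining (earlier)
// iterators in `its` stay valid. Each erase still shifts every later
// element down by one, so the worst case is O(n * m) element moves.
void erase_by_iterators_descending(Vec& vec, std::vector<Vec::iterator> its)
{
    // Random-access iterators support operator>, so std::greater works.
    std::sort(its.begin(), its.end(), std::greater<Vec::iterator>());
    for (Vec::iterator it : its) {
        assert(it != vec.end());  // precondition: each iterator refers to an element
        vec.erase(it);
    }
}
```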

So, how would you tackle this? Any new ideas? Any recommendations?

Thanks for your input.

Sascha

Edit / Update / Own results: I implemented the erase-remove idiom, which was also mentioned by KennyTM, with a predicate based on a lookup in a boost::dynamic_bitset, and it's insanely fast. Furthermore, I tried PigBen's move-and-truncate method (also mentioned by Steve Jessop), which also accesses the bitset in its while loop. Both seem to be equally fast with my kind of data. I tried deleting 100 of 1000 elements (unsigned ints), repeated these 100 deletions 1M times, and there was no significant difference. Because I think the STL-based erase-remove idiom is somewhat more "natural", I'm choosing this method (an argument also mentioned by KennyTM).
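A rough sketch of the chosen approach, with two assumptions the question does not spell out: the bitset is indexed by element value (the values are unique unsigned ints below a known bound), and std::vector<bool> stands in for boost::dynamic_bitset so the example needs no Boost. The function name erase_marked_values is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Erase-remove idiom with a bitset-based predicate.
// Assumptions: elements are unique unsigned values, each smaller than
// marked.size(); std::vector<bool> stands in for boost::dynamic_bitset.
void erase_marked_values(std::vector<unsigned>& vec,
                         const std::vector<bool>& marked)
{
    auto new_end = std::remove_if(vec.begin(), vec.end(),
        [&marked](unsigned value) {
            assert(value < marked.size());
            return marked[value];   // O(1) lookup per element
        });
    // remove_if keeps the survivors in their original relative order;
    // erase then drops the leftover tail in a single call.
    vec.erase(new_end, vec.end());
}
```

The predicate is stateless, so it does not matter how often remove_if copies it.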


书间行客 2024-10-08 03:26:12

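In <algorithm> there is a remove_if function that squeezes all values that are not removed to the front, maintaining their order. This works if those 200 elements can be determined purely by their values rather than by their indices.

This is essentially the erase-remove idiom you linked to. remove_if is guaranteed to perform O(N) comparisons (and at most O(N) copies), which is more efficient than sorting (O(N log N)), although your last option doesn't actually require sorting if the indices are determined from values (just scan in the reverse direction while copying).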
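To illustrate the parenthetical remark, here is a sketch of the last option without any sorting, assuming the unique values to drop are held in a std::unordered_set (the function name erase_values_backwards is hypothetical). Scanning from the back means an erase never disturbs the positions that are still to be visited:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Walk the vector from back to front and erase every element whose value
// is in `doomed`. Erasing at position i only shifts elements after i, so
// the positions still to be visited (all < i) are unaffected. Each erase
// still moves the tail, so this is O(n * m) moves, unlike the single
// O(n) pass of remove_if.
void erase_values_backwards(std::vector<int>& vec,
                            const std::unordered_set<int>& doomed)
{
    for (std::size_t i = vec.size(); i-- > 0; ) {
        if (doomed.count(vec[i]) != 0)
            vec.erase(vec.begin() + static_cast<std::ptrdiff_t>(i));
    }
}
```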

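Nevertheless, using remove_if (if you can) is better than the other two options, because the implementation has already been written for you, so there is less chance of a logic error, and it conveys better what (not how) to do.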


骑趴 2024-10-08 03:26:12

How about looping through the vector and, for each element that needs to be removed, copying the next element that doesn't need to be removed into that position? Then, when you get to the end, truncate the vector.

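// needs_to_be_removed(i) is the caller's predicate for index i.
std::size_t last = 0;
for (std::size_t i = 0; i < vec.size(); ++i, ++last)
{
   // Skip every element that has to go; check the bound first so the
   // predicate is never called with an out-of-range index.
   while (i < vec.size() && needs_to_be_removed(i))
      ++i;
   if (i >= vec.size()) break;

   vec[last] = vec[i];   // copy the next surviving element into slot last
}

vec.resize(last);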
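The same loop packaged as a function, as a sketch: the name compact and the std::vector<bool> of flags standing in for needs_to_be_removed are assumptions. It uses `continue` instead of the inner while loop, moves rather than copies the survivors, and truncates with erase, which (unlike resize) does not need T to be default-constructible:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Move-and-truncate: remove_flags[i] plays the role of needs_to_be_removed(i).
template <typename T>
void compact(std::vector<T>& vec, const std::vector<bool>& remove_flags)
{
    assert(remove_flags.size() == vec.size());
    std::size_t last = 0;                       // next write position
    for (std::size_t i = 0; i < vec.size(); ++i) {
        if (remove_flags[i])
            continue;                           // skip elements to delete
        if (last != i)
            vec[last] = std::move(vec[i]);      // shift the survivor forward
        ++last;
    }
    // Drop the leftover tail in one call.
    vec.erase(vec.begin() + static_cast<std::ptrdiff_t>(last), vec.end());
}
```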


挽梦忆笙歌 2024-10-08 03:26:12

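First, don't call erase more times than you have to, because for a vector it shuffles all the later elements down, giving the whole operation an Ω(n*m) worst-case run time (n the size of the vector, m the size of the list of indexes to remove).

I think the first thing I'd try would be similar to your current code:

1. Sort the indexes.
2. Create a new vector of size n - m.
3. Iterate over the original vector, copying indexes[0] elements, skipping an element, then copying indexes[1] - indexes[0] - 1 elements, skipping an element, and so on.
4. Swap the original vector with the new one.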
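A sketch of these four steps, assuming std::size_t indexes that are unique and in range (the function name erase_indexes_by_copy is hypothetical). For step 2 it uses the reserve-plus-back_inserter variant that the answer offers as an alternative to presizing the new vector:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Sort the indexes, build a new vector holding n - m elements,
// copy the runs between removed indexes, then swap.
template <typename T>
void erase_indexes_by_copy(std::vector<T>& vec, std::vector<std::size_t> indexes)
{
    std::sort(indexes.begin(), indexes.end());              // step 1
    assert(indexes.empty() || indexes.back() < vec.size());

    std::vector<T> result;
    result.reserve(vec.size() - indexes.size());            // step 2

    auto run_begin = vec.begin();
    for (std::size_t idx : indexes) {                       // step 3
        auto skipped = vec.begin() + static_cast<std::ptrdiff_t>(idx);
        std::copy(run_begin, skipped, std::back_inserter(result));
        run_begin = skipped + 1;                            // skip one element
    }
    std::copy(run_begin, vec.end(), std::back_inserter(result));

    vec.swap(result);                                       // step 4
}
```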

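You might be able to do the third step with remove_copy_if and a predicate that contains state (counting how many items it has copied and how far it is through the sorted list of indexes), but for extremely tedious and obscure reasons this isn't guaranteed to work (algorithm predicates with mutable state are problematic; the consensus seems to be that the standard doesn't guarantee that the same copy of the predicate is used throughout the algorithm). So I really don't advise trying it, but it might help to bear in mind that what you're writing is basically a modified version of remove_copy_if.

You could avoid the second step by using a back_inserter rather than presizing the vector, although you'd presumably still reserve the space in advance.

[Edit: come to think of it, why am I copying anything? Rather than implementing a modified remove_copy_if, implement a modified remove_if, and just copy to an earlier point in the vector. Then erase/resize at the end. I wouldn't worry about the O(m log m) sort of the indexes until it is proven to be a problem, because it's unlikely to be significantly slower than the Ω(m) operation of reading all the values to be removed and storing them in some kind of container. Then, using this container in the predicate to remove_if may or may not be O(1). Sorting might turn out faster for plausible values of m.]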
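A sketch of the "modified remove_if" from the edit, assuming unique, in-range std::size_t indexes (the function name erase_indexes_in_place is hypothetical). It is a single read/write pass that consults the sorted index list directly, so no stateful predicate is involved, followed by one erase of the tail:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// In-place compaction driven by a sorted list of indexes to drop.
template <typename T>
void erase_indexes_in_place(std::vector<T>& vec, std::vector<std::size_t> indexes)
{
    std::sort(indexes.begin(), indexes.end());   // O(m log m)
    std::size_t next = 0;                        // position in `indexes`
    std::size_t write = 0;                       // next slot to fill
    for (std::size_t read = 0; read < vec.size(); ++read) {
        if (next < indexes.size() && indexes[next] == read) {
            ++next;                              // drop this element
            continue;
        }
        if (write != read)
            vec[write] = std::move(vec[read]);   // move the survivor forward
        ++write;
    }
    assert(next == indexes.size());              // every index was consumed
    vec.erase(vec.begin() + static_cast<std::ptrdiff_t>(write), vec.end());
}
```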


夏九 2024-10-08 03:26:12

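You can copy all elements of the vector into a list, except those whose indices are in your second container, and then copy them back into a vector. Even with your algorithm of going from the end of the vector to the front, there's still a lot of work going on behind the scenes in your vector.

Make your second container a map, so that it keeps the indices sorted for you automatically.

Edit:

In response to the comment:

The cost of maintaining a map is, in the worst case, the same as maintaining another structure (list or vector) and then sorting it. If you're already doing that, you might as well keep it as a map. It doesn't make sense to complain about the overhead of a map versus the overhead of sorting a list.

As for the performance of my suggested algorithm, if m is the number of elements to be deleted and n is the total number of elements, it results in O(n - m).

Of course, this is mostly just humoring your attempt to optimize with a vector.

1 - You shouldn't be using a vector if you want to do random-access deletes. That's not what vectors are good at; use a list if at all possible. And since you seem to be more interested in relative order than in absolute indices, I wonder why a vector is needed at all. If you described the entire problem, there's probably a common solution that lets you use the most efficient data structure for it.

2 - Instead of maintaining a second data structure, mark the elements that need to be deleted directly in their container. A trivial way is, instead of a container< T >, to use a container< std::pair< T, char > > and use the char to keep track of the element's status.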
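A sketch of suggestion 2 with std::vector as the container. The convention that a non-zero char means "delete", and the helper names mark_for_deletion and purge_marked, are assumptions for illustration; the final purge is an ordinary erase-remove pass over the flag:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Each element carries its own status flag (0 = keep, non-zero = delete).
template <typename T>
using Flagged = std::vector<std::pair<T, char>>;

template <typename T>
void mark_for_deletion(Flagged<T>& items, std::size_t i)
{
    assert(i < items.size());
    items[i].second = 1;
}

// One stable pass removes every flagged element, keeping the order of the rest.
template <typename T>
void purge_marked(Flagged<T>& items)
{
    items.erase(std::remove_if(items.begin(), items.end(),
                               [](const std::pair<T, char>& p) { return p.second != 0; }),
                items.end());
}
```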

If you do 1 and 2, you eliminate all copying and get a much more efficient implementation.


彼岸花似海 2024-10-08 03:26:12

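What are the elements? Maybe I'm taking your post too seriously, but if you have a vector of 1000 elements, why not mark the ones that are no longer valid and skip erasing them in the first place? Obviously, I'm assuming here that your elements don't take up a lot of memory.

I only bring this up because you seem to be concerned with speed. If the suggestions already given don't do the trick, maybe this idea is worth considering! In essence: speed things up by not doing the operation in the first place.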
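A minimal sketch of this idea, assuming a parallel std::vector<bool> validity mask; the type TombstoneVector and its members are hypothetical. Invalidating an element is O(1) and moves nothing, but every traversal has to skip the dead slots, and their memory is never reclaimed:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Elements are never erased; a mask records which ones are still valid.
template <typename T>
struct TombstoneVector {
    std::vector<T> items;
    std::vector<bool> alive;

    void push_back(const T& value) { items.push_back(value); alive.push_back(true); }

    void invalidate(std::size_t i) { assert(i < items.size()); alive[i] = false; }

    // Visit the live elements in their original order.
    template <typename F>
    void for_each_alive(F f) const {
        for (std::size_t i = 0; i < items.size(); ++i)
            if (alive[i]) f(items[i]);
    }
};
```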


爱*していゐ 2024-10-08 03:26:12

If you have a (e.g. unordered) set of indices that you want to erase, you can use this:

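#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

template <typename Type>
void erase_indices(
        const std::unordered_set<size_t>& indices_to_erase,
        std::vector<Type>& vec) {
    // Flag every position that should be erased.
    std::vector<bool> erase_index(vec.size(), false);
    for (const size_t i: indices_to_erase) {
        erase_index[i] = true;
    }
    // The predicate walks erase_index in lockstep with remove_if. This
    // relies on remove_if testing the elements front to back, once each,
    // which is how it is implemented in practice.
    std::vector<bool>::const_iterator it_to_erase = erase_index.cbegin();
    typename std::vector<Type>::iterator it_erase_from = std::remove_if(
        vec.begin(), vec.end(),
        [&it_to_erase](const Type&) -> bool {
          return *it_to_erase++;
        }
    );
    vec.erase(it_erase_from, vec.end());
}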

This is the fastest solution I could come up with. You need C++11, though. Usage example erasing the elements at indices 2 and 5:

constexpr size_t num = 10u;
std::vector<int> vec(num);
std::iota(vec.begin(), vec.end(), 0);

std::unordered_set<size_t> indices_to_erase;
indices_to_erase.insert(2u);
indices_to_erase.insert(5u);

erase_indices(indices_to_erase, vec);

Before:

0 1 2 3 4 5 6 7 8 9

After:

0 1 3 4 6 7 8 9

Edit:
If you want to be more flexible regarding the type of container that holds the indices to erase:

template <typename Type, typename Container>
void erase_indices(
        const Container& indices_to_erase,
        std::vector<Type>& vec) {
    typedef typename Container::value_type IndexType;
    static_assert(std::is_same<IndexType, std::size_t>::value,
        "Indices to be erased have to be of type std::size_t");
    std::vector<bool> erase_index(vec.size(), false);
    for (const IndexType idx_erase: indices_to_erase) {
        erase_index[idx_erase] = true;
    }
    std::vector<bool>::const_iterator it_to_erase = erase_index.cbegin();
    typename std::vector<Type>::iterator it_erase_from = std::remove_if(
        vec.begin(), vec.end(),
        [&it_to_erase](const Type&) -> bool {
          return *it_to_erase++ == true;
        }
    );
    vec.erase(it_erase_from, vec.end());
}

Now you can use any kind of container from the Containers Library to provide the indices to be erased as long as the value_type of that container is std::size_t. Usage remains the same.
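A variant sketch (hypothetical name erase_indices_by_address) that avoids the shared iterator: the predicate recovers each element's original position from its address, so it does not depend on the order of predicate calls. It does assume, as common implementations do, that remove_if evaluates the predicate on each element before anything is moved into that slot, and it does not work for Type = bool, since std::vector<bool> has no data() member:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

template <typename Type, typename Container>
void erase_indices_by_address(const Container& indices_to_erase,
                              std::vector<Type>& vec)
{
    // Flag every position that should be erased.
    std::vector<bool> erase_index(vec.size(), false);
    for (std::size_t idx : indices_to_erase) {
        assert(idx < vec.size());
        erase_index[idx] = true;
    }
    // The element's offset from the start of the buffer is its original index.
    const Type* base = vec.data();
    vec.erase(std::remove_if(vec.begin(), vec.end(),
                             [&](const Type& elem) {
                                 return erase_index[static_cast<std::size_t>(&elem - base)];
                             }),
              vec.end());
}
```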


爱冒险 2024-10-08 03:26:12

I've written a function based on Benjamin Lindley's answer https://stackoverflow.com/a/4115582/2835054.

#include <iostream>
#include <algorithm>
#include <cstddef>
#include <vector>

template <typename elementType, typename indexType>
void remove_multiple_elements_from_vector(std::vector<elementType> &vector,
std::vector<indexType> &indexes)
{
    // 1. indexType is any integer type.
    // 2. elementType is any type.
    // 3. Indexes should be unique.
    // 4. Every index must be a valid position in vector
    //    (0 <= index < vector.size()).
    // 5. Indexes must be in ascending order
    //    (the function sorts them itself).
    std::sort(indexes.begin(), indexes.end());
    std::size_t currentIndexInIndexesVector = 0;
    std::size_t last = 0;
    for(std::size_t i=0; i<vector.size(); ++i, ++last)
    {
       // Skip every element whose index is listed. Checking the position
       // in indexes first avoids reading past its end once all are used.
       while(currentIndexInIndexesVector < indexes.size() &&
             static_cast<std::size_t>(indexes[currentIndexInIndexesVector]) == i)
       {
          ++i;
          ++currentIndexInIndexesVector;
       }
       if(i >= vector.size()) break;

       vector[last] = vector[i];
    }

    vector.resize(last);
}


int main()
{
    std::vector<int> vector = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::vector<int> indexes = {0, 10, 5};

    for (auto &vectorElement : vector)
    {
        std::cout << vectorElement << " ";
    }
    std::cout << "\n";

    remove_multiple_elements_from_vector<int, int>(vector, indexes);

    // Prints: 1 2 3 4 6 7 8 9
    for (auto &vectorElement : vector)
    {
        std::cout << vectorElement << " ";
    }
    std::cout << "\n";
}



About the author: 伊面 (no bio yet)
