Remove duplicates from an array while preserving order in C++

Posted on 2024-09-16


Possible Duplicate:
How to make elements of vector unique? (remove non adjacent duplicates)

Is there a standard algorithm, provided as part of the STL, that can remove duplicates from an array while preserving order? For example, given an array like int a[] = {2,1,3,1,4,2};, after removing duplicates it should be a[] = {2,1,3,4};. I cannot use std::unique because the array is not sorted. With other solutions, such as inserting the elements into a std::set, I lose the order because the elements get sorted. Is there some other combination of algorithms I can use, or do I have to write my own?

3 Answers

情深如许 2024-09-23 15:34:48

There is no standard algorithm for this, but it's fairly easy to implement. The idea is to keep a std::set of the items you've seen so far, and skip duplicates while copying to a new vector or array. This runs in O(n log n) time and O(n) memory. If you're using C++0x, you can get it down to O(n) expected time by using std::unordered_set as the seen-items set; it uses a hash table instead of a balanced binary tree and should be faster.

心如狂蝶 2024-09-23 15:34:48

Since the problem is relatively "complex", I wouldn't try to force a solution using only the standard algorithms, since there is no single algorithm that solves your problem (you could probably hack something together with remove_if, find, and bind2nd).
For implementing the algorithm yourself, you basically have two options, with the usual memory vs. speed trade-off.
The first solution is to iterate over the vector and, for each item, search for and remove its duplicates in the rest of the container. This is the CPU-intensive approach.
A possibly faster approach is to create a second vector (reserved to the same size as the first, to minimize memory reallocations) and store each newly found item in it. Then, for each element of the original vector, only the shorter second vector needs to be searched to decide whether the current item should be deleted.
The first approach works with every iterator, while the second is limited to random-access iterators.
Here are the implementations:

#include <iostream>
#include <algorithm>
#include <vector>

using namespace std;

// Trades memory for speed: remembers already-seen items in a separate
// vector, so each element is compared only against the unique items so far.
template<typename T>
void remove_duplicates_ordered_mem_intensive(T &container)
{
   std::vector<typename T::value_type> items;
   items.reserve (container.size());

   typename T::iterator i = container.begin();
   while (i != container.end())
   {
      if (find (items.begin(), items.end(), *i) != items.end())
         i = container.erase(i);
      else
      {
         items.push_back(*i);
         ++i;
      }
   }
}

// In-place O(n^2) variant: for each element, erase every later duplicate.
template<typename T>
void remove_duplicates_ordered_slow(T &container)
{
   typename T::iterator i = container.begin();
   while (i != container.end())
   {
      typename T::iterator f = i;
      ++f;
      while (f != container.end())
      {
         if (*f == *i)
            f = container.erase(f);
         else
            ++f;
      }
      ++i;
   }
}

int main ()
{
   vector<int> v;
   v.push_back (2);
   v.push_back (1);
   v.push_back (3);
   v.push_back (1);
   v.push_back (4);
   v.push_back (2); 

   cout << "Old:\n";
   for (vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
      cout << *i << endl;


   vector<int> a (v), b (v);
   remove_duplicates_ordered_mem_intensive (a);
   remove_duplicates_ordered_slow (b); 

   cout << "\nRemoved duplicates with intensive memory usage:\n";
   for (vector<int>::const_iterator i = a.begin(); i != a.end(); ++i)
      cout << *i << endl; 

   cout << "\nRemoved duplicates somewhat slower, without copying:\n";
   for (vector<int>::const_iterator i = b.begin(); i != b.end(); ++i)
      cout << *i << endl;
}
居里长安 2024-09-23 15:34:48

remove duplicates from an array

This is technically impossible, because arrays cannot change size.
