Remove duplicates from an array while preserving order in C++

Posted on 2024-09-16


Possible Duplicate:
How to make elements of vector unique? (remove non adjacent duplicates)

Is there a standard algorithm, provided as part of the STL, that can remove duplicates from an array while preserving order? For example, given an array like int a[] = {2,1,3,1,4,2};, after removing duplicates it should be a[] = {2,1,3,4};. I cannot use std::unique because the array is not sorted. With other solutions, such as inserting the elements into a std::set, I lose the order because the elements get sorted. Is there some other combination of algorithms I can use, or do I have to write my own?

3 Answers

情深如许 2024-09-23 15:34:48

There is no standard algorithm for this, but it's fairly easy to implement. The idea is to keep a std::set of the items you've seen so far, and skip duplicates while copying to a new vector or array. This runs in O(n log n) time and O(n) memory. If you're using C++0x, you can get it down to O(n) expected time by using std::unordered_set as the seen-items set; it uses a hash table instead of a balanced binary tree and should be faster.

心如狂蝶 2024-09-23 15:34:48

Since the problem is relatively "complex", I wouldn't try to force a solution using only the standard algorithms, since there is no single algorithm that solves your problem (you could probably hack something together with remove_if, find, and bind2nd).
For implementing the algorithm yourself, you basically have two options, with the usual memory vs. speed trade-off.
The first solution is to iterate over the vector and, for each item, search for and remove its duplicates in the rest of the container. This is the CPU-intensive approach.
A possibly faster approach is to create a second vector (reserved to the same size as the first, to minimize memory reallocations) and store each newly found item in it. Then, for each element of the original vector, only the shorter second vector needs to be searched to decide whether the current item should be deleted.
The first approach works with every iterator, while the second is limited to random-access iterators.
Here are the implementations:

#include <iostream>
#include <algorithm>
#include <vector>

using namespace std;

// Trades memory for speed: remembers already-seen items in a separate
// vector, so each element is compared only against the unique items so far.
template<typename T>
void remove_duplicates_ordered_mem_intensive(T &container)
{
   std::vector<typename T::value_type> items;
   items.reserve (container.size());

   typename T::iterator i = container.begin();
   while (i != container.end())
   {
      if (find (items.begin(), items.end(), *i) != items.end())
         i = container.erase(i);
      else
      {
         items.push_back(*i);
         ++i;
      }
   }
}

// In-place O(n^2) variant: for each element, erase every later duplicate.
template<typename T>
void remove_duplicates_ordered_slow(T &container)
{
   typename T::iterator i = container.begin();
   while (i != container.end())
   {
      typename T::iterator f = i;
      ++f;
      while (f != container.end())
      {
         if (*f == *i)
            f = container.erase(f);
         else
            ++f;
      }
      ++i;
   }
}

int main ()
{
   vector<int> v;
   v.push_back (2);
   v.push_back (1);
   v.push_back (3);
   v.push_back (1);
   v.push_back (4);
   v.push_back (2); 

   cout << "Old:\n";
   for (vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
      cout << *i << endl;


   vector<int> a (v), b (v);
   remove_duplicates_ordered_mem_intensive (a);
   remove_duplicates_ordered_slow (b); 

   cout << "\nRemoved duplicates with intensive memory usage:\n";
   for (vector<int>::const_iterator i = a.begin(); i != a.end(); ++i)
      cout << *i << endl; 

   cout << "\nRemoved duplicates somewhat slower, without copying:\n";
   for (vector<int>::const_iterator i = b.begin(); i != b.end(); ++i)
      cout << *i << endl;
}
居里长安 2024-09-23 15:34:48

remove duplicates from an array

This is technically impossible, because arrays cannot change size.
