Take an array and move the duplicates to the end

Posted on 2024-12-10 21:42:43

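I got this question in an interview, and at the end I was told there was a more efficient way to do it, but I still have not been able to figure it out. You pass a function an array of integers and an integer giving the size of the array. The array contains a lot of numbers, some of which repeat, for example 1,7,4,8,2,6,8,3,7,9,10. You want to take that array and return an array in which all the repeated numbers are placed at the end, so the array above would turn into 1,7,4,8,2,6,3,9,10,8,7. The particular numbers I used are not important, and I could not use a buffer array. I was going to use a BST, but the order of the numbers must be maintained (except for the duplicates). I could not figure out how to use a hash table, so I ended up using a double for loop (n^2, horrible, I know). How would I do this more efficiently using C++? I'm not looking for code, just an idea of how to do it better.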


ゞ花落谁相伴 2024-12-17 21:42:43

In what follows:

arr is the input array;
seen is a hash set of numbers already encountered;
l is the index where the next unique element will be placed;
r is the index of the next element to be considered.

Since you're not looking for code, here is a pseudo-code solution (which happens to be valid Python):

arr = [1,7,4,8,2,6,8,3,7,9,10]
seen = set()
l = 0
r = 0
while True:
  # advance `r` to the next not-yet-seen number
  while r < len(arr) and arr[r] in seen:
    r += 1
  if r == len(arr): break
  # add the number to the set
  seen.add(arr[r])
  # swap arr[l] with arr[r]
  arr[l], arr[r] = arr[r], arr[l]
  # advance `l`
  l += 1
print(arr)

On your test case, this produces

[1, 7, 4, 8, 2, 6, 3, 9, 10, 8, 7]
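
Since the question asks about C++, here is a rough C++ rendering of the same idea. It is only a sketch: the function name `move_repeats_to_end` is made up for illustration, and it assumes a `std::unordered_set` is an acceptable stand-in for the hash set `seen`. The single `for` loop plays the role of `r`, and `l` is the slot for the next unique element.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: moves the first occurrence of every value to the
// front, keeping those first occurrences in their original order.
// Repeats end up in the tail; their relative order is not guaranteed.
void move_repeats_to_end(std::vector<int>& arr)
{
    std::unordered_set<int> seen;   // values already encountered
    std::size_t l = 0;              // where the next unique element goes

    for (std::size_t r = 0; r < arr.size(); ++r) {
        // insert() reports through .second whether the value was new
        if (seen.insert(arr[r]).second) {
            std::swap(arr[l], arr[r]);
            ++l;
        }
    }
}
```

Each element is visited once and each set operation is O(1) on average, so the whole pass is expected O(n), at the cost of the extra memory used by the set.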

落花浅忆 2024-12-17 21:42:43

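I would use an additional map whose keys are the integer values from the array and whose values are integers that start at 0. I would then go through the array and increment the value stored for each integer I encounter, so the map ends up holding how many times each value occurs.
In the end I would go through the array again. When the integer from the array has a value of 1 in the map, I would not change anything. When it has a value of 2 or more in the map, I would swap the integer from the array with the last one.

This should result in a running time of O(n*log(n)).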
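
The counting pass could look something like the sketch below. It covers only the first step (building the map); the swapping pass is left as described above. The name `count_occurrences` is invented for illustration, and `std::map` is assumed because it is a balanced tree, which is where the log(n) factor comes from.

```cpp
#include <map>
#include <vector>

// Hypothetical helper: counts how many times each value occurs.
// n insertions into a std::map cost O(n * log(n)) overall.
std::map<int, int> count_occurrences(const std::vector<int>& arr)
{
    std::map<int, int> counts;
    for (int value : arr) {
        ++counts[value];   // operator[] value-initializes a missing entry to 0
    }
    return counts;
}
```

For the example array, the entries for 7 and 8 would hold 2 and every other entry would hold 1.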


智商已欠费 2024-12-17 21:42:43

The way I would do this is to create an array twice the size of the original, plus a set of integers.

Then loop through the original array and add each element to the set. If the element is already in the set, add it to the second half of the new array; otherwise add it to the first half.

In the end you would get an array that looks like this (using your example):

1,7,4,8,2,6,3,9,10,-,-,8,7,-,-,-,-,-,-,-,-,-

Then I would loop through the original array again and set each slot to the next non-null position of the new array (or non-zero, or whatever placeholder you chose).

That would turn the original array into your solution.

This ends up being O(n), which is about as efficient as I can think of.

Edit: since you cannot use another array, when you find a value that is already in the set you can shift every value after it one position toward the front and put the number you just checked in the last slot. This would in effect do the same thing, but with a lot more operations.
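
The in-place variant from the edit might be sketched like this. It is one possible reading, not a definitive implementation: `shift_repeats_to_end` is a made-up name, and `std::rotate` is used to do the "shift everything after it forward by one and put the value last" step. A set is still needed to remember what has been seen.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Hypothetical helper: every time a repeat is found, rotate the rest of the
// array left by one so the repeat lands in the last slot.
void shift_repeats_to_end(std::vector<int>& a)
{
    std::unordered_set<int> seen;
    auto it = a.begin();
    auto limit = a.end();   // elements from limit onward are known repeats

    while (it != limit) {
        if (seen.insert(*it).second) {
            ++it;                                  // first occurrence: keep it
        } else {
            std::rotate(it, it + 1, a.end());      // move *it to the very end
            --limit;                               // one fewer element to examine
        }
    }
}
```

Each rotate is linear, so the worst case is O(n^2), which matches the "a lot more operations" remark. In exchange, both the unique values and the repeats keep their original relative order.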


柏拉图鍀咏恒 2024-12-17 21:42:43

I have been out of touch for a while, but I'd probably start out with something like this and see how it scales with larger input. I know you didn't ask for code but in some cases it's easier to understand than an explanation.

Edit: Sorry I missed the requirement that you cannot use a buffer array.

#include <set>
#include <vector>

// returns a new vector with the dupes at the end
std::vector<int> move_dupes_to_end(const std::vector<int>& input)
{
    std::set<int> seen;          // values encountered so far
    std::vector<int> result;     // first occurrences, in input order
    std::vector<int> repeats;    // repeated values, in input order

    for (std::vector<int>::const_iterator i = input.begin(); i != input.end(); ++i)
    {
        // insert() returns a pair whose bool is true when the value was new
        if (seen.insert(*i).second)
            result.push_back(*i);
        else
            repeats.push_back(*i);
    }

    result.insert(result.end(), repeats.begin(), repeats.end());

    return result;
}


哀由 2024-12-17 21:42:43

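#include <algorithm>
#include <cstddef>

// Sorts the array, then compacts the unique values to the front.
// Returns a pointer one past the last unique element.
// Note that sorting does not keep the original order of the elements.
template <typename T>
T * sort_and_unique( T * array, std::size_t size )
{
                                                  // Complexity:
    std::sort( array, array + size );             // n * log(n) and could be threaded
                                                  // (if merge sort)
    return std::unique( array, array + size );    // n, but the elements after the last
                                                  // unique element are not defined
}

See std::sort and std::unique.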


没有你我更好 2024-12-17 21:42:43

#include <unordered_set>
#include <utility>

void remove_dup(int* data, int count) {
    int* L = data; //place to put next unique number
    std::unordered_set<int> found(count); //keep track of what's been seen
    for (int* cur = data; cur < data + count; ++cur) { //examine every element exactly once
        if (found.insert(*cur).second) //if this is the first time we've seen it
            std::swap(*cur, *L++); //put it next in the unique list
        //otherwise leave it behind; repeats collect after the unique list
    }
}

http://ideone.com/3choA
Not that I would turn in code this poorly commented. Also note that unordered_set probably uses its own array internally, bigger than data. (This has been rewritten based on aix's answer, to be much faster.)


诗化ㄋ丶相逢 2024-12-17 21:42:43

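If you know the bound B on the integer values and the size SZ of the integer array, then you can do something like the following:

Create an array of booleans seen_before with B elements, initialized to 0.
Create a result array of integers, result, with SZ elements.
Create two integers, front_pos = 0 and back_pos = SZ - 1.
Iterate across the original list:
- Set an integer variable val to the value of the current element.
- If seen_before[val] is set to 1, put the number at result[back_pos], then decrement back_pos.
- If seen_before[val] is not set to 1, put the number at result[front_pos], then increment front_pos and set seen_before[val] to 1.

Once you finish iterating across the main list, all the unique numbers will be at the front of the list and the duplicate numbers will be at the back. The fun part is that the entire process is done in one pass. Note that this only works if you know the bounds of the values appearing in the original array.

Edit: It was pointed out that there are no bounds on the integers used, so instead of initializing seen_before as an array with B elements, initialize it as a map, then continue as usual. That should get you n*log(n) performance.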
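
The bounded version of the steps above could be sketched roughly as follows. The variable names come from the answer; the function name `partition_bounded` and the `bound` parameter are invented for illustration, and the sketch assumes every value lies in the range 0 to bound - 1. It also uses a separate result array, exactly as the answer describes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper. Assumes 0 <= val < bound for every element of input.
std::vector<int> partition_bounded(const std::vector<int>& input, int bound)
{
    std::vector<bool> seen_before(bound, false);
    std::vector<int> result(input.size());
    std::size_t front_pos = 0;
    std::size_t back_pos = input.size();   // one past the next free back slot

    for (int val : input) {
        if (seen_before[val]) {
            result[--back_pos] = val;      // repeat: fill from the back
        } else {
            result[front_pos++] = val;     // first occurrence: fill from the front
            seen_before[val] = true;
        }
    }
    return result;
}
```

Because repeats are written from the back toward the front, they come out in the reverse of the order in which they were found: for the example array the tail is 7,8 rather than 8,7.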


司马昭之心 2024-12-17 21:42:43

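This can be done by iterating over the array and marking the index of the first duplicate.
Later on, swap the value at the marked index with the next unique value,
and then increment the marked index for the next swap.

Java implementation:

import java.util.HashSet;
import java.util.Set;

public class MoveDuplicates {
    public static void solve() {
        int[] arr = { 1, 7, 4, 8, 2, 6, 8, 3, 7, 9, 10 };
        final Set<Integer> seen = new HashSet<Integer>();
        int l = -1; // index of the first duplicate found so far, -1 if none yet

        for (int i = 0; i < arr.length; i++) {
            if (seen.contains(arr[i])) {
                if (l == -1) {
                    l = i; // mark where the duplicates start
                }
                continue;
            }
            seen.add(arr[i]); // record the value before it is moved
            if (l > -1) {
                // swap the new unique value into the marked slot
                final int temp = arr[i];
                arr[i] = arr[l];
                arr[l] = temp;
                l++;
            }
        }

        for (int value : arr) {
            System.out.print(value + " ");
        }
    }
}

Output is 1 7 4 8 2 6 3 9 10 8 7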


完美的未来在梦里 2024-12-17 21:42:43

It's ugly, but it meets the requirements of moving the duplicates to the end in place (no buffer array)

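// warning, some light C++11
#include <cstddef>
#include <set>
#include <utility>

void dup2end(int* arr, std::size_t cnt)
{
   std::set<int> k;        // values seen so far
   auto max = arr + cnt;   // one past the last element not yet known to be a duplicate
   auto curr = arr;

   while(curr < max)
   {
      auto res = k.insert(*curr);

      // first time encountered
      if(res.second)
      {
         ++curr;
      }
      else
      {
         // duplicate: swap it into the last unexamined slot; the value that
         // lands in *curr is checked on the next iteration
         --max;
         std::swap(*curr, *max);
      }
   }
}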


青丝拂面 2024-12-17 21:42:43

#include <vector>
using std::vector;

void move_duplicates_to_end(vector<int> &A) {
    if(A.empty()) return;
    int i = 0, tail = A.size()-1;
    while(i <= tail) {
        bool is_first = true;    // does the current number appear for the first time?
        for(int k=0; k<i; k++) { // always compare with numbers before A[i]
            if(A[k] == A[i]) {
                is_first = false;
                break;
            }
        }
        if(is_first == true) i++;
        else {
            int tmp = A[i]; // swap with tail
            A[i] = A[tail];
            A[tail] = tmp;
            tail--;
        }
    }
}

If the input array is {1,7,4,8,2,6,8,3,7,9,10}, then the output is {1,7,4,8,2,6,10,3,9,7,8}. Comparing with your answer {1,7,4,8,2,6,3,9,10,8,7}, the first half is the same, while the right half is different, because I swap all duplicates with the tail of the array. As you mentioned, the order of the duplicates can be arbitrary.
