在将重复项移动到末尾的同时对数组进行排序？

发布于 2024-12-01 22:11:25 字数 955 浏览 0 评论 0原文

这是我朋友的编程课上的一个问题。

问如何对 int 数组进行排序，然后排列它们以使所有重复元素出现在数组末尾？

例如，给定输入输出

{5, 2, 7, 6, 1, 1, 5, 6, 2}

将是

{1, 2, 5, 6, 7, 1, 2, 5, 6}

请注意，数字已排序并且重复的数字在 7 之后，这是数组中的最大值。

这必须不使用任何 Java 库包/utils 来实现。

我建议首先使用插入或冒泡排序对数组进行排序，然后遍历数组，执行类似以下操作：

for (int i = 0; i < nums.length - 2; i++) {
    for (int j = i + 1; j < nums.length; j++) {
        //current and next are same, move elements up
        //and place the next number at the end.
        if (nums[i] == nums[j]) {
            int temp = nums[j];
            for (int k = j; k < nums.length - 1; k++) {
                nums[k] = nums[k + 1];
            }
            nums[nums.length - 1] = temp;
            break;
        }
    }
}

我稍后自己尝试了这个（这就是上面的代码）-当我尝试这个时，我认为这个可以通过使用更少的代码来实现，效率更高。也许我给出了错误的建议。

有什么想法吗？

原文

This was a question in one my friend's programming class.

Q. How do you sort an array of ints and then arrange them such that all duplicate elements appear at the end of the array?

For example, given the input

{5, 2, 7, 6, 1, 1, 5, 6, 2}

The output would be

{1, 2, 5, 6, 7, 1, 2, 5, 6}

Note that the numbers are sorted and duplicate numbers are after 7, which is the maximum in the array.

This has to be achieved with out using any Java library packages/utils.

I suggested to sort the array first using insertion or bubble sort, and then go over the array, perform something like the following :

for (int i = 0; i < nums.length - 2; i++) {
    for (int j = i + 1; j < nums.length; j++) {
        //current and next are same, move elements up
        //and place the next number at the end.
        if (nums[i] == nums[j]) {
            int temp = nums[j];
            for (int k = j; k < nums.length - 1; k++) {
                nums[k] = nums[k + 1];
            }
            nums[nums.length - 1] = temp;
            break;
        }
    }
}

I tried this myself later (and that is how the code above) - As I try this out, I think this could be achieved by using less code, be more efficiently. And may be I gave a wrong advice.

Any thoughts?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

随心而道 2024-12-08 22:11:25

根据问题的参数，有多种方法可以解决这个问题。

如果不允许使用 O(n) 外部存储器，那么一种选择是使用标准排序算法在 O(n log n) 时间内对数组进行就地排序，然后对它运行第二遍以将重复项移动到末尾（正如您所建议的那样）。您上面发布的代码需要 O(n²) 时间，但我认为使用稍微复杂的算法可以在 O(n log n) 时间内完成此步骤。这个想法分两步进行。第一步，在 O(n log n) 时间内将所有非重复元素按排序顺序放在前面，并将所有重复元素按非排序顺序放在后面。完成此操作后，您可以使用第一步中的排序算法在 O(n log n) 时间内对数组的后半部分进行排序。

我不会进入代码来对数组进行排序。我真的很喜欢排序，但是有很多其他关于如何就地对数组进行排序的好资源，所以我不能很好地利用我在这里的时间/空间来研究它们。如果有帮助，这里是堆排序，quicksort 和 smoothsort，所有这些都运行在 O(n log n) 时间内。堆排序和平滑排序仅使用 O(1) 外部存储器，而快速排序在最坏的情况下可以使用 O(n)（尽管好的实现可以使用可爱的技巧将其限制为 O(log n)）。

有趣的代码是将所有非重复元素带到范围前面的逻辑。直观地说，该代码通过存储两个指针来工作——一个读指针和一个写指针。读指针指向下一个要读取的元素，而写指针指向下一个唯一元素应放置的位置。例如，给定这个数组：

1 1 1 1 2 2 3 4 5 5

我们从最初指向 1 的读指针和写指针开始：

write  v
       1 1 1 1 2 2 3 4 5 5
read   ^

接下来，我们将读指针向前跳过到下一个不是 1 的元素。这会找到 2：

write  v
       1 1 1 1 2 2 3 4 5 5
read           ^

然后，我们将写指针指向下一个位置：

write    v
       1 1 1 1 2 2 3 4 5 5
read           ^

现在，我们将 2 交换到写指针所保存的位置：

write    v
       1 2 1 1 1 2 3 4 5 5
read           ^

将读指针前进到下一个不是 2 的值：

write    v
       1 2 1 1 1 2 3 4 5 5
read               ^

然后前进写指针：

write      v
       1 2 1 1 1 2 3 4 5 5
read               ^

再次，我们交换 'read 指向的值' 和 '写' 并移动向前写指针，然后将读指针移动到下一个唯一值：

write        v
       1 2 3 1 1 2 1 4 5 5
read                 ^

再次产生

write          v
       1 2 3 4 1 2 1 1 5 5
read                   ^

并且最终迭代给出

write            v
       1 2 3 4 5 2 1 1 1 5
read                      ^

如果我们现在从写指针排序到读指针，我们得到

write            v
       1 2 3 4 5 1 1 1 2 5
read                      ^

宾果！我们已经找到了我们正在寻找的答案。

在（未经测试的，抱歉...）Java 代码中，此修复步骤可能如下所示：

int read = 0;
int write = 0;

while (read < array.length) {
     /* Swap the values pointed at by read and write. */
     int temp = array[write];
     array[write] = array[read];
     array[read] = temp;

     /* Advance the read pointer forward to the next unique value.  Since we
      * moved the unique value to the write location, we compare values
      * against array[write] instead of array[read].
      */
     while (read < array.length && array[write] == array[read])
         ++ read;

     /* Advance the write pointer. */
     ++ write;
}

该算法运行时间为 O(n)，这导致该问题的整体算法为 O(n log n)。由于重新排序步骤使用 O(1) 内存，因此总体内存使用量将为 O(1)（对于平滑排序或堆排序等）或 O(log n)（对于快速排序等）。

编辑：在与朋友讨论后，我认为基于快速排序的修改，有一个更优雅的解决方案。通常，当您运行快速排序时，最终会将数组划分为三个区域：

 +----------------+----------------+----------------+
 | values < pivot | values = pivot | values > pivot |
 +----------------+----------------+----------------+

然后递归对第一个和最后一个区域进行排序，以将它们按排序顺序排列。但是，我们可以针对我们的问题版本进行修改。我们需要旋转算法作为原语，该算法获取数组中两个相邻的值块并在 O(n) 时间内交换它们。它不会更改这些块中元素的相对顺序。例如，我们可以使用旋转将数组转换

1 2 3 4 5 6 7 8

为

3 4 5 6 7 8 1 2

O(n) 时间。

快速排序的修改版本可以使用 Bentley-McIlroy 三路分区算法（描述为此处）使用 O(1) 额外空间，将数组元素重新排列为如上所示的配置。接下来，我们应用旋转来重新排序元素，使它们看起来像这样：

 +----------------+----------------+----------------+
 | values < pivot | values > pivot | values = pivot |
 +----------------+----------------+----------------+

接下来，我们执行交换，以便将主元元素的一个副本移动到至少与主元一样大的元素集中。这可能有额外的枢轴副本。然后我们递归地将排序算法应用于 <<和>范围。当我们这样做时，生成的数组将如下所示：

 +---------+-------------+---------+-------------+---------+
 | < pivot | dup < pivot | > pivot | dup > pivot | = pivot |
 +---------+-------------+---------+-------------+---------+

然后我们对范围应用两次旋转以将其放入最终顺序。首先，将小于主元的重复值与大于主元的值进行旋转。此时

 +---------+---------+-------------+-------------+---------+
 | < pivot | > pivot | dup < pivot | dup > pivot | = pivot |
 +---------+---------+-------------+-------------+---------+

，第一个范围是按升序排列的唯一元素：

 +---------------------+-------------+-------------+---------+
 | sorted unique elems | dup < pivot | dup > pivot | = pivot |
 +---------------------+-------------+-------------+---------+

最后，对大于枢轴的重复元素和等于枢轴的元素进行最后一次旋转，以产生以下结果：

 +---------------------+-------------+---------+-------------+
 | sorted unique elems | dup < pivot | = pivot | dup > pivot |
 +---------------------+-------------+---------+-------------+

请注意，最后三个块只是排序后的重复值：

 +---------------------+-------------------------------------+
 | sorted unique elems |      sorted duplicate elements      |
 +---------------------+-------------------------------------+

瞧！我们已经按照我们想要的顺序得到了一切。使用与普通快速排序相同的分析，加上我们只在每个级别执行 O(n) 工作（三个额外旋转），在最好的情况下，这会达到 O(n log n)内存使用量为 O(log n)。在 O(log n) 内存的最坏情况下，它仍然是 O(n²)，但发生这种情况的概率极低。

如果允许使用 O(n) 内存，一种选择是从存储键/值对的所有元素中构建平衡二叉搜索树，其中每个键都是数组和值是它出现的次数。然后，您可以按如下格式对数组进行排序：

对于数组中的每个元素：
- 如果该元素已存在于 BST 中，则增加其计数。
- 否则，向 BST 添加一个新节点，其中该元素的计数为 1。
对 BST 进行中序遍历。当遇到一个节点时，输出它的key。
对 BST 进行第二次有序遍历。当遇到一个节点时，如果它的count大于1，则输出该节点的n-1个副本，其中n是它出现的次数。

该算法的运行时间为 O(n log n)，但从头开始编写 BST 是相当棘手的。它还需要外部空间，我不确定你是否可以这样做。

但是，如果允许外部空间并且要排序的数组很小并且包含小整数，则可以使用修改后的计数排序。只需将 BST 替换为一个足够大的数组，使原始数组中的每个整数都可以作为键。这将运行时间减少到 O(n + k)，内存使用量为 O(k)，其中 k 是数组中最大的元素。

希望这有帮助！

Depending on the parameters of your problem, there are many approaches to solving this.

If you are not allowed to use O(n) external memory, then one option would be to use a standard sorting algorithm to sort the array in-place in O(n log n) time, then to run a second pass over it to move the duplicates to the end (as you've suggested). The code you posted above takes O(n²) time, but I think that this step can be done in O(n log n) time using a slightly more complicated algorithm. The idea works in two steps. In the first step, in O(n log n) time you bring all non-duplicated elements to the front in sorted order and bring all the duplicates to the back in non-sorted order. Once you've done that, you then sort the back half of the array in O(n log n) time using the sorting algorithm from the first step.

I'm not going to go into the code to sort the array. I really love sorting, but there are so many other good resources on how to sort arrays in-place that it's not a good use of my time/space here to go into them. If it helps, here's links to Java implementations of heapsort, quicksort, and smoothsort, all of which runs in O(n log n) time. Heapsort and smoothsort use only O(1) external memory, while quicksort can use O(n) in the worst case (though good implementations can limit this to O(log n) using cute tricks).

The interesting code is the logic to bring all the non-duplicated elements to the front of the range. Intuitively, the code works by storing two pointers - a read pointer and a write pointer. The read pointer points to the next element to read, while the write pointer points to the location where the next unique element should be placed. For example, given this array:

1 1 1 1 2 2 3 4 5 5

We start with the read and write pointers initially pointing at 1:

write  v
       1 1 1 1 2 2 3 4 5 5
read   ^

Next, we skip the read pointer ahead to the next element that isn't 1. This finds 2:

write  v
       1 1 1 1 2 2 3 4 5 5
read           ^

Then, we bump the write pointer to the next location:

write    v
       1 1 1 1 2 2 3 4 5 5
read           ^

Now, we swap the 2 into the spot held by the write pointer:

write    v
       1 2 1 1 1 2 3 4 5 5
read           ^

advance the read pointer to the next value that isn't 2:

write    v
       1 2 1 1 1 2 3 4 5 5
read               ^

then advance the write pointer:

write      v
       1 2 1 1 1 2 3 4 5 5
read               ^

Again, we exchange the values pointed at by 'read' and 'write' and move the write pointer forward, then move the read pointer to the next unique value:

write        v
       1 2 3 1 1 2 1 4 5 5
read                 ^

Once more yields

write          v
       1 2 3 4 1 2 1 1 5 5
read                   ^

and the final iteration gives

write            v
       1 2 3 4 5 2 1 1 1 5
read                      ^

If we now sort from the write pointer to the read pointer, we get

write            v
       1 2 3 4 5 1 1 1 2 5
read                      ^

and bingo! We've got the answer we're looking for.

In (untested, sorry...) Java code, this fixup step might look like this:

int read = 0;
int write = 0;

while (read < array.length) {
     /* Swap the values pointed at by read and write. */
     int temp = array[write];
     array[write] = array[read];
     array[read] = temp;

     /* Advance the read pointer forward to the next unique value.  Since we
      * moved the unique value to the write location, we compare values
      * against array[write] instead of array[read].
      */
     while (read < array.length && array[write] == array[read])
         ++ read;

     /* Advance the write pointer. */
     ++ write;
}

This algorithm runs in O(n) time, which leads to an overall O(n log n) algorithm for the problem. Since the reordering step uses O(1) memory, the overall memory usage would be either O(1) (for something like smoothsort or heapsort) or O(log n) (for something like quicksort).

EDIT: After talking this over with a friend, I think that there is a much more elegant solution to the problem based on a modification of quicksort. Typically, when you run quicksort, you end up partitioning the array into three regions:

 +----------------+----------------+----------------+
 | values < pivot | values = pivot | values > pivot |
 +----------------+----------------+----------------+

The recursion then sorts the first and last regions to put them into sorted order. However, we can modify this for our version of the problem. We'll need as a primitive the rotation algorithm, which takes two adjacent blocks of values in an array and exchanges them in O(n) time. It does not change the relative order of the elements in those blocks. For example, we could use rotation to convert the array

1 2 3 4 5 6 7 8

into

3 4 5 6 7 8 1 2

and can do so in O(n) time.

The modified version of quicksort would work by using the Bentley-McIlroy three-way partition algortihm (described here) to, using O(1) extra space, rearrange the array elements into the configuration shown above. Next, we apply a rotation to reorder the elements so that they look like this:

 +----------------+----------------+----------------+
 | values < pivot | values > pivot | values = pivot |
 +----------------+----------------+----------------+

Next, we perform a swap so that we move exactly one copy of the pivot element into the set of elements at least as large as the pivot. This may have extra copies of the pivot behind. We then recursively apply the sorting algorithm to the < and > ranges. When we do this, the resulting array will look like this:

 +---------+-------------+---------+-------------+---------+
 | < pivot | dup < pivot | > pivot | dup > pivot | = pivot |
 +---------+-------------+---------+-------------+---------+

We then apply two rotations to the range to put it into the final order. First, rotate the duplicate values less than the pivot with the values greater than the pivot. This gives

 +---------+---------+-------------+-------------+---------+
 | < pivot | > pivot | dup < pivot | dup > pivot | = pivot |
 +---------+---------+-------------+-------------+---------+

At this point, this first range is the unique elements in ascending order:

 +---------------------+-------------+-------------+---------+
 | sorted unique elems | dup < pivot | dup > pivot | = pivot |
 +---------------------+-------------+-------------+---------+

Finally, do one last rotation of the duplicate elements greater than the pivot and the elements equal to the pivot to yield this:

 +---------------------+-------------+---------+-------------+
 | sorted unique elems | dup < pivot | = pivot | dup > pivot |
 +---------------------+-------------+---------+-------------+

Notice that these last three blocks are just the sorted duplicate values:

 +---------------------+-------------------------------------+
 | sorted unique elems |      sorted duplicate elements      |
 +---------------------+-------------------------------------+

and voila! We've got everything in the order we want. Using the same analysis that you'd do for normal quicksort, plus the fact that we're only doing O(n) work at each level (three extra rotations), this works out to O(n log n) in the best case with O(log n) memory usage. It's still O(n²) in the worst case with O(log n) memory, but that happens with extremely low probability.

If you are allowed to use O(n) memory, one option would be to build a balanced binary search tree out of all of the elements that stores key/value pairs, where each key is an element of the array and the value is the number of times it appears. You could then sort the array in your format as follows:

For each element in the array:
- If that element already exists in the BST, increment its count.
- Otherwise, add a new node to the BST with that element having count 1.
Do an inorder walk of the BST. When encountering a node, output its key.
Do a second inorder walk of the BST. When encountering a node, if it has count greater than one, output n - 1 copies of that node, where n is the number of times it appears.

The runtime of this algorithm is O(n log n), but it would be pretty tricky to code up a BST from scratch. It also requires external space, which I'm not sure you're allowed to do.

However, if you are allowed external space and the arrays you are sorting are small and contain small integers, you could modify the above approach by using a modified counting sort. Just replace the BST with an array large enough for each integer in the original array to be a key. This reduces the runtime to O(n + k), with memory usage O(k), where k is the largest element in the array.

Hope this helps!

回复收藏 0 原文