合并排序对三个输入数组进行排序
合并算法通过重复比较两个输入数组的最小元素,并将两者中较小的一个移动到输出,将两个已排序的输入数组合并为一个已排序的输出数组。
现在我们需要将三个相同长度的已排序输入数组(A1、A2和A3)合并为一个(已排序)输出数组,有两种方法:
使用上面的Merge算法将A1和A2合并为A4 ,然后使用相同的算法将A4和A3合并到输出数组中。
修改上述合并算法,通过重复比较三个输入数组的最小元素,并将三个输入数组中最小的一个移动到输出数组。
修改
如果仅考虑数组元素移动(即赋值)的最坏情况,以上两种算法中哪一种更有效?
如果仅考虑数组元素比较的最坏情况,以上两种算法哪种更有效?
在这两种算法中,哪一种在最坏情况下具有更高的整体效率?
A Merge algorithm merges two sorted input arrays into a sorted output array, by repeatedly comparing the smallest elements of the two input arrays, and moving the smaller one of the two to the output.
Now we need to merge three sorted input arrays (A1, A2, and A3) of the same length into a (sorted) output array, and there are two methods:
Using the above Merge algorithm to merge A1 and A2 into A4, then using the same algorithm to merge A4 and A3 into the output array.
Revising the above Merge algorithm, by repeatedly comparing the smallest elements of the three input arrays, and moving the smallest one of the three to the output array.
Which of the above two algorithms is more efficient, if only considering the worst case of array element movements (i.e., assignments)?
Which of the above two algorithms is more efficient, if only considering the worst case of array element comparisons?
Between these two algorithms, which one has a higher overall efficiency in worst case?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
如果您关心的只是数组写入的次数,则第二个版本(三向合并)比第一个算法(双向合并的两个实例)更快。三路合并算法将执行 3n 次写入(其中 n 是任何序列的长度),因为它一次合并所有三个范围。第一种方法会将两个范围合并在一起,执行 2n 次写入,然后将该序列与第三个序列合并,执行 3n 次写入,总共净写入 5n 次。
更一般地,假设您有 k 个元素范围,全部长度为 n。如果您成对合并这些范围,然后再次成对合并这些合并,等等,那么您将执行大约 k/2 合并步骤,合并长度为 n 的范围,然后 k/4 合并长度为 2n 的范围,然后 k/8 合并长度 4n 等。这给出了总和
kn/2 + kn/2 + ... + kn/2 (log n 次)
对于 O(kn lg n) 的数组写入净次数。另一方面,如果您在每个步骤中使用 k 路比较,则您将执行精确的 kn 次写入,这要小得多。
现在,让我们考虑一下您在每个设置中进行了多少次比较。在三路合并中,写入输出序列的每个元素都需要找到三个值中的最小值。这需要两次比较 - 一次比较前两个序列的第一个值,一次比较这两个值的最小值与第三个数组的第一个值。因此,对于写入结果序列的每个值,我们使用两次比较,并且由于写入了 3n 个值,因此我们总共需要进行最多 6n 次比较。
更好的方法是将序列存储在最小堆中,其中序列按其第一个元素进行比较。在每个步骤中,我们将具有最小第一个值的序列从堆中出列,将该值写入结果,然后将序列的其余部分重新排回堆中。对于 k 个序列,这意味着写出的每个元素最多需要 O(lg k) 次比较,因为堆插入的运行时间为 O(lg k)。这给出了 O(kn lg k) 的净运行时间,因为写出的每个 kn 元素都需要 O(lg k) 处理时间。
在另一个版本中,我们首先进行标准的双向合并,这需要对每个写出的元素进行一次比较,总共进行 2n 次比较。在合并的第二遍中,在最坏的情况下,我们总共进行 3n 次比较,因为有 3G 元素被合并。这总共给出了 5n 次比较。如果我们使用上面描述的成对合并的广义构造,我们将需要使用 O(kn lg n) 比较,因为写入的每个元素都需要一次比较,而我们执行 O(kn lg n) 写入。
简而言之,对于 k=3 的特定情况,三路合并执行 3n 次写入和 6n 次比较,总共 9n 次内存读写。迭代双向合并进行 5n 次写入和 5n 次比较,总共 10n 次内存读取和写入,因此三向合并版本更好。
如果我们考虑广义结构,k 路合并会执行 O(nk) 次写入和 O(nk lg k) 次比较,总共执行 O(nk lg k) 次内存操作。迭代双向合并算法执行 O(nk lg n) 次写入和 O(nk lg n) 次比较,总共执行 O(nk lg n) 次内存操作。因此,对于一些长序列,k 路合并渐近更好,而对于许多短序列,迭代合并排序更快。
希望这有帮助!
If all that you care about are the number of array writes, the second version (three-way merge) is faster than the first algorithm (two instances of two-way merge). The three-way merge algorithm will do exactly 3n writes (where n is the length of any of the sequences), since it merges all three ranges in one pass. The first approach will merge two of the ranges together, doing 2n writes, and will then merge that sequence with the third sequence, doing 3n writes for a net total of 5n writes.
More generally, suppose that you have k ranges of elements, all of length n. If you merge those ranges pairwise, then merge those merges pairwise again, etc., then you will do roughly k/2 merge steps merging ranges of length n, then k/4 merges of ranges of length 2n, then k/8 merges of length 4n, etc. This gives the sum
kn/2 + kn/2 + ... + kn/2 (log n times)
For a net number of array writes that are O(kn lg n). If, on the other hand, you use a k-way comparison at each step, you do exactly kn writes, which is much smaller.
Now, let's think about how many comparisons you do in each setup. In the three-way merge, each element written into the output sequence requires finding the minimum of three values. This requires two comparisons - one to compare the first values of the first two sequences, and one to compare the minimum of those two values to the first value of the third array. Thus for each value written out to the resulting sequence, we use two comparisons, and since there are 3n values written we need to do a total of at most 6n comparisons.
A much better way to do this would be to store the sequences in a min-heap, where sequences are compared by their first element. On each step, we dequeue the sequence from the heap with the smallest first value, write that value to the result, then enqueue tue rest of the sequence back into the heap. With k sequences, this means that each element written out requires at most O(lg k) comparisons, since heap insertion runs in O(lg k). This gives a net runtime of O(kn lg k), since each of the kn elements written out requires O(lg k) processing time.
In the other version, we begin by doing a standard two-way merge, which requires one comparison per element written out, for a net total of 2n comparisons. In the second pass of the merge, in the worst case we do a total of 3n comparisons, since there are 3G elements being merged. This gives a net total of 5n comparisons. If we use the generalized construction for pairwise merging that's described above, we will need to use O(kn lg n) comparisons, since each element written requires one comparison and we do O(kn lg n) writes.
In short, for the specific case of k=3, we have that the three-way merge does 3n writes and 6n comparisons for a net of 9n memory reads and writes. The iterated two-way merge does 5n writes and 5n comparisons for a net total of 10n memory reads and writes, and so the three-way-merge version is better.
If we consider the generalized constructions, the k-way merge does O(nk) writes and O(nk lg k) comparisons for a total of O(nk lg k) memory operations. The iterated two-way merge algorithm does O(nk lg n) writes and O(nk lg n) comparisons for a total of O(nk lg n) memory operations. Thus the k-way merge is asymptotically better for a few long sequences, while the iterated merge sort is faster for many short sequences.
Hope this helps!