Merging sorted arrays: what is the optimal time complexity?
I have m arrays, every array is of length n. Each array is sorted. I want to create a single array of length m*n, containing all the values of the previous arrays (including repeating values), sorted. I have to merge these arrays..
I think the optimal time complexity is m*n*log(m).
Here's the sketch of the algorithm..
I create a support array H of length m, containing the first element of each array.
I then sort this array (m log m), and move the min value to the output array.
I then replace the moved value with the next one from the array it was taken from. Actually I don't replace it, but insert it in the right (sorted) position. This takes log m, I think.
And I repeat this for all m*n values... therefore m*n*log m
My question: can you think of a more efficient algorithm? If m*n*log m is actually optimal, can you at least think of a simpler, more elegant algorithm?
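For concreteness, here is a direct Python sketch of the approach described above (function and variable names are mine, not from the original post). It keeps the support list H sorted with binary-search insertion; note that while the binary search itself is O(log m), inserting into and popping from a Python list shifts elements, which costs O(m) in the worst case:

```python
import bisect

def merge_sorted(arrays):
    """Multi-way merge following the sketch above: keep a sorted
    support list H of the current head of each input array, repeatedly
    take the minimum, and pull the next value from the same source."""
    # H holds (value, source array index, next position) triples, kept sorted.
    H = sorted((arr[0], i, 1) for i, arr in enumerate(arrays) if arr)
    out = []
    while H:
        value, i, pos = H.pop(0)  # smallest remaining head; pop(0) is O(m)
        out.append(value)
        if pos < len(arrays[i]):
            # bisect finds the slot in O(log m), but the list insertion
            # itself shifts elements, costing O(m) in the worst case
            bisect.insort(H, (arrays[i][pos], i, pos + 1))
    return out
```

Example: `merge_sorted([[1, 4, 7], [2, 5, 8], [3, 6, 9]])` returns `[1, 2, 3, 4, 5, 6, 7, 8, 9]`.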
The complexity is right! However, there's a small flaw in your algorithm idea: you cannot insert an item into a sorted array in log m. You can find its position using binary search in that complexity, but you might have to move elements around to actually place it there. To fix this problem, you can use a heap data structure instead! Multi-way merge (which is the common name of your algorithm) is usually implemented with yet another 'merging' data structure: the tournament tree. You can find a description in Knuth's "The Art of Computer Programming" (chapter on sorting, iirc). It has a lower constant factor in theory and in practice when compared to heaps in this specific case.
If you want to look at implementations, I'm pretty sure that the parallel multi-way merge in the GNU C++ Standard Library parallel extensions is implemented this way.
Edit: I referenced the wrong book, which is fixed now.
Best you can do is O(m*n + d). Similar to counting sort: http://en.wikipedia.org/wiki/Counting_sort If you know the range of possible values (d, say), you can initialize an array of d bins, then scan through each of the m arrays, incrementing the bin corresponding to each value. Then, to build the new array of length m*n, walk through the d bins in order and emit each value as many times as its bin counted.