How to sort a very large array in C
I want to sort on the order of four million long longs in C. Normally I would just malloc() a buffer to use as an array and call qsort(), but four million * 8 bytes is one huge chunk of contiguous memory.

What's the easiest way to do this? I rate ease over pure speed for this. I'd prefer not to use any libraries, and the result will need to run on a modest netbook under both Windows and Linux.
3 Answers
Just allocate a buffer and call qsort. 32MB isn't so very big these days, even on a modest netbook.

If you really must split it up: sort smaller chunks, write them to files, and merge them (a merge takes a single linear pass over each of the things being merged). But, really, don't. Just sort it.

(There's a good discussion of the sort-and-merge approach in volume 2 of Knuth, where it's called "external sorting". When Knuth was writing that, the external data would have been on magnetic tape, but the principles aren't very different with discs: you still want your I/O to be as sequential as possible. The tradeoffs are a bit different with SSDs.)
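For concreteness, here is a minimal sketch of the allocate-and-qsort approach; the comparator name and the buffer-filling step are placeholders, not part of the original answer. Note that the usual "return *a - *b" comparator can overflow for long long, so the comparison is written out explicitly.

    #include <stdio.h>
    #include <stdlib.h>

    /* qsort comparator for long long: "*a - *b" can overflow,
       so compare explicitly instead. */
    static int cmp_ll(const void *pa, const void *pb)
    {
        long long a = *(const long long *)pa;
        long long b = *(const long long *)pb;
        return (a > b) - (a < b);
    }

    int main(void)
    {
        size_t n = 4000000;                       /* ~4 million values */
        long long *buf = (long long *)malloc(n * sizeof *buf);
        if (buf == NULL)
            return 1;                             /* allocation failed */

        /* ... fill buf from whatever the data source is ... */

        qsort(buf, n, sizeof *buf, cmp_ll);       /* one in-place sort of ~32 MB */

        free(buf);
        return 0;
    }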
32 MB? That's not too big... quicksort should do the trick.
Your best option would be to prevent having the data unordered in the first place, if possible. As has been mentioned, you'd be better off reading the data from disk (or the network, or whatever the source) directly into a self-organizing container (a tree; perhaps std::set will do). That way, you'll never have to sort through the lot, or have to worry about memory management.

If you know the required capacity of the container, you might squeeze out additional performance by using std::vector(initialcapacity) or calling vector::reserve up front. You'd then be best advised to use std::make_heap to heapify any existing elements, and then add element by element using push_heap (see also pop_heap). This is essentially the same paradigm as the self-ordering set, but

(Oh, minor detail: note that sort_heap on the heap takes at most N log N comparisons, where N is the number of elements.)

Let me know if you think this is an interesting approach. I'd really need a bit more info on the use case.
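A rough sketch of the vector-plus-heap variant described above, assuming the values arrive on stdin; the input loop and the four-million reserve are illustrative assumptions, not part of the answer.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main()
    {
        std::vector<long long> v;
        v.reserve(4000000);                      // assumed capacity, avoids reallocations

        long long x;
        while (std::scanf("%lld", &x) == 1) {    // hypothetical data source
            v.push_back(x);
            std::push_heap(v.begin(), v.end());  // keep [begin, end) a valid max-heap
        }

        std::sort_heap(v.begin(), v.end());      // heap -> fully sorted (ascending)

        // v is now sorted; use it here.
        return 0;
    }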