How to sort a very large array in C

Posted on 2024-10-30 19:00:52

I want to sort on the order of four million long longs in C. Normally I would just malloc() a buffer to use as an array and call qsort() but four million * 8 bytes is one huge chunk of contiguous memory.

What's the easiest way to do this? I rate ease over pure speed for this. I'd prefer not to use any libraries and the result will need to run on a modest netbook under both Windows and Linux.

Comments (3)

哎呦我呸! 2024-11-06 19:00:52

Just allocate a buffer and call qsort. 32MB isn't so very big these days even on a modest netbook.
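For what it's worth, a minimal sketch of that in plain C might look like the following. The names `compare_long_long` and `sort_long_longs` are made up for this example. The comparator compares instead of subtracting, because `a - b` can overflow for `long long`.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator for qsort: yields <0, 0 or >0.
   Uses comparisons rather than subtraction, which could overflow. */
static int compare_long_long(const void *pa, const void *pb)
{
    long long a = *(const long long *)pa;
    long long b = *(const long long *)pb;
    return (a > b) - (a < b);
}

/* Sorts n values in place, in ascending order (hypothetical helper). */
void sort_long_longs(long long *values, size_t n)
{
    assert(values != NULL || n == 0);
    if (n > 1)
        qsort(values, n, sizeof values[0], compare_long_long);
}
```

The buffer itself would come from something like `malloc(count * sizeof *values)`, with the result checked against `NULL` before use and released with `free()` afterwards.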

If you really must split it up: sort smaller chunks, write them to files, and merge them (a merge takes a single linear pass over each of the things being merged). But, really, don't. Just sort it.
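If you do go down that road, a rough sketch of the chunk-and-merge idea could look like this. Everything named here (`external_sort`, `MAX_RUNS`, `chunk_len`) is hypothetical, the streams are assumed to be opened in binary mode and to hold raw `long long` values, and the merge simply scans the head of every run to find the smallest, which is fine for a handful of runs but not tuned for many.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_RUNS 64 /* hypothetical cap on the number of temporary files */

static int cmp_ll(const void *pa, const void *pb)
{
    long long a = *(const long long *)pa;
    long long b = *(const long long *)pb;
    return (a > b) - (a < b);
}

/* Reads raw long longs from `in`, at most chunk_len at a time, sorts each
   chunk into its own temporary file, then merges all of them into `out`.
   Returns 0 on success, -1 on failure. */
int external_sort(FILE *in, FILE *out, size_t chunk_len)
{
    FILE *runs[MAX_RUNS];
    long long heads[MAX_RUNS]; /* current front value of each run */
    int alive[MAX_RUNS];       /* nonzero while a run still has data */
    size_t nruns = 0, i, got;
    int status = -1;
    long long *chunk;

    assert(in != NULL && out != NULL && chunk_len > 0);
    chunk = malloc(chunk_len * sizeof *chunk);
    if (chunk == NULL)
        return -1;

    /* Phase 1: produce sorted runs. */
    while ((got = fread(chunk, sizeof *chunk, chunk_len, in)) > 0) {
        if (nruns == MAX_RUNS)
            goto done;
        qsort(chunk, got, sizeof *chunk, cmp_ll);
        runs[nruns] = tmpfile();
        if (runs[nruns] == NULL)
            goto done;
        nruns++;
        if (fwrite(chunk, sizeof *chunk, got, runs[nruns - 1]) != got)
            goto done;
        rewind(runs[nruns - 1]);
    }

    /* Phase 2: load the first value of every run. */
    for (i = 0; i < nruns; i++)
        alive[i] = fread(&heads[i], sizeof heads[i], 1, runs[i]) == 1;

    /* Repeatedly emit the smallest front value and refill from that run,
       so each run is read once, front to back. */
    for (;;) {
        size_t best = nruns;
        for (i = 0; i < nruns; i++)
            if (alive[i] && (best == nruns || heads[i] < heads[best]))
                best = i;
        if (best == nruns)
            break; /* every run is exhausted */
        if (fwrite(&heads[best], sizeof heads[best], 1, out) != 1)
            goto done;
        alive[best] =
            fread(&heads[best], sizeof heads[best], 1, runs[best]) == 1;
    }
    status = 0;

done:
    for (i = 0; i < nruns; i++)
        fclose(runs[i]);
    free(chunk);
    return status;
}
```

`tmpfile()` is standard C, but where it puts its files depends on the platform, so it may need replacing with explicitly named files on some systems.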

(There's a good discussion of the sort-and-merge approach in volume 2 of Knuth, where it's called "external sorting". When Knuth was writing that, the external data would have been on magnetic tape, but the principles aren't very different with discs: you still want your I/O to be as sequential as possible. The tradeoffs are a bit different with SSDs.)

抚你发端 2024-11-06 19:00:52

32 MB? That's not too big. Quicksort should do the trick.

无可置疑 2024-11-06 19:00:52

Your best option would be to avoid having the data unordered in the first place, if possible. As has been mentioned, you'd be better off reading the data from disk (or the network, or whatever the source is) directly into a self-organizing container (a tree; perhaps std::set will do).

That way, you'll never have to sort through the lot or worry about memory management. If you know the required capacity of the container, you might squeeze out additional performance by constructing the std::vector with its final size, std::vector(initialcapacity), or by calling vector::reserve up front.

You would then be best advised to use std::make_heap to heapify any existing elements, and then add the rest one element at a time using push_heap (see also pop_heap). This is essentially the same paradigm as the self-ordering set, but:

  • duplicates are ok
  • the storage is 'optimized' as a flat array (which is perfect for e.g. shared memory maps or memory mapped files)

(Oh, minor detail, note that sort_heap on the heap takes at most N log N comparisons, where N is the number of elements)
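Since the question asks for plain C, here is a rough C stand-in for the push_heap / sort_heap pair on a flat array. This is not the C++ library code, only a hand-written sketch of the same idea. The names `heap_push` and `heap_sort_in_place` are invented for the example, and the caller is assumed to have allocated enough room in the array.

```c
#include <assert.h>
#include <stddef.h>

static void swap_ll(long long *a, long long *b)
{
    long long t = *a;
    *a = *b;
    *b = t;
}

/* Adds value to the max-heap stored in heap[0..*len) and grows *len by one.
   The caller must guarantee the array has room for one more element. */
void heap_push(long long *heap, size_t *len, long long value)
{
    size_t i;
    assert(heap != NULL && len != NULL);
    i = (*len)++;
    heap[i] = value;
    /* Sift the new value up until its parent is at least as large. */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (heap[parent] >= heap[i])
            break;
        swap_ll(&heap[parent], &heap[i]);
        i = parent;
    }
}

/* Restores the heap property below index i within heap[0..len). */
static void sift_down(long long *heap, size_t i, size_t len)
{
    for (;;) {
        size_t largest = i, left = 2 * i + 1, right = 2 * i + 2;
        if (left < len && heap[left] > heap[largest])
            largest = left;
        if (right < len && heap[right] > heap[largest])
            largest = right;
        if (largest == i)
            return;
        swap_ll(&heap[i], &heap[largest]);
        i = largest;
    }
}

/* Turns a max-heap into an ascending array, in place: move the maximum
   to the end, shrink the heap by one, and repair the root. */
void heap_sort_in_place(long long *heap, size_t len)
{
    assert(heap != NULL || len == 0);
    while (len > 1) {
        len--;
        swap_ll(&heap[0], &heap[len]);
        sift_down(heap, 0, len);
    }
}
```

Values would be pushed as they arrive from the source, with one call to `heap_sort_in_place` at the end. Duplicates are kept, and the storage stays a single flat array throughout.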

Let me know if you think this is an interesting approach. I'd really need a bit more info on the use case.
