在现实生活中,您会使用 heapq Python 模块做什么?

发布于 2024-12-22 21:52:32 字数 368 浏览 2 评论 0 原文

阅读 Guido 的 对一百万个 32 位进行排序后使用 Python 在 2MB RAM 中计算整数,我发现了 heapq 模块,但这个概念对我来说非常抽象。

原因之一是我不完全理解堆的概念,但我确实理解 Guido 是如何使用它的。

现在,除了他有点疯狂的例子之外,您还会使用 heapq 模块做什么?

它必须始终与排序或最小值相关吗?您使用它只是因为它比其他方法更快吗?或者你能做一些你离不开的真正优雅的事情吗?

After reading Guido's Sorting a million 32-bit integers in 2MB of RAM using Python, I discovered the heapq module, but the concept is pretty abstract to me.

One reason is that I don't understand the concept of a heap completely, but I do understand how Guido used it.

Now, beside his kinda crazy example, what would you use the heapq module for?

Must it always be related to sorting or minimum value? Is it only something you use because it's faster than other approaches? Or can you do really elegant things that you can't do without?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

貪欢 2024-12-29 21:52:32

heapq 模块 通常用于实现 优先级队列

您会看到事件调度程序中的优先级队列不断添加新事件,并且需要使用堆来有效地定位下一个调度的事件。一些示例包括:

heapq 文档包括 优先级队列实现说明,它解决了常见的用例。

此外,堆非常适合实现部分排序。例如,heapq.nsmallestheapq.nlargest 与完全排序和切片相比,内存效率更高,并且执行的比较次数要少得多:

>>> from heapq import nlargest
>>> from random import random
>>> nlargest(5, (random() for i in xrange(1000000)))
[0.9999995650034837, 0.9999985756262746, 0.9999971934450994, 0.9999960394998497, 0.9999949126363714]

The heapq module is commonly use to implement priority queues.

You see priority queues in event schedulers that are constantly adding new events and need to use a heap to efficiently locate the next scheduled event. Some examples include:

The heapq docs include priority queue implementation notes which address the common use cases.

In addition, heaps are great for implementing partial sorts. For example, heapq.nsmallest and heapq.nlargest can be much more memory efficient and do many fewer comparisons than a full sort followed by a slice:

>>> from heapq import nlargest
>>> from random import random
>>> nlargest(5, (random() for i in xrange(1000000)))
[0.9999995650034837, 0.9999985756262746, 0.9999971934450994, 0.9999960394998497, 0.9999949126363714]
猫烠⑼条掵仅有一顆心 2024-12-29 21:52:32

与自平衡二叉树相比,如果只考虑复杂性,堆似乎并没有给你带来太多好处:

  • 插入:两者都是 O(logN)
  • 删除最大元素:两者都是 O(logN)
  • 从数组构建结构堆的元素数量为 O(N),二叉树的元素数量为 O(N log N)。

但是,二叉树往往需要每个节点都指向其子节点以提高效率,而堆将其数据紧密地存储在数组中。这允许您在固定数量的内存中存储更多的数据。

因此,对于只需要插入和最大删除的情况,堆是完美的,并且通常可以使用自平衡二叉树一半的内存(如果必须的话,更容易实现)。标准用例是优先级队列。

Comparing it to a self-balancing binary tree, a heap doesn't seem to gain you much if you just look at complexity:

  • Insertion: O(logN) for both
  • Remove max element: O(logN) for both
  • Build structure from an array of elements O(N) for heap, O(N log N) for binary tree.

But whereas a binary tree tends to need each node pointing to its children for efficiency, a heap stores its data packed tightly into an array. This allows you to store much more data in a fixed amount of memory.

So for the cases when you only need insertion and max-removal, a heap is perfect and can often use half as much memory as a self-balancing binary tree (and much easier to implement if you have to). The standard use-case is a priority queue.

横笛休吹塞上声 2024-12-29 21:52:32

这是我在尝试如何在 Python 2.6 中实现 Counter 模块时偶然发现的。只需查看 collections.Counter 的实现和用法即可。这实际上是通过 heapq 实现的。

This was an accidental discovery of me by trying to see how could I implement the Counter Module in Python 2.6. Just have a look into the implementation and usage of collections.Counter. This is actually implemented through heapq.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文