阅读 Guido 的 对一百万个 32 位进行排序后使用 Python 在 2MB RAM 中计算整数,我发现了 heapq
模块,但这个概念对我来说非常抽象。
原因之一是我不完全理解堆的概念,但我确实理解 Guido 是如何使用它的。
现在,除了他有点疯狂的例子之外,您还会使用 heapq 模块做什么?
它必须始终与排序或最小值相关吗?您使用它只是因为它比其他方法更快吗?或者你能做一些你离不开的真正优雅的事情吗?
After reading Guido's Sorting a million 32-bit integers in 2MB of RAM using Python, I discovered the heapq
module, but the concept is pretty abstract to me.
One reason is that I don't understand the concept of a heap completely, but I do understand how Guido used it.
Now, beside his kinda crazy example, what would you use the heapq
module for?
Must it always be related to sorting or minimum value? Is it only something you use because it's faster than other approaches? Or can you do really elegant things that you can't do without?
发布评论
评论(3)
heapq 模块 通常用于实现 优先级队列。
您会看到事件调度程序中的优先级队列不断添加新事件,并且需要使用堆来有效地定位下一个调度的事件。一些示例包括:
heapq 文档包括 优先级队列实现说明,它解决了常见的用例。
此外,堆非常适合实现部分排序。例如,heapq.nsmallest 和heapq.nlargest 与完全排序和切片相比,内存效率更高,并且执行的比较次数要少得多:
The heapq module is commonly use to implement priority queues.
You see priority queues in event schedulers that are constantly adding new events and need to use a heap to efficiently locate the next scheduled event. Some examples include:
The heapq docs include priority queue implementation notes which address the common use cases.
In addition, heaps are great for implementing partial sorts. For example, heapq.nsmallest and heapq.nlargest can be much more memory efficient and do many fewer comparisons than a full sort followed by a slice:
与自平衡二叉树相比,如果只考虑复杂性,堆似乎并没有给你带来太多好处:
但是,二叉树往往需要每个节点都指向其子节点以提高效率,而堆将其数据紧密地存储在数组中。这允许您在固定数量的内存中存储更多的数据。
因此,对于只需要插入和最大删除的情况,堆是完美的,并且通常可以使用自平衡二叉树一半的内存(如果必须的话,更容易实现)。标准用例是优先级队列。
Comparing it to a self-balancing binary tree, a heap doesn't seem to gain you much if you just look at complexity:
But whereas a binary tree tends to need each node pointing to its children for efficiency, a heap stores its data packed tightly into an array. This allows you to store much more data in a fixed amount of memory.
So for the cases when you only need insertion and max-removal, a heap is perfect and can often use half as much memory as a self-balancing binary tree (and much easier to implement if you have to). The standard use-case is a priority queue.
这是我在尝试如何在 Python 2.6 中实现 Counter 模块时偶然发现的。只需查看 collections.Counter 的实现和用法即可。这实际上是通过 heapq 实现的。
This was an accidental discovery of me by trying to see how could I implement the Counter Module in Python 2.6. Just have a look into the implementation and usage of collections.Counter. This is actually implemented through heapq.