Heap vs Binary Search Tree (BST)
What is the difference between a heap and BST?
When to use a heap and when to use a BST?
If you want to get the elements in a sorted fashion, is BST better over heap?
Summary
All average times on this table are the same as their worst times except for Insert.
*: everywhere in this answer, BST == Balanced BST, since unbalanced sucks asymptotically
**: using a trivial modification explained in this answer
***: log(n) for pointer tree heap, n for dynamic array heap

Advantages of binary heap over a BST
average time insertion into a binary heap is O(1), while for a BST it is O(log(n)). This is the killer feature of heaps.
There are also other heaps which reach O(1) amortized (stronger) like the Fibonacci heap, and even worst case, like the Brodal queue, although they may not be practical because of non-asymptotic performance: Are Fibonacci heaps or Brodal queues used in practice anywhere?
binary heaps can be efficiently implemented on top of either dynamic arrays or pointer-based trees, BSTs only on pointer-based trees. So for the heap we can choose the more space-efficient array implementation, if we can afford occasional resize latencies.
binary heap creation is O(n) worst case, O(n log(n)) for a BST.

Advantage of BST over binary heap
search for arbitrary elements is O(log(n)). This is the killer feature of BSTs. For a heap, it is O(n) in general, except for the largest element, which is O(1).
"False" advantage of heap over BST
heap is O(1) to find max, BST O(log(n)).
This is a common misconception, because it is trivial to modify a BST to keep track of the largest element, and update it whenever that element could be changed: on insertion of a larger one, swap; on removal, find the second largest. Can we use binary search tree to simulate heap operation? (mentioned by Yeo).
Actually, this is a limitation of heaps compared to BSTs: the only efficient search is the one for the largest element.
Average binary heap insert is O(1)
Sources:
Intuitive argument:
In a binary heap, increasing the value at a given index is also O(1) for the same reason. But if you want to do that, you will likely want to keep an extra index up-to-date on heap operations: How to implement O(logn) decrease-key operation for min-heap based Priority Queue? e.g. for Dijkstra. Possible at no extra time cost.

GCC C++ standard library insert benchmark on real hardware
I benchmarked the C++ std::set (red-black tree BST) and std::priority_queue (dynamic array heap) inserts to see if I was right about the insert times, and this is what I got:

So clearly:
heap insert time is basically constant.
We can clearly see dynamic array resize points. Since we are averaging every 10k inserts to be able to see anything at all above system noise, those peaks are in fact about 10k times larger than shown!
The zoomed graph excludes essentially only the array resize points, and shows that almost all inserts fall under 25 nanoseconds.
BST is logarithmic. All inserts are much slower than the average heap insert.
BST vs hashmap detailed analysis at: What data structure is inside std::map in C++?
GCC C++ standard library insert benchmark on gem5
gem5 is a full system simulator, and therefore provides an infinitely accurate clock with m5 dumpstats. So I tried to use it to estimate timings for individual inserts.

Interpretation:
heap is still constant, but now we see in more detail that there are a few lines, and each higher line is more sparse.
This must correspond to the higher memory access latencies incurred for higher and higher insert counts.
TODO I can't really interpret the BST one fully, as it does not look so logarithmic and looks somewhat more constant.
With this greater detail we can also see a few distinct lines here as well, but I'm not sure what they represent: I would expect the bottom line to be thinner, since we insert top to bottom?
Benchmarked with this Buildroot setup on an aarch64 HPI CPU.
BST cannot be efficiently implemented on an array
Heap operations only need to bubble up or down a single tree branch, so O(log(n)) worst case swaps, O(1) average.
Keeping a BST balanced requires tree rotations, which can change the top element for another one, and would require moving the entire array around (O(n)).

Heaps can be efficiently implemented on an array
Parent and children indexes can be computed from the current index as shown here.
There are no balancing operations like in a BST.
Delete min is the most worrying operation as it has to be top down. But it can always be done by "percolating down" a single branch of the heap as explained here. This leads to an O(log(n)) worst case, since the heap is always well balanced.
If you are inserting a single node for every one you remove, then you lose the advantage of the asymptotic O(1) average insert that heaps provide as the delete would dominate, and you might as well use a BST. Dijkstra however updates nodes several times for each removal, so we are fine.
Dynamic array heaps vs pointer tree heaps
Heaps can be efficiently implemented on top of pointer-based trees: Is it possible to make efficient pointer-based binary heap implementations?
The dynamic array implementation is more space efficient. Suppose that each heap element contains just a pointer to a struct:
the tree implementation must store three pointers for each element: parent, left child and right child. So the memory usage is always 4n (3 tree pointers + 1 struct pointer). Tree BSTs would also need further balancing information, e.g. red-black-ness.
the dynamic array implementation can be of size 2n just after a doubling, so on average it is going to be 1.5n.
On the other hand, the tree heap has better worst case insert, because copying the backing dynamic array to double its size takes O(n) worst case, while the tree heap just does new small allocations for each node. Still, the backing array doubling is O(1) amortized, so it comes down to a maximum latency consideration. Mentioned here.

Philosophy
BSTs maintain a global property between a parent and all descendants (left smaller, right bigger).
The top node of a BST is the middle element, which requires global knowledge to maintain (knowing how many smaller and larger elements are there).
This global property is more expensive to maintain (log n insert), but gives more powerful searches (log n search).
Heaps maintain a local property between parent and direct children (parent > children).
The top node of a heap is the biggest element, which only requires local knowledge to maintain (knowing your parent).
Comparing BST vs Heap vs Hashmap:
BST: can reasonably be both: an efficient sorting machine and an efficient searchable set.
heap: is just a sorting machine. Cannot be an efficient unordered set, because you can only check for the smallest/largest element fast.
hash map: can only be an unordered set, not an efficient sorting machine, because the hashing mixes up any ordering.
Doubly-linked list
A doubly linked list can be seen as a subset of the heap where the first item has the greatest priority, so let's compare them here as well:
insertion: doubly linked list is O(1) worst case, since we have pointers to the items, and the update is really simple; binary heap is O(1) average, thus worse than the linked list. The tradeoff is that the heap has a more general insertion position.
search: O(n) for both.
A use case for this is when the key of the heap is the current timestamp: in that case, new entries will always go to the beginning of the list. So we can even forget the exact timestamp altogether, and just keep the position in the list as the priority.
This can be used to implement an LRU cache. Just like for heap applications like Dijkstra, you will want to keep an additional hashmap from the key to the corresponding node of the list, to find which node to update quickly.
Comparison of different Balanced BSTs
Although the asymptotic insert and find times for all data structures that are commonly classified as "Balanced BSTs" that I've seen so far are the same, different BBSTs do have different trade-offs. I haven't fully studied this yet, but it would be good to summarize these trade-offs here:
See also
Similar question on CS: https://cs.stackexchange.com/questions/27860/whats-the-difference-between-a-binary-search-tree-and-a-binary-heap
Heap just guarantees that elements on higher levels are greater (for max-heap) or smaller (for min-heap) than elements on lower levels, whereas BST guarantees order (from "left" to "right"). If you want sorted elements, go with BST.
Heap is better at findMin/findMax (O(1)), while BST is good at all finds (O(logN)). Insert is O(logN) for both structures.
If you only care about findMin/findMax (e.g. priority-related), go with a heap. If you want everything sorted, go with a BST.
First few slides from here explain things very clearly.
As mentioned by others, Heap can do findMin or findMax in O(1), but not both in the same data structure. However I disagree that Heap is better at findMin/findMax. In fact, with a slight modification, the BST can do both findMin and findMax in O(1).
In this modified BST, you keep track of the min node and max node every time you do an operation that can potentially modify the data structure. For example, in the insert operation you can check if the min value is larger than the newly inserted value, then assign the min value to the newly added node. The same technique can be applied to the max value. Hence, this BST contains this information, which you can retrieve in O(1). (same as binary heap)
In this BST (Balanced BST), when you pop min or pop max, the next min value to be assigned is the successor of the min node, whereas the next max value to be assigned is the predecessor of the max node. Thus it performs in O(1). However we need to re-balance the tree, so it will still run in O(log n). (same as binary heap)
I would be interested to hear your thoughts in the comments below. Thanks :)
Update
Cross reference to similar question Can we use binary search tree to simulate heap operation? for more discussion on simulating Heap using BST.
A binary search tree uses the definition that for every node, the node to the left of it has a lesser value (key) and the node to the right of it has a greater value (key).
Whereas the heap, being an implementation of a binary tree, uses the following definition:
If A and B are nodes, where B is a child node of A, then the value (key) of A must be larger than or equal to the value (key) of B. That is,
key(A) ≥ key(B).
http://wiki.answers.com/Q/Difference_between_binary_search_tree_and_heap_tree
I ran into the same question today in my exam and I got it right. smile ... :)
Another use of a BST over a heap, because of an important difference:
Use of a BST over a heap: let's say we use a data structure to store the landing times of flights. We cannot schedule a flight to land if the difference in landing times is less than 'd'. And assume many flights have been scheduled to land in a data structure (BST or heap).
Now, we want to schedule another flight which will land at t. Hence, we need to calculate the difference of t with its successor and predecessor (it should be > d).
Thus, we need a BST for this, which does it fast, i.e. in O(log n) if balanced.
EDIT:
A sorted BST takes O(n) time to print elements in sorted order (inorder traversal), while a heap can do it in O(n log n) time.
A heap extracts the min element and re-heapifies the array, which makes it do the sorting in O(n log n) time.
Inserting all n elements from an array into a BST takes O(n log n). The n elements of an array can be inserted into a heap in O(n) time, which gives the heap a definite advantage.