什么时候选择RB树、B树还是AVL树?
作为一名程序员,我什么时候应该考虑使用 RB 树、B 树或 AVL 树? 在做出选择之前需要考虑哪些关键点?
有人可以用每个树结构的场景解释一下为什么选择它而不是其他树结构并参考关键点吗?
As a programmer when should I consider using a RB tree, B- tree or an AVL tree?
What are the key points that needs to be considered before deciding on the choice?
Can someone please explain with a scenario for each tree structure why it is chosen over others with reference to the key points?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
请对这一点持保留态度:
当您管理数千个项目并且从磁盘或某些慢速存储介质对它们进行分页时,请使用 B 树。
RB 树,当您在树上进行相当频繁的插入、删除和检索时。
当插入和删除相对于检索不频繁时,使用 AVL 树。
Take this with a pinch of salt:
B-tree when you're managing more than thousands of items and you're paging them from a disk or some slow storage medium.
RB tree when you're doing fairly frequent inserts, deletes and retrievals on the tree.
AVL tree when your inserts and deletes are infrequent relative to your retrievals.
我认为 B+ 树是一种很好的通用有序容器数据结构,即使在主内存中也是如此。即使虚拟内存不是问题,缓存友好性通常也会成为问题,并且 B+ 树特别适合顺序访问 - 与链表具有相同的渐近性能,但缓存友好性接近简单数组。所有这些都与 O(log n) 搜索、插入和删除有关。
不过,B+ 树确实存在问题 - 例如,当您执行插入/删除时,项目在节点内移动,从而使指向这些项目的指针无效。我有一个执行“游标维护”的容器库 - 游标将自身附加到它们当前在链接列表中引用的叶节点,因此它们可以自动修复或失效。由于光标很少超过一两个,因此它工作得很好 - 但仍然是一项额外的工作。
另一件事是 B+ 树本质上就是这样。我想您可以根据是否需要剥离或重新创建非叶节点,但是使用二叉树节点您可以获得更大的灵活性。二叉树可以在不复制节点的情况下转换为链表并返回 - 您只需更改指针,然后记住您现在将其视为不同的数据结构。除此之外,这意味着您可以相当轻松地进行 O(n) 树合并 - 将两棵树转换为列表,合并它们,然后转换回树。
另一件事是内存分配和释放。在二叉树中,这可以与算法分开 - 用户可以创建一个节点然后调用插入算法,删除可以提取节点(将它们从树中分离出来,但不释放内存)。在 B 树或 B+ 树中,这显然不起作用 - 数据将存在于多项目节点中。编写插入方法来“计划”操作而不修改节点,直到知道需要多少个新节点并且可以分配它们为止,这是一个挑战。
红黑 vs. AVL?我不确定这有什么大的区别。我自己的库有一个基于策略的“工具”类来操作节点,具有双链表、简单二叉树、展开树、红黑树和陷阱的方法,包括各种转换。其中一些方法只是因为我有时感到无聊而实施的。我不确定我是否测试过trap方法。我选择红黑树而不是 AVL 的原因是因为我个人更了解这些算法 - 这并不意味着它们更简单,这只是我对它们更熟悉的历史侥幸。
最后一件事 - 我最初开发 B+ 树容器只是作为实验。这是那些从未真正结束的实验之一,但我不鼓励其他人重复这样做。如果您需要的只是一个有序容器,那么最好的答案是使用现有库提供的容器 - 例如 C++ 中的 std::map 等。我的库经过多年的发展,花了相当长的时间才变得稳定,而且我最近才发现它在技术上是不可移植的(依赖于一些未定义的行为 WRT offsetof)。
I think B+ trees are a good general-purpose ordered container data structure, even in main memory. Even when virtual memory isn't an issue, cache-friendliness often is, and B+ trees are particularly good for sequential access - the same asymptotic performance as a linked list, but with cache-friendliness close to a simple array. All this and O(log n) search, insert and delete.
B+ trees do have problems, though - such as the items moving around within nodes when you do inserts/deletes, invalidating pointers to those items. I have a container library that does "cursor maintenance" - cursors attach themselves to the leaf node they currently reference in a linked list, so they can be fixed or invalidated automatically. Since there's rarely more than one or two cursors, it works well - but it's an extra bit of work all the same.
Another thing is that the B+ tree is essentially just that. I guess you can strip off or recreate the non-leaf nodes depending on whether you need them or not, but with binary tree nodes you get a lot more flexibility. A binary tree can be converted to a linked list and back without copying nodes - you just change the pointers then remember that you're treating it as a different data structure now. Among other things, this means you get fairly easy O(n) merging of trees - convert both trees to lists, merge them, then convert back to a tree.
Yet another thing is memory allocation and freeing. In a binary tree, this can be separated out from the algorithms - the user can create a node then call the insert algorithm, and deletes can extract nodes (detach them from the tree, but dont free the memory). In a B-tree or B+-tree, that obviously doesn't work - the data will live in a multi-item node. Writing insert methods that "plan" the operation without modifying nodes until they know how many new nodes are needed and that they can be allocated is a challenge.
Red black vs. AVL? I'm not sure it makes any big difference. My own library has a policy-based "tool" class to manipulate nodes, with methods for double-linked lists, simple binary trees, splay trees, red-black trees and treaps, including various conversions. Some of those methods were only implemented because I was bored at one time or another. I'm not sure I've even tested the treap methods. The reason I chose red-black trees rather than AVL is because I personally understand the algorithms better - which doesn't mean they're simpler, it's just a fluke of history that I'm more familiar with them.
One last thing - I only originally developed my B+ tree containers as an experiment. It's one of those experiments that never ended really, but it's not something I'd encourage others to repeat. If all you need is an ordered container, the best answer is to use the one that your existing library provides - e.g. std::map etc in C++. My library evolved over years, it took quite a while to get it stable, and I just relatively recently discovered it's technically non-portable (dependent on a bit of undefined behaviour WRT offsetof).
在内存中,当项目数量超过 32000 时,B-Tree 具有优势......请查看 speedtest.pdf 来自 stx-btree。
In memory B-Tree has the advantage when the number of items is more than 32000... Look at speedtest.pdf from stx-btree.
在选择数据结构时,您需要权衡诸如
我首先阅读引用的维基百科文章作者:罗伯特·哈维。
实际上,当使用 Java 等语言时,普通程序员倾向于使用提供的集合类。如果在性能调优活动中发现收集性能有问题,那么可以寻求替代实现。这很少不是业务主导的开发必须考虑的第一件事。需要手动实现此类数据结构的情况极为罕见,通常有可以使用的库。
When choosing data structures you are trading off factors such as
I would start by reading the Wikipedia articles referenced by Robert Harvey.
Pragmatically, when working in languages such as Java the average programmer tends to use the collection classes provided. If in a performance tuning activity one discovers that the collection performance is problematic then one can seek alternative implementations. It's rarely the first thing a business-led development has to consider. It's extremely rare that one needs to implement such data structures by hand, there are usually libraries that can be used.