BTree-预定大小?
我在维基百科上读到了这个:
在 B 树中,内部(非叶)节点可以具有可变数量的 某个预定义范围内的子节点。当插入数据或 从节点中删除后,其子节点的数量会发生变化。为了 保持预定义的范围,内部节点可以加入或分裂。 由于允许存在一定范围的子节点,因此 B 树不需要 与其他自平衡搜索树一样频繁地重新平衡,但是 可能会浪费一些空间,因为节点并未完全满。
我们必须为 B 树指定这个范围。即使当我查找 CLRS(算法简介)时,它似乎也使用数组作为键和子项。我的问题是 - 有没有办法通过将键和子项定义为列表而不是预先确定的数组来减少这种空间浪费?这也太麻烦了吧?
另外,在我的一生中,我无法在 btreeDeleteNode 上获得像样的伪代码。也非常感谢这里的任何帮助。
I read this on wikipedia:
In B-trees, internal (non-leaf) nodes can have a variable number of
child nodes within some pre-defined range. When data is inserted or
removed from a node, its number of child nodes changes. In order to
maintain the pre-defined range, internal nodes may be joined or split.
Because a range of child nodes is permitted, B-trees do not need
re-balancing as frequently as other self-balancing search trees, but
may waste some space, since nodes are not entirely full.
We have to specify this range for B trees. Even when I looked up CLRS (Intro to Algorithms), it seemed to make to use of arrays for keys and children. My question is- is there any way to reduce this wastage in space by defining the keys and children as lists instead of predetermined arrays? Is this too much of a hassle?
Also, for the life of me I'm not able to get a decent psedocode on btreeDeleteNode. Any help here is appreciated too.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
当您说“列表”时,您是指链接列表吗?
某种元素的数组在每个槽中占用一个元素的内存,无论该槽是否已满。链表仅占用其实际包含的元素的内存,但对于每个元素,它占用一个元素的内存,加上一个指针的大小(如果是双向链表则为两个,除非你可以使用异或技巧来重叠它们)。
如果您存储指针并使用单链表,则每个列表链接的大小是每个数组槽大小的两倍。这意味着除非列表少于一半,否则链接列表将使用更多内存,而不是更少。
如果您使用的语言的运行时具有每个对象的开销(如 Java 和 C,除非您自己处理内存分配),那么您还必须为每个列表链接支付该开销,但仅在一个列表链接上支付一次阵,而且比例更差。
我建议你的平衡算法应该保持树节点至少半满。如果在节点已满时对其进行拆分,则会创建两个半满节点。然后,当相邻节点少于半满时,您需要合并它们。然后,您可以使用数组,因为知道它比链表更有效。
不知道删除的细节,抱歉!
When you say "lists", do you mean linked lists?
An array of some kind of element takes up one element's worth of memory per slot, whether that slot is filled or not. A linked list only takes up memory for elements it actually contains, but for each one, it takes up one element's worth of memory, plus the size of one pointer (two if it's a doubly-linked list, unless you can use the xor trick to overlap them).
If you are storing pointers, and using a singly-linked list, then each list link is twice the size of each array slot. That means that unless the list is less than half full, a linked list will use more memory, not less.
If you're using a language whose runtime has per-object overhead (like Java, and like C unless you are handling memory allocation yourself), then you will also have to pay for that overhead on each list link, but only once on an array, and the ratio is even worse.
I would suggest that your balancing algorithm should keep tree nodes at least half full. If you split a node when it is full, you will create two half-full nodes. You then need to merge adjacent nodes when they are less than half full. You can then use an array, safe in the knowledge that it is more efficient than a linked list.
No idea about the details of deletion, sorry!
B-Tree 节点有一个重要的特性,节点中的所有键都是排序的。当查找特定键时,使用二分搜索来查找正确的位置。使用二分搜索保持了 B-Tree 中搜索算法的复杂度
O(logn)
。如果用某种链接列表替换预分配的数组,则会丢失顺序。除非你使用一些复杂的数据结构,比如跳表,来保持搜索算法的复杂度为
O(logn)
。但这完全没有必要,跳过列表本身更好。B-Tree node has an important characteristic, all keys in the node is sorted. When finding a specific key, binary search is used to find the right position. Using binary search keeps the complexity of search algorithm in B-Tree
O(logn)
.If you replace the preallocated array with some kind of linked list, you lost the ordering. Unless you use some complex data structures, like skip list, to keep the search algorithm with
O(logn)
. But it's totally unnecessary, skip list itself is better.