分裂 b+ 中的节点树
我试图弄清楚当节点溢出时到底会发生什么。 信息: 在我的 b+ 树中,每个块有 4 个指针和 3 个数据部分。 问题: 我明白,当出现溢出时,我们会分成 2 个节点,每个节点有 2 个节点 键, 并将中间值插入父节点,而不从子节点中删除(与 b 树不同)。
但是我遇到了这样的情况:
|21|30|50|
|10|20|-| |21|22|25| |30|40|-| |50|60|80|
我想插入钥匙 23 首先我分裂|21|22|25|进入:|21|22|-|和|23|25|-| 现在我需要将密钥 23 插入父项 |21|30|50|女巫导致另一次分裂。 |21|23|-|和|30|50|-| 但是30之前的指针指向哪里呢? 是否有可能 this 指针 & 23后面的那个指向|23|25|-| ?
I'm trying to figure out what exactly happens when there is a node overflow.
info:
in my b+ tree there are 4 pointers per block and 3 data sections .
problem:
I understood that when there is an overflow we split into 2 nodes in my case each with 2
keys,
and insert the parent node the mid value, without erasing from the son(unlike in b tree).
however I got into situation:
|21|30|50|
|10|20|-| |21|22|25| |30|40|-| |50|60|80|
and I want to insert the key 23
first I split |21|22|25| into: |21|22|-| and |23|25|-|
now I need to insert the key 23 to the parent |21|30|50| witch causes another split.
|21|23|-| and |30|50|-|
but where does the pointer before 30 points to?
is it possible that both this pointer & the one after 23 point to |23|25|-|
?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
当插入23时:
基本上, insert 将使树的深度增加 1
When inserting 23:
Basically, that insert will increase the depth of the tree by 1
以下是应该如何处理指针。这是插入之前的 B+ 树。 (一些填充用于使指针更容易看到)
插入 23 后,您将有 2 个节点。重要的是要知道左分割应该始终是同一个实例,而右分割应该是一个新节点。这将使处理指针变得更容易。
所以分割应该是这样的。
由于左节点是同一实例,根处的键
21
具有正确的右指针。这是分裂根之前和分裂叶之后节点的样子。我们正在处理中。仅应将具有键
23
的新项目添加到根目录中21
旁边。由于 root 没有空间,它也会分裂。 中键是23(右分割的第一项)。因为根是分裂的,所以我们必须将我们的中键添加到左或右分裂中,并找到一个新的中键来提升。注意右分割处的空指针。因为该节点是新的,所以必须尽快初始化!
(根从右侧拆分,这意味着在项目数量奇数的情况下,右侧节点中的项目较少。但一般来说,采用哪种拆分方式并不重要。)
我们的中间键是 23。所以我们添加 23。
好!如果这里没有分裂的话,我们的插入现在就完成了。但既然这个层面上存在分裂,我们就必须找到新的中间键来推动。也不要忘记新鲜节点的空指针。
(在分裂叶子的情况下,您必须在叶子上保留键值并提升中间键的副本,但在分裂内部节点的情况下,您可以安全地移动并向上提升键。)
这里新的中键是 30。让我们从左侧节点弹出它。
(如何选择中间键很重要。始终是从底部分割中获取中间键的节点,应该为新的中间键删除一个项目。如果该节点是左节点,则从中删除最后一项,否则删除第一项。)
请注意,30 不在任何拆分内。这是我们必须处理的中间键。 始终将中间键的右侧节点赋予新鲜节点的左侧。
然后中间键的右侧指针将指向右侧拆分的新鲜节点。最后,中键被提升,创建具有左分割的左部和右分割的右部以及中键的键的新根。
恭喜!您已完成插入。它在纸上看起来可能很容易,但我不得不承认在编码时有点挑战。
有几种方法可以在内存中表示这些
key-node
项。我的方法是每个key-node
项都有带有键的右指针。因此每个内部节点都会有一个key-node
数组。这样,键和节点指针始终保持在一起。最左边的子节点由单独的左指针处理,该指针保留在每个内部节点上并且永远不会为空(完成操作后)。Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)
After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.
So splits should look like this.
since left node is same instance, key
21
at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.only a new item with key
23
should be added next to21
in root. since root has no space it will also split. middle key is 23 (first item of right split).Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!
(root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)
our middle key was 23. so lets add 23.
Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.
(In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)
here new middle key is 30. lets pop it from left node.
(Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)
Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.
Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.
Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.
There are couple of ways to represent these
key-node
items in memory. my approach is eachkey-node
item has right pointer with a key. so each internal node will have array ofkey-node
s. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).在B+树中,叶子节点和非叶子节点具有不同的结构。因此,它们的溢出机制也不同。对于叶节点的溢出,您所说的是正确的(这是在不删除子节点的父密钥的情况下完成的)。但是当非叶子节点溢出并分裂时,子节点不会保留父键(父键会从子节点中删除)。
In B+ tree, the leaf and non-leaf nodes have different structures. Due to this, their overflow mechanisms are also different. What you say is true for overflow of leaf nodes(It is done without erasing the parent key from the son). But when non-leaf nodes overflow and are split, the child node doesn't retain the parent key (parent key is erased from the son).
根据我的了解,最后一个节点的节点数小于 n/2 是可以的。因此,在您的示例中,顶部节点将变为:
我认为这是有道理的。如果你想一想,你有 5 个离开节点,所以你需要 5 个来自上层的指针。这种安排是您可以拥有 5 个指针的唯一方法,所有其他组合要么会溢出节点,要么会创建额外的指针。
如果节点是 |21|22|-|和|23|25|-|,则根节点将为|23|-|-|。那么,在右侧节点中包含 23 是没有意义的,因为右侧节点中的任何内容都必须等于或大于 23!
From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:
I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.
If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!
您必须了解会发生什么问题是因为您使用的表示模型不好。您应该有一个与指针关联的数据值,并且该数据值是指针引用的子树中的最小值的不变量。
因此,这就是插入之前应如何表示 B 树节点的方式。
在此表示中,根节点中值 10 之后的指针指向第一个值为 10 的叶节点,依此类推。
当您插入 23 时,它将插入到第一个值为 21 的叶节点中。这会生成以下树和无需拆分。
当插入 24 产生您所描述的效果时,您得到的是以下内容,
如您所见,不再有任何歧义。叶子节点被分裂,关键指针对24|必须插入根节点。由于已经满了,所以必须分开。现在有 5 个值,您将获得 1 个具有 3 个值的节点和一个具有 2 个值的节点。您可以自由选择是左节点还是右节点获得这三个值。重要的是基本不变量被保留。节点中的每个键值都是其关联指针引用的子树中的最小元素。必须添加一个新的根节点,并且它的键值指针集现在应该是显而易见的。没有歧义。
这也表明许多优化策略都是可能的。我们可以将值 21 移动到未满的左叶节点中,而不是拆分根节点。第一个值为 10。这避免了分割根节点的需要,并保持 b 树高度不变,并产生更好的 b 树填充。这样您就可以最大限度地减少空间和搜索时间。但这意味着我们可以有效地检查横向平衡是否可行。仍然需要更改根节点中的键值。如
您所见,b 树中的值 10 在非叶节点中并不是真正需要的,并且在 b 树表示中经常被省略(即 wikipedia) 但这可能会让人感到困惑,这可能就是您所在的原因。 :)
The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.
So this is how you should represent the b-tree node prior of insertion.
In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.
When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.
When inserting 24 which produces the effect you described, what you get is the following
As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.
This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.
As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)