B 树上的最小/最大记录数?

发布于 2024-12-06 15:38:08 字数 356 浏览 1 评论 0原文

我正在寻找最好的& B+Tree 的最坏情况 (http://en.wikipedia.org/wiki/ B-tree#Best_case_and_worst_case_heights),但我不知道如何根据我所掌握的信息使用这个公式。 假设我有一个包含 1,000 条记录的树 B,B 可以拥有的最大(和最大)级别数是多少? 我可以在每一页上有尽可能多/小的按键。我也可以有尽可能多/少量的页面。 有什么想法吗? (如果你想知道,这不是一个家庭作业问题,但它肯定会帮助我理解硬件的一些东西。)

I was looking at the best & worst case scenarios for a B+Tree (http://en.wikipedia.org/wiki/B-tree#Best_case_and_worst_case_heights) but I don't know how to use this formula with the information I have.
Let's say I have a tree B with 1,000 records, what is the maximum (and maximum) number of levels B can have?
I can have as many/little keys on each page. I can also have as many/little number of pages.
Any ideas?
(In case you are wondering, this is not a homework question, but it will surely help me understand some stuff for hw.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

思慕 2024-12-13 15:38:08

我没有方便的数学知识,但是......

基本上,树深度的主要因素是树中每个节点的“扇出”。

通常,在简单的 B 树中,扇出为 2,树中每个节点有 2 个节点作为子节点。

但对于 B+Tree,它们的扇形通常要大得多。

起作用的因素之一是磁盘上节点的大小。

例如,如果您有 4K 页面大小,并且有 4000 字节的可用空间(不包括与该节点相关的任何其他指针或其他元数据),并且假设指向树中任何其他节点的指针是4 字节整数。如果您的 B+Tree 实际上存储 4 字节整数,则组合大小(4 字节指针信息 + 4 字节键信息)= 8 字节。 4000 个可用字节 / 8 个字节 == 500 个可能的子级。

对于这个人为的案例,你的粉丝满分是 500 分。

因此,使用一页索引(即根节点)或高度为 1 的树,可以引用 500 条记录。添加另一个级别,您的分辨率为 500*500,因此对于 501 个 4K 页面,您可以引用 250,000 行。

显然,密钥大小越大,或者节点的页面大小越小,树的扇出能力就越低。如果每个节点中允许可变长度密钥,那么扇出很容易发生变化。

但希望您能明白这一切是如何运作的要点。

I don't have the math handy, but...

Basically, the primary factor to tree depth is the "fan out" of each node in the tree.

Normally, in a simply B-Tree, the fan out is 2, 2 nodes as children for each node in the tree.

But with a B+Tree, typically they have a fan out much larger.

One factor that comes in to play is the size of the node on disk.

For example, if you have a 4K page size, and, say, 4000 byte of free space (not including any other pointers or other meta data related to the node), and lets say that a pointer to any other node in the tree is a 4 byte integer. If your B+Tree is in fact storing 4 byte integers, then the combined size (4 bytes of pointer information + 4 bytes of key information) = 8 bytes. 4000 free bytes / 8 bytes == 500 possible children.

That give you a fan out of 500 for this contrived case.

So, with one page of index, i.e. the root node, or a height of 1 for the tree, you can reference 500 records. Add another level, and you're at 500*500, so for 501 4K pages, you can reference 250,000 rows.

Obviously, the large the key size, or the smaller the page size of your node, the lower the fan out that the tree is capable of. If you allow variable length keys in each node, then the fan out can easily vary.

But hopefully you can see the gist of how this all works.

埖埖迣鎅 2024-12-13 15:38:08

这取决于树的数量。您必须定义该值。如果你说每个节点可以有 4 个子节点,并且有 1000 条记录,那么高度为

最佳情况 log_4(1000) = 5

最坏情况 log_{4/2}(1000) = 10

数量为 m,记录是n。

It depends on the arity of the tree. You have to define this value. If you say that each node can have 4 children then and you have 1000 records, then the height is

Best case log_4(1000) = 5

Worst case log_{4/2}(1000) = 10

The arity is m and the number of records is n.

梦魇绽荼蘼 2024-12-13 15:38:08

最好和最坏的情况取决于否。每个节点可以拥有的子节点数。对于最好的情况,我们考虑这样的情况:每个节点具有最大数量的子节点(即 m 表示 m 叉树),每个节点具有 m-1 个键。所以,

第一层(或根)有 m-1 个条目
第二层有 m*(m-1) 个条目(因为根有 m 个子项,每个子项有 m-1 个键)
第三级有 m^2*(m-1) 个条目
....
第 H 层有 m^(h-1)*(m-1)

因此,如果 H 是树的高度,则条目总数等于 n=m^H-1
相当于 H=log_m(n+1)

因此,在您的情况下,如果您有 n=1000 条记录,每个节点有 m 个子节点(m 应该是奇数),那么最佳情况高度将等于 log_m(1000 +1)

同样,对于最坏的情况:

1 级(根)至少有 1 个条目(且至少有 2 个子条目)
第二级至少有 2*(d-1) 个条目(其中 d=ceil(m/2) 是每个内部节点(根除外)可以拥有的最小子节点数)
第三级有 2d*(d-1) 个条目
...
第 H 层有 2*d^(h-2)*(d-1) 个条目

因此,如果 H 是树的高度,则条目总数等于 n=2*d^H-1 ,相当于H=log_d((n+1)/2+1)

因此,在您的情况下,如果您有 n=1000 条记录,每个节点有 m 个子节点(m 应该是奇数),那么最坏情况的高度将等于log_d((1000+1)/2+1)

The best and worst case depends on the no. of children each node can have. For the best case, we consider the case, when each node has the maximum number of children (i.e. m for an m-ary tree) with each node having m-1 keys. So,

1st level(or root) has m-1 entries
2nd level has m*(m-1) entries (since the root has m children with m-1 keys each)
3rd level has m^2*(m-1) entries
....
Hth level has m^(h-1)*(m-1)

Thus, if H is the height of the tree, the total number of entries is equal to n=m^H-1
which is equivalent to H=log_m(n+1)

Hence, in your case, if you have n=1000 records with each node having m children (m should be odd), then the best case height will be equal to log_m(1000+1)

Similarly, for the worst case scenario:

Level 1(root) has at least 1 entry (and minimum 2 children)
2nd level has as least 2*(d-1) entries (where d=ceil(m/2) is the minimum number of children each internal node (except root) can have)
3rd level has 2d*(d-1) entries
...
Hth level has 2*d^(h-2)*(d-1) entries

Thus, if H is the height of the tree, the total number of entries is equal to n=2*d^H-1 which is equivalent to H=log_d((n+1)/2+1)

Hence, in your case, if you have n=1000 records with each node having m children (m should be odd), then the worst case height will be equal to log_d((1000+1)/2+1)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文