数据库索引:示例
我很快就要考试了,我想知道如何解决这些关于索引的问题:
1.数据库由关系 R(A,B,C) 组成。 A 和 B 是整数 [0,10 000](各 4B),C 是 varchar(20) (20B)。关系 R 由 10^6 个元组组成。块大小为 2048 B。
A) 如果我们询问此查询,如果 B 上有 B+ 树索引,我们必须读取多少个块(最佳和最差):
SELECT C FROM R WHERE A=100 and B=10
B )索引 A 有意义吗?如果是,哪种类型的索引最好?
另一个类似的问题是:
2.数据库由关系R(A,B,C)组成。 A 和 B 是整数 [0,10 000],C 是 varchar(150)。关系 R 由 10^6 个元组组成。块大小是 2048 B,A、B 是键。
A) 如果我们询问查询“ SELECT C FROM R WHERE A=4711 并且:
- 我们没有索引。
- 我们在 A 和 B 上有一个 B+ 树索引, 那么在最好和最坏的情况下我们必须读取多少个块b )
分别为 B 和 A 建立索引而不是在 A 和 B 上建立一个索引是否有意义。
编辑:
这是我所做的:
>问题1 A)
元组的大小 = 20+4+4=28 B
2048/28=73 个元组/块,向下舍入
10^6/73= 13 699 个块,用于整个关系,向上舍入
< em>索引读数: 4*n+4(n+1)≤2048 B == n=255
B+树第一层向下取整= 255<10^6 B+树没有
第二层= 255*256<10^6 B+树没有
第三层= 255*256*257>10^6 是的,10^6 元组可以放入高度为 3 的 B+ 树中。
数据读取: 如果我们假设 A=100 的概率为 1/10001,B=10 的概率相同,那么我们有:
1/10001*1/10001*10^6 四舍五入 = 1 个元组
在最坏和最好的情况下:1 个元组= 1 个块
那么我们有 3+1 个块读取,
是吗?
我不知道如何做B)..
而且我真的不知道如何回答问题2..请帮助我
I have an exam soon and I want to know how to solve these questions about indexing:
1.A database consist of a relation R(A,B,C). A and B are integers [0,10 000] (4B each) and C is a varchar(20) (20B). The relation R consist of 10^6 tuples. The blocksize is 2048 B.
A) How many blocks do we have to read (best and worst) if we ask this query if we have an B+-tree index on B:
SELECT C FROM R WHERE A=100 and B=10
B) Does it make sense to index A? If yes, what type of indeces are the best?
Another similar question is:
2.A database consist of a relation R(A,B,C). A and B are integers [0,10 000] and C is a varchar(150). The relation R consist of 10^6 tuples. The blocksize is 2048 B and A, B are the keys.
A) How many blocks do we have to read in best and worst case if we ask the query " SELECT C FROM R WHERE A=4711 and:
- We don't have an index.
- We have a B+-tree index on A and B.
b) Does it make sense to index B separetely and A separetely instead of having one index on A and B. What type of indeces are the best?
EDIT:
Here is what I have done:
Question 1
A)
A tuple is of size = 20+4+4=28 B
2048/28=73 tuples/block rounded down
10^6/73= 13 699 blocks for the whole relation, rounded up
Indexreadings:
4*n+4(n+1)<=2048 B => n=255 rounded down
first level of the B+tree= 255<10^6
No
second level of the B+tree= 255*256<10^6
No
third level of the B+tree= 255*256*257>10^6
Yes the 10^6 tuples can fit in a B+tree with height 3.
Datareadings:
If we assume that the A=100 has the probability 1/10001 and B=10 has the same probability then we have:
1/10001*1/10001*10^6 rounded up = 1 tuples
In worst and best case: 1 tuples= 1 block
Then we have 3+1 blockreadings
Is it right?
I don't know how to do the B)..
And I really don't know how to answer Question 2 .. Please help me
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
只是一些笔记。你的 1A 最佳情况似乎几乎是正确的(有点,但 se 添加:在底部),但我认为你的块大小计算是错误的。只有 B(的值)和所需磁盘块的指针/引用应存储在 b+ 索引树中。 A 和 C 都不应该在 b+ 树中。
而你最坏的情况确实是错误的。不要求B唯一,谈论最坏情况也不需要做概率。
想象一下如果所有元组中 B=10 会发生什么。然后您必须阅读所有这些内容才能找到 A=100 的值。 (请记住,您被要求提供最佳/最差情况,而不是平均情况)。
这个最坏情况的例子也是解决问题 1B 的提示。如果您有太多具有相同 B 值的值,以至于无法将它们定位在几个块中,则索引可能会很有用。 (你可以做精确的数学计算)。
2A好像没那么难。如果我们没有索引,那么在最好的情况下您需要读取 1 个块,在最坏的情况下需要读取所有块。 (这假设 A 是唯一的。这就是 A 作为键的含义,对吧?)
但我对最后一个问题有点困惑。如果 A 和 B 是 2 个不同的(外来???)键,那么对它们使用组合索引似乎是错误的。
ps:这是几年前我为大学做的,所以我可能是错的,但我希望我至少给了你一些思考:}
补充: -------------- ---
我关于计算单个节点中可以有多少条目的注释似乎确实不对。试试这个:(仍然是凭记忆,对于任何错误,我们深表歉意)。
b+ 树中的一个条目包含 2 个 B 值(最小值/最大值)以及包含最小值和最大值之间的值的块的磁盘块引用。因此每个条目为 12 个字节(假设 4 字节块引用),因此您可以在每个块中存储 2048/12=170 个条目。
Just some notes. Your 1A best case seems almost right(Kinda, but se added: at the bottom), but I think your block size calculation is wrong. Only (The value of) B and a pointer/reference to the needed disk block should be stored in the b+ index tree. Neither A nor C should be in the b+ tree at all.
And your worst case is really wrong. There is no requirement that B is unique, and there is no need to do probability when talking about worst case.
So imagine what happens if B=10 in all your tuples. Then you will have to read them all to find the values where A=100. (Remember you are asked for best/worst case not average).
This worst case example is also your hint to solve question 1B. An index might be useful if you have so many values with equal B values that they can't be located in a few blocks. (You can do the exact math).
2A don't seem so difficult. If we don't have an index then you need to read 1 block in the best case, and all the blocks in the worst case. (This assume that A is unique. This is what A being a key means right?)
But I am a bit confused about this last question. If A and B are 2 different (foreign???) keys, then having a combined index on them both seems wrong.
ps: It is a few years ago I did this for university, so I may be wrong, but I hope I at least gave you something to think about :}
Added: -----------------
My note about your calculations of how many entries you can have in a single node seems really off. Try this instead: (Still from memory, so sorry for any errors).
An entry in your b+ tree, contains 2 values of B (min/max) and a disk block reference for the block containing the values between min and max. So each entry is 12 bytes (Assuming 4 byte block references) and you can thus store 2048/12=170 entries in each block.