数据库索引:示例

发布于 2024-10-21 07:44:01 字数 1227 浏览 2 评论 0原文

我很快就要考试了,我想知道如何解决这些关于索引的问题:

1.数据库由关系 R(A,B,C) 组成。 A 和 B 是整数 [0,10 000](各 4B),C 是 varchar(20) (20B)。关系 R 由 10^6 个元组组成。块大小为 2048 B。

A) 如果我们询问此查询,如果 B 上有 B+ 树索引,我们必须读取多少个块(最佳和最差):

SELECT C FROM R WHERE A=100 and B=10

B )索引 A 有意义吗?如果是,哪种类型的索引最好?

另一个类似的问题是:

2.数据库由关系R(A,B,C)组成。 A 和 B 是整数 [0,10 000],C 是 varchar(150)。关系 R 由 10^6 个元组组成。块大小是 2048 B,A、B 是键。

A) 如果我们询问查询“ SELECT C FROM R WHERE A=4711 并且:

  • 我们没有索引。
  • 我们在 A 和 B 上有一个 B+ 树索引, 那么在最好和最坏的情况下我们必须读取多少个块b )

分别为 B 和 A 建立索引而不是在 A 和 B 上建立一个索引是否有意义。

编辑

这是我所做的:

>问题1 A)

元组的大小 = 20+4+4=28 B

2048/28=73 个元组/块,向下舍入

10^6/73= 13 699 个块,用于整个关系,向上舍入

< em>索引读数: 4*n+4(n+1)≤2048 B == n=255

B+树第一层向下取整= 255<10^6 B+树没有

第二层= 255*256<10^6 B+树没有

第三层= 255*256*257>10^6 是的,10^6 元组可以放入高度为 3 的 B+ 树中。

数据读取: 如果我们假设 A=100 的概率为 1/10001,B=10 的概率相同,那么我们有:

1/10001*1/10001*10^6 四舍五入 = 1 个元组

在最坏和最好的情况下:1 个元组= 1 个块

那么我们有 3+1 个块读取,

是吗?

我不知道如何做B)..

而且我真的不知道如何回答问题2..请帮助我

I have an exam soon and I want to know how to solve these questions about indexing:

1.A database consist of a relation R(A,B,C). A and B are integers [0,10 000] (4B each) and C is a varchar(20) (20B). The relation R consist of 10^6 tuples. The blocksize is 2048 B.

A) How many blocks do we have to read (best and worst) if we ask this query if we have an B+-tree index on B:

SELECT C FROM R WHERE A=100 and B=10

B) Does it make sense to index A? If yes, what type of indeces are the best?

Another similar question is:

2.A database consist of a relation R(A,B,C). A and B are integers [0,10 000] and C is a varchar(150). The relation R consist of 10^6 tuples. The blocksize is 2048 B and A, B are the keys.

A) How many blocks do we have to read in best and worst case if we ask the query " SELECT C FROM R WHERE A=4711 and:

  • We don't have an index.
  • We have a B+-tree index on A and B.

b) Does it make sense to index B separetely and A separetely instead of having one index on A and B. What type of indeces are the best?

EDIT:

Here is what I have done:

Question 1
A)

A tuple is of size = 20+4+4=28 B

2048/28=73 tuples/block rounded down

10^6/73= 13 699 blocks for the whole relation, rounded up

Indexreadings:
4*n+4(n+1)<=2048 B => n=255 rounded down

first level of the B+tree= 255<10^6
No

second level of the B+tree= 255*256<10^6
No

third level of the B+tree= 255*256*257>10^6
Yes the 10^6 tuples can fit in a B+tree with height 3.

Datareadings:
If we assume that the A=100 has the probability 1/10001 and B=10 has the same probability then we have:

1/10001*1/10001*10^6 rounded up = 1 tuples

In worst and best case: 1 tuples= 1 block

Then we have 3+1 blockreadings

Is it right?

I don't know how to do the B)..

And I really don't know how to answer Question 2 .. Please help me

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

高速公鹿 2024-10-28 07:44:01

只是一些笔记。你的 1A 最佳情况似乎几乎是正确的(有点,但 se 添加:在底部),但我认为你的块大小计算是错误的。只有 B(的值)和所需磁盘块的指针/引用应存储在 b+ 索引树中。 A 和 C 都不应该在 b+ 树中。

而你最坏的情况确实是错误的。不要求B唯一,谈论最坏情况也不需要做概率。

想象一下如果所有元组中 B=10 会发生什么。然后您必须阅读所有这些内容才能找到 A=100 的值。 (请记住,您被要求提供最佳/最差情况,而不是平均情况)。

这个最坏情况的例子也是解决问题 1B 的提示。如果您有太多具有相同 B 值的值,以至于无法将它们定位在几个块中,则索引可能会很有用。 (你可以做精确的数学计算)。

2A好像没那么难。如果我们没有索引,那么在最好的情况下您需要读取 1 个块,在最坏的情况下需要读取所有块。 (这假设 A 是唯一的。这就是 A 作为键的含义,对吧?)

但我对最后一个问题有点困惑。如果 A 和 B 是 2 个不同的(外来???)键,那么对它们使用组合索引似乎是错误的。

ps:这是几年前我为大学做的,所以我可能是错的,但我希望我至少给了你一些思考:}

补充: -------------- ---

我关于计算单个节点中可以有多少条目的注释似乎确实不对。试试这个:(仍然是凭记忆,对于任何错误,我们深表歉意)。

b+ 树中的一个条目包含 2 个 B 值(最小值/最大值)以及包含最小值和最大值之间的值的块的磁盘块引用。因此每个条目为 12 个字节(假设 4 字节块引用),因此您可以在每个块中存储 2048/12=170 个条目。

Just some notes. Your 1A best case seems almost right(Kinda, but se added: at the bottom), but I think your block size calculation is wrong. Only (The value of) B and a pointer/reference to the needed disk block should be stored in the b+ index tree. Neither A nor C should be in the b+ tree at all.

And your worst case is really wrong. There is no requirement that B is unique, and there is no need to do probability when talking about worst case.

So imagine what happens if B=10 in all your tuples. Then you will have to read them all to find the values where A=100. (Remember you are asked for best/worst case not average).

This worst case example is also your hint to solve question 1B. An index might be useful if you have so many values with equal B values that they can't be located in a few blocks. (You can do the exact math).

2A don't seem so difficult. If we don't have an index then you need to read 1 block in the best case, and all the blocks in the worst case. (This assume that A is unique. This is what A being a key means right?)

But I am a bit confused about this last question. If A and B are 2 different (foreign???) keys, then having a combined index on them both seems wrong.

ps: It is a few years ago I did this for university, so I may be wrong, but I hope I at least gave you something to think about :}

Added: -----------------

My note about your calculations of how many entries you can have in a single node seems really off. Try this instead: (Still from memory, so sorry for any errors).

An entry in your b+ tree, contains 2 values of B (min/max) and a disk block reference for the block containing the values between min and max. So each entry is 12 bytes (Assuming 4 byte block references) and you can thus store 2048/12=170 entries in each block.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文