给定 R 的直方图,关系 R 上的自连接操作的估计大小
查询优化器通常使用数据分布的摘要来估计查询处理期间生成的中间表的大小。一种流行的此类汇总方案是直方图,其中输入范围被划分为多个桶,并且维护落在每个桶中的元组数量的累积计数。为了估计的目的,假设桶内的分布是均匀的。
下面显示了具有域 [1..10]
的离散属性 a 上的关系 R
的这样一个直方图:
Bucket 1: range = [1..2] Cumulative tuple count = 6
Bucket 2: range = [3..8] Cumulative tuple count = 30
Bucket 3: range = [9..10] Cumulative tuple count = 10
自连接操作 R 的估计大小是多少x R
A) 46
B) 218
C) 248
D) 1,036
E) 5,672
解中给出的答案:B
答案如何计算?
Query optimizers typically use summaries of data distributions to estimate the sizes of the intermediate tables generated during query processing. One popular such summarization scheme is a histogram, whereby the input range is partitioned into buckets and a cumulative count is maintained of the number of tuples falling in each bucket. The distribution within a bucket is assumed to be uniform for the purposes of estimation.
The following shows one such histogram for a relation R
on a discrete attribute a with domain [1..10]
:
Bucket 1: range = [1..2] Cumulative tuple count = 6
Bucket 2: range = [3..8] Cumulative tuple count = 30
Bucket 3: range = [9..10] Cumulative tuple count = 10
What is the estimated size of the self-join operation R x R
A) 46
B) 218
C) 248
D) 1,036
E) 5,672
Answer given in solutions : B
How is the answer to be calculated?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
属性
R
上的自连接大小等于属性R
的每个值的频率之和。这里频率是在桶中给出的,例如第一个桶有2个值r,频率= 6,所以我们可以假设桶一中每个值的频率为频率= 3,类似地对于桶二每个值的频率= 30/6 = 5,对于桶三个每个值的频率= 10/2 = 5。
因此,大小为
The size of a self-join on attribute
R
is equal to the summation of the frequency of each value of attributeR
.Here the frequency is given in buckets, e.g. the first bucket has 2 values r with frequency = 6, so we can assume the frequency of each value in bucket one is frequency = 3, similarly for bucket two frequency of each = 30/6 = 5, and for bucket three frequency of each value = 10/2 = 5.
Therefore, the size is
我一直在尝试自己解决这个问题(来自 GRE 计算机科学科目考试准备考试)。
到目前为止我还没有找到为什么答案是218的答案,但我发现了给出的数字和正确答案之间的联系。
事实证明,累积元组计数的平方和除以每个桶中离散值的数量,得到 218。简单地说:
6²/2 + 30²/5 + 10²/2 = 218.
这不是答案,但至少有联系=)
I've been trying to figure this one out myself (it's from the GRE Computer Science subject test preparation exam).
So far I haven't found an answer as to why the answer is 218, but I have found a connection between the numbers given and the correct answer.
It turns out that that sum of the square of the cumulative tuple counts divided by the number of discrete values in each bucket, you get 218. Less abstractly:
6²/2 + 30²/5 + 10²/2 = 218
.It's not an answer, but at least there's a connection =)