关系代数连接优化

发布于 2024-10-16 14:49:35 字数 78 浏览 9 评论 0原文

我在学校教科书中看到,许多联接操作似乎从未优化联接右侧的表,而只优化左侧的表。
例如,要查找管理数据库部门的员工姓名,您可以执行以下操作:

I've seen in my textbook for school that many of the join operations never seemed to optimize the table on the right of the join, and only on the left.
For example, to find the name of the Employee managing the Database Department, you would do this:
????name( ????Mgr_ssn ( ????Dname = 'Database' ( Department ) ) ⨝Mgr_ssn = ssn Employee )

So I'm wondering if it would be equally correct to do something like:
????name( ????Mgr_ssn ( ????Dname = 'Database' ( Department ) ) ⨝Mgr_ssn = ssn ( ????ssn, name Employee ) )

This is assuming of course that Employee has many other attributes. In doing so I would think that the system would be saving time by not having to worry about joining all the other attributes of Employee when in the end they will be projected out anyway. I never have seen projection like this on the right side of a join before and am wondering if it is acceptable and/or unnecessary.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

绝不放开 2024-10-23 14:49:35

大多数优化器使用系统 R 优化器,它只考虑左深度连接。这就是为什么你永远不会在右侧看到连接。

所有选项的搜索空间都是指数级的,因此优化器希望快速找到合理可接受的解决方案(优化器不会找到最佳解决方案,而是试图避免最差的解决方案)。

PS 使用左深度连接的原因是它允许对结果进行流水线处理,而不需要将它们写入磁盘,从而节省 I/O。

Most optimizers use system R optimizer which considers only left deep joins. That is why you never see joins on the right.

The search space of all the options is exponential so optimizers want to find reasonably acceptable solutions quickly (optimizers don't find the best solution, they try to avoid the worst ones).

P.S. The reason for using left deep joins is that it allows pipelining the results without the need to write them to disk, which saves I/Os.

|煩躁 2024-10-23 14:49:35

任何像样的查询优化器都会降低适当的限制,有时还会进行预测,以最大限度地减少要处理的数据。而且由于优化器会自动执行此操作,并且结果是相同的,因此不需要特别优化关系代数中的表达式。

在像这样的两个表连接序列中,不清楚在连接部门之前形成投影是否有好处;可能的处理顺序是找到 Dname = 'Database' 的(可能是单个)部门,然后找到 Employee 中 E.SSN = D.Mgr_SSN 的单行。但是,如果多次使用子表达式,则可能值得这样做。

我还注意到这个设计很糟糕 - 你永远不应该使用像 SSN 这样敏感的东西作为数据库设计中的连接字段。 PCI 团队会发脾气!但也许这些名字是很久以前的温和时代的遗留物,但内容是生成的代理,真正的 SSN 存储在 Employee.RealSSN 中(甚至可能被加密以确保未经授权的人看不到它 - 尽管设置列上正确的权限以便只有授权的人才能选择它也是有效的)。

Any decent query optimizer will push down appropriate restrictions, and sometimes projections too, to minimize the data to be processed. And because the optimizer would do it automatically, and the result is identical, there's no particular need to optimize the expression in the relational algebra.

In a two-table join sequence like this, it is not clear that there would be a benefit to forming the projection before joining with department; the likely processing sequence is find the (probably single) department with Dname = 'Database' and then find the single row in Employee with the E.SSN = D.Mgr_SSN. However, if the sub-expression were used multiple times, it might well be worth doing.

I also note that the design is awful - you should never use anything as sensitive as an SSN as the joining field in a database design. The PCI team would have fits! But maybe the names are a hangover from gentler times now long past, but the content is a generated surrogate and the real SSN is stored in Employee.RealSSN (which might even be encrypted to ensure that the unauthorized do not see it - though setting the permissions correctly on the column so that only the authorized can select it is also effective).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文