Procedurally transform subquery into JOIN

Posted on 2024-08-12 03:59:19

Is there a generalized procedure or algorithm for transforming a SQL subquery into a join, or vice versa? That is, is there a set of typographic operations that can be applied to a syntactically correct SQL query statement containing a subquery that results in a functionally equivalent statement without a subquery? If so, what are they (i.e., what's the algorithm), and in what cases do they not apply?

Comments (7)

﹏雨一样淡蓝的深情 2024-08-19 03:59:19

Converting a subquery into a JOIN can be pretty straightforward:

IN clause

 FROM TABLE_X x
WHERE x.col IN (SELECT y.col FROM TABLE_Y y)

...can be converted to:

FROM TABLE_X x
JOIN TABLE_Y y ON y.col = x.col

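Your JOIN criteria are where you have the direct comparison. Note that the two forms only return the same rows when y.col is unique in TABLE_Y; otherwise the JOIN repeats a TABLE_X row once per match.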

EXISTS clause

But there are complications when you look at the EXISTS clause. EXISTS subqueries are typically correlated, meaning the subquery is filtered by criteria from the table(s) outside the subquery. But EXISTS only returns a boolean based on the criteria.

 FROM TABLE_X x
WHERE EXISTS (SELECT NULL
                FROM TABLE_Y y
               WHERE y.col = x.col)

...converted:

FROM TABLE_X x
JOIN TABLE_Y y ON y.col = x.col

Because EXISTS only returns a boolean while the JOIN returns one row per matching TABLE_Y row, there's a risk of more rows turning up in the result set.
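To make the extra-row risk concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and the `SELECT DISTINCT` derived table is one possible way to keep the JOIN equivalent; it is not something the answer above prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_x (col INTEGER);
    CREATE TABLE table_y (col INTEGER);
    INSERT INTO table_x VALUES (1), (2), (3);
    -- value 1 appears twice in table_y, value 3 never (illustrative data)
    INSERT INTO table_y VALUES (1), (1), (2);
""")

def rows(sql):
    """Run a query and return the result as a sorted list of tuples."""
    return sorted(conn.execute(sql).fetchall())

exists_rows = rows("""
    SELECT x.col FROM table_x x
     WHERE EXISTS (SELECT NULL FROM table_y y WHERE y.col = x.col)
""")
in_rows = rows("""
    SELECT x.col FROM table_x x
     WHERE x.col IN (SELECT y.col FROM table_y y)
""")
# Naive conversion: one output row per matching table_y row.
naive_join_rows = rows("""
    SELECT x.col FROM table_x x
      JOIN table_y y ON y.col = x.col
""")
# De-duplicating the join side first restores the semi-join behaviour.
dedup_join_rows = rows("""
    SELECT x.col FROM table_x x
      JOIN (SELECT DISTINCT col FROM table_y) y ON y.col = x.col
""")

print(exists_rows)      # [(1,), (2,)]
print(naive_join_rows)  # [(1,), (1,), (2,)]
print(dedup_join_rows)  # [(1,), (2,)]
```

If y.col is known to be unique (for example, a primary key), the naive JOIN is already safe and the DISTINCT step is unnecessary.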

SELECTs in the SELECT clause

These should always be changed, with prejudice:

SELECT x.*,
       (SELECT MAX(y.example_col)
          FROM TABLE_Y y
         WHERE y.col = x.col)
  FROM TABLE_X x

You're probably noticing a pattern now, but I made this one a little different to show an inline view:

SELECT x.*,
       z.mc
  FROM TABLE_X x
  JOIN (SELECT y.col, -- inline view within the brackets
               MAX(y.example_col) AS mc
          FROM TABLE_Y y
      GROUP BY y.col) z ON z.col = x.col

The key is making sure the inline view result set includes the column(s) needed for the join, along with the columns you want to return.

LEFT JOINs

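You might've noticed I didn't have any LEFT JOIN examples. I only reach for one when columns from the subquery are NULL-tested (COALESCE on almost any database these days, Oracle's NVL or NVL2, MySQL's IFNULL, SQL Server's ISNULL, etc.). Keep in mind that the LEFT JOIN is also what preserves TABLE_X rows with no match in TABLE_Y; the inner JOIN above drops them, whereas the scalar subquery returns them with a NULL: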

SELECT x.*,
       COALESCE((SELECT MAX(y.example_col)
          FROM TABLE_Y y
         WHERE y.col = x.col), 0)
  FROM TABLE_X x

Converted:

   SELECT x.*,
          COALESCE(z.mc, 0)
     FROM TABLE_X x
LEFT JOIN (SELECT y.col,
                  MAX(y.example_col) AS mc
             FROM TABLE_Y y
         GROUP BY y.col) z ON z.col = x.col
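The difference between the inner and the LEFT JOIN conversion is easy to check on a small data set. The following sketch uses Python's sqlite3 module with invented sample rows; the `inline_view` string is only a convenience for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_x (col INTEGER);
    CREATE TABLE table_y (col INTEGER, example_col INTEGER);
    INSERT INTO table_x VALUES (1), (2), (3);
    -- col 3 has no rows in table_y (illustrative data)
    INSERT INTO table_y VALUES (1, 10), (1, 15), (2, 7);
""")

# Original form: correlated scalar subquery in the SELECT list.
subquery_rows = sorted(conn.execute("""
    SELECT x.col,
           COALESCE((SELECT MAX(y.example_col)
                       FROM table_y y
                      WHERE y.col = x.col), 0)
      FROM table_x x
""").fetchall())

# The inline view shared by both join variants below.
inline_view = """
    (SELECT y.col, MAX(y.example_col) AS mc
       FROM table_y y
      GROUP BY y.col)
"""
inner_rows = sorted(conn.execute(f"""
    SELECT x.col, COALESCE(z.mc, 0)
      FROM table_x x
      JOIN {inline_view} z ON z.col = x.col
""").fetchall())
left_rows = sorted(conn.execute(f"""
    SELECT x.col, COALESCE(z.mc, 0)
      FROM table_x x
      LEFT JOIN {inline_view} z ON z.col = x.col
""").fetchall())

print(subquery_rows)  # [(1, 15), (2, 7), (3, 0)]
print(inner_rows)     # [(1, 15), (2, 7)]  <- row 3 is gone
print(left_rows)      # [(1, 15), (2, 7), (3, 0)]
```

Only the LEFT JOIN version returns the same rows as the scalar subquery here, because table_x row 3 has nothing to join to.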

Conclusion

I'm not sure if that will satisfy your typographic needs, but I hope I've demonstrated that the key is determining what the JOIN criteria are. Once you know the column(s) involved, you know the table(s) involved.

灵芸 2024-08-19 03:59:19

This question relies on a basic knowledge of Relational Algebra. You need to ask yourself what kind of join is being performed. For example, a LEFT ANTI SEMI JOIN is like a WHERE NOT EXISTS clause.

Some joins do not allow duplicating of data, some do not allow eliminating data. Others allow extra fields to be available. I discuss this in my blog at http://msmvps.com/blogs/robfarley/archive/2008/11/09/join-simplification-in-sql-server.aspx

Also, please don't feel you need to do everything in JOINs. The Query Optimizer should take care of all of this for you, and you can often make your queries much harder to maintain this way. You may find yourself using an extensive GROUP BY clause, and having interesting WHERE .. IS NULL filters, which will only serve to disconnect the business logic from the query design.

A subquery in the SELECT clause (essentially a lookup) only provides an extra field, not duplication or elimination. Therefore, you would need to make sure that you enforce GROUP BY or DISTINCT values in your JOIN, and use an OUTER JOIN to guarantee that behaviour is the same.

A subquery in the WHERE clause can never duplicate data, or provide extra columns to the SELECT clause, so if you rewrite it as a JOIN you should use GROUP BY / DISTINCT to preserve this. WHERE EXISTS is similar. (This is the LEFT SEMI JOIN.)

WHERE NOT EXISTS (LEFT ANTI SEMI JOIN) doesn't provide data, and doesn't duplicate rows, but can eliminate... for this you need to do LEFT JOINs and look for NULLs.
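As an illustration of the anti-semi-join rewrite, the sketch below compares WHERE NOT EXISTS with a LEFT JOIN filtered on NULL, using Python's sqlite3 module. The sample data is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_x (col INTEGER);
    CREATE TABLE table_y (col INTEGER);
    INSERT INTO table_x VALUES (1), (2), (3), (4);
    -- duplicates in table_y must not affect the result (illustrative data)
    INSERT INTO table_y VALUES (1), (1), (2);
""")

not_exists_rows = sorted(conn.execute("""
    SELECT x.col
      FROM table_x x
     WHERE NOT EXISTS (SELECT NULL
                         FROM table_y y
                        WHERE y.col = x.col)
""").fetchall())

# LEFT JOIN keeps every table_x row; unmatched rows carry NULL in y.col,
# so filtering on IS NULL keeps exactly the rows with no partner.
left_join_rows = sorted(conn.execute("""
    SELECT x.col
      FROM table_x x
      LEFT JOIN table_y y ON y.col = x.col
     WHERE y.col IS NULL
""").fetchall())

print(not_exists_rows)  # [(3,), (4,)]
print(left_join_rows)   # [(3,), (4,)]
```

Testing the join column itself for NULL is what makes this safe: a row only survives the filter when no table_y row satisfied the ON condition, so duplicates on the table_y side cannot multiply the output.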

But the Query Optimizer should handle all this for you. I actually like having occasional subqueries in the SELECT clause, because it makes it very clear that I am not duplicating or eliminating rows. The QO can tidy it for me, but if I'm using a view or inline table-valued function, I want to make it clear to those who come after me that the QO can simplify it down a lot. Have a look at the Execution Plans of your original query, and you'll see that the system is providing the INNER/OUTER/SEMI joins for you.

The things you really need to avoid (at least in SQL Server) are functions that use BEGIN and END (such as scalar functions). They may feel like they simplify your code, but they will actually be executed in a separate context, as the system sees them as procedural (not simplifiable).

I did a session on this kind of thing at the recent SQLBits V conference. It was recorded, so you should be able to watch it at some point (if you can put up with my jokes!)

薄凉少年不暖心 2024-08-19 03:59:19

It often is possible, and what's good is that the query optimizer can do it automatically, so you don't have to care about it.

渡你暖光 2024-08-19 03:59:19

At a really high level, to transform a sub-query into a JOIN:

  1. FROM: table names go into FROM
  2. JOIN: the parts of the WHERE clause with table names on both sides determine (a) the type of JOIN and (b) the join condition
  3. WHERE: the parts of the WHERE clause without table names on both sides go into the WHERE clause
  4. SELECT: column names from the sub-query go into the SELECT

Transforming a JOIN to Sub-Query entails the reverse of the above logic

梦旅人picnic 2024-08-19 03:59:19

In SQL Server, at least, the optimizer can do this at will, but I'm sure that there are constraints on when it does it. I'm sure that it was probably someone's PhD thesis to be able to do it in the computer.

When I do it the old fashioned human way, it's fairly straightforward - particularly if the subquery is already aliased - it can be pulled into a Common Table Expression first.
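A rough sketch of that two-step approach, run through Python's sqlite3 module: the aggregate subquery is first lifted into a Common Table Expression, which is then joined like any other table. The table names, data, and the choice of LEFT JOIN (to keep unmatched rows, as the scalar subquery does) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_x (col INTEGER);
    CREATE TABLE table_y (col INTEGER, example_col INTEGER);
    INSERT INTO table_x VALUES (1), (2), (3);
    INSERT INTO table_y VALUES (1, 10), (1, 15), (2, 7);
""")

# Starting point: correlated subquery in the SELECT list.
subquery_rows = conn.execute("""
    SELECT x.col,
           (SELECT MAX(y.example_col)
              FROM table_y y
             WHERE y.col = x.col)
      FROM table_x x
     ORDER BY x.col
""").fetchall()

# Step 1: pull the subquery into a CTE, grouping by the correlation column.
# Step 2: join to the CTE on that column.
cte_rows = conn.execute("""
    WITH z AS (
        SELECT y.col, MAX(y.example_col) AS mc
          FROM table_y y
         GROUP BY y.col
    )
    SELECT x.col, z.mc
      FROM table_x x
      LEFT JOIN z ON z.col = x.col
     ORDER BY x.col
""").fetchall()

print(cte_rows)  # [(1, 15), (2, 7), (3, None)]
```

Naming the intermediate result makes the join column explicit before the main query is touched, which is most of the work in the manual conversion.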

谜泪 2024-08-19 03:59:19

This rates a strong "it depends".

At one level, if you're talking about queries compatible with ANSI SQL 89 or 92*, then I would guess it's a definite maybe. If you have simple (or even not so simple) queries consisting of "basic" select, from, and where clauses, then yes, I would like to think that it is mathematically possible to define processes and procedures to create and "uncreate" subqueries (though how you might determine when to algorithmically form a subquery is beyond me). I think this "rationale" could be applied to outer joins and correlated subqueries.

At another level, I'd say "no way". Most of the time I write a subquery, it's because I can't think of a way to wedge it into the "main" query. Very rarely this involves correlated subqueries, but more often than not it involves what are, I'm pretty darn sure, proprietary extensions to the standards. How could you account for pivots, unpivots, ranking functions, TOP N clauses (which may well be ANSI standards, I'll admit to never having read them cover to cover), CROSS or OUTER APPLY, and the like? And that's just parts of SQL Server; I'm sure Oracle, DB2, MySQL, and most every other player has their own extensions that break the "purist" relational model.

Of course, they say it is impossible to prove a negative. I'd summarize with "can't be done until proven otherwise", leave the proof to the academics and theoreticians, and point out that even then, whatever system you purchase won't support it unless it makes financial sense for the manufacturer to work it in. (Does any system support OUTER UNION yet?)

* A bit of googling failed to produce any references to a third ANSI SQL standard. I know I heard talk about it years ago; did it ever happen?

花想c 2024-08-19 03:59:19

A fully automatic system for transforming queries from sub-queries into joins would be relatively difficult to build. You would need to take an input query, parse it into a parse tree, and then perform some fairly complex pattern matches on the parse tree, replacing sections of the tree with new sections. At the end you do a traversal of the tree to output the new query.

There can be some amazingly good or bad performance repercussions. Sometimes a sub-query is much faster than a join. Sometimes it is the inverse.
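The match-replace-traverse idea can be sketched on a toy tree. Everything below is hypothetical: the `Select` and `InSubquery` node classes, the single rewrite rule, and the `j` alias are invented for illustration, and the parsing step is skipped by building the tree by hand. A real system would need a full SQL parser and many more patterns.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InSubquery:
    """Predicate node: <outer_col> IN (SELECT <inner_col> FROM <inner_table>)."""
    outer_col: str
    inner_table: str
    inner_col: str

@dataclass
class Select:
    """Query node: SELECT <columns> FROM <table> [JOIN ...] [WHERE ...]."""
    columns: list
    table: str
    where: Optional[InSubquery] = None
    join: Optional[str] = None  # already-rendered JOIN clause, if any

def rewrite(node: Select) -> Select:
    """Pattern match: replace an IN-subquery predicate with a join
    against a DISTINCT derived table (so no rows are duplicated)."""
    if isinstance(node.where, InSubquery):
        p = node.where
        join = (f"JOIN (SELECT DISTINCT {p.inner_col} FROM {p.inner_table}) j "
                f"ON j.{p.inner_col} = {node.table}.{p.outer_col}")
        return Select(node.columns, node.table, where=None, join=join)
    return node  # pattern did not match: leave the tree unchanged

def to_sql(node: Select) -> str:
    """Traverse the tree and emit SQL text."""
    sql = f"SELECT {', '.join(node.columns)} FROM {node.table}"
    if node.join:
        sql += f" {node.join}"
    if isinstance(node.where, InSubquery):
        p = node.where
        sql += (f" WHERE {node.table}.{p.outer_col} IN "
                f"(SELECT {p.inner_col} FROM {p.inner_table})")
    return sql

tree = Select(columns=["table_x.col"], table="table_x",
              where=InSubquery("col", "table_y", "col"))
before = to_sql(tree)
after = to_sql(rewrite(tree))
print(before)
print(after)
```

Even this one rule has to decide how to protect against duplicate rows, which hints at why a general tool needs schema knowledge (keys, nullability) in addition to the parse tree.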
