连接与子查询

发布于 2024-08-27 20:55:43 字数 135 浏览 8 评论 0原文

我是一个老派的 MySQL 用户,并且总是更喜欢 JOIN 而不是子查询。但现在每个人都使用子查询,我讨厌它;我不知道为什么。

我缺乏理论知识来自行判断是否有任何差异。子查询是否与 JOIN 一样好,因此无需担心?

I am an old-school MySQL user and have always preferred JOIN over sub-query. But nowadays everyone uses sub-query, and I hate it; I don't know why.

I lack the theoretical knowledge to judge for myself if there is any difference. Is a sub-query as good as a JOIN and therefore is there nothing to worry about?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(20

像你 2024-09-03 20:55:44

仅当第二个连接表的数据明显多于主表时,才会看到差异。我有过如下的经历...

我们有一个包含 10 万个条目的用户表,以及大约 30 万个条目的会员数据(友谊)。这是一个 join 语句,目的是获取朋友及其数据,但延迟很大。但当成员资格表中只有少量数据时,它工作得很好。一旦我们将其更改为使用子查询,它就可以正常工作。

但与此同时,连接查询正在处理条目数少于主表的其他表。

所以我认为连接和子查询语句工作正常,这取决于数据和情况。

The difference is only seen when the second joining table has significantly more data than the primary table. I had an experience like below...

We had a users table of one hundred thousand entries and their membership data (friendship) about 3 hundred thousand entries. It was a join statement in order to take friends and their data, but with a great delay. But it was working fine where there was only a small amount of data in the membership table. Once we changed it to use a sub-query it worked fine.

But in the mean time the join queries are working with other tables that have fewer entries than the primary table.

So I think the join and sub query statements are working fine and it depends on the data and the situation.

浅唱々樱花落 2024-09-03 20:55:44

如今,许多数据库都可以优化子查询和连接。因此,您只需使用解释检查您的查询,看看哪个更快。如果性能没有太大差异,我更喜欢使用子查询,因为它们简单且易于理解。

These days, many dbs can optimize subqueries and joins. Thus, you just gotto examine your query using explain and see which one is faster. If there is not much difference in performance, I prefer to use subquery as they are simple and easier to understand.

悲念泪 2024-09-03 20:55:44

我不是关系数据库专家,所以对此持保留态度。

关于子查询与连接的总体思路是评估较大查询所采用的路径。

为了执行较大的查询,必须首先执行每个单独的子查询,然后将结果集存储为较大查询与之交互的临时表。

该临时表没有索引,因此任何比较都需要扫描整个结果集。

相反,当您使用联接时,所有索引都在使用中,因此比较需要遍历索引树(或哈希表),这在速度方面要便宜得多。

现在,我不知道最流行的关系引擎的较新版本是否执行反向求值,并仅在临时表中加载必要的元素,作为一种优化方法。

I am not a relational database expert, so take this with a grain of salt.

The general idea about sub queries vs joins is the path the evaluation of the larger query takes.

In order to perform the larger query, every individual subquery has to be executed first, and then the resultset is stored as a temporary table that the larger query interacts with.

This temporary table is unindexed, so, any comparison requires scanning the whole resultset.

In contrast, when you use a join, all indexes are in use and so, comparison require traversing index trees (or hash tables), which is way less expensive in terms of speed.

Now, what I don't know if newer versions of the most popular relational engines execute the evaluation on reverse, and just load the necessary elements in the temporary table, as an optimization method.

述情 2024-09-03 20:55:44

我只是在考虑同样的问题,但我在 FROM 部分使用子查询。
我需要从大表进行连接和查询,“从”表有2800万条记录,但结果只有128,这么小的结果大数据!我正在使用 MAX() 函数。

首先我使用 LEFT JOIN 因为我认为这是正确的方法,mysql 可以优化等。
第二次只是为了测试,我重写为针对 JOIN 进行子选择。

LEFT JOIN 运行时间:1.12s
SUB-SELECT 运行时间:0.06s

子选择比连接快 18 倍!就在 chokito adv 中。子选择看起来很糟糕,但结果......

I just thinking about the same problem, but I am using subquery in the FROM part.
I need connect and query from large tables, the "slave" table have 28 million record but the result is only 128 so small result big data! I am using MAX() function on it.

First I am using LEFT JOIN because I think that is the correct way, the mysql can optimalize etc.
Second time just for testing, I rewrite to sub-select against the JOIN.

LEFT JOIN runtime: 1.12s
SUB-SELECT runtime: 0.06s

18 times faster the subselect than the join! Just in the chokito adv. The subselect looks terrible but the result ...

暖阳 2024-09-03 20:55:44

这取决于几个因素,包括您正在运行的特定查询、数据库中的数据量。子查询首先运行内部查询,然后再次从结果集中过滤出实际结果。而在 join 中则一次性运行并产生结果。

最好的策略是您应该测试连接解决方​​案和子查询解决方案以获得优化的解决方案。

It depends on several factors, including the specific query you're running, the amount of data in your database. Subquery runs the internal queries first and then from the result set again filter out the actual results. Whereas in join runs the and produces the result in one go.

The best strategy is that you should test both the join solution and the subquery solution to get the optimized solution.

一花一树开 2024-09-03 20:55:44

如果你想使用 join 加速查询:

对于“inner join/join”,
不要使用 where 条件,而是在“ON”条件下使用它。
例如:

     select id,name from table1 a  
   join table2 b on a.name=b.name
   where id='123'

 Try,

    select id,name from table1 a  
   join table2 b on a.name=b.name and a.id='123'

对于“左/右连接”,
不要在“ON”条件下使用,因为如果您使用左/右连接,它将获取任何一个表的所有行。因此,在“On”条件下使用它是没有用的。所以,尝试使用“Where”条件

If you want to speed up your query using join:

For "inner join/join",
Don't use where condition instead use it in "ON" condition.
Eg:

     select id,name from table1 a  
   join table2 b on a.name=b.name
   where id='123'

 Try,

    select id,name from table1 a  
   join table2 b on a.name=b.name and a.id='123'

For "Left/Right Join",
Don't use in "ON" condition, Because if you use left/right join it will get all rows for any one table.So, No use of using it in "On". So, Try to use "Where" condition

您的好友蓝忘机已上羡 2024-09-03 20:55:43

子查询是解决“从 A 获取事实,以 B 的事实为条件”形式的问题的逻辑上正确的方法。在这种情况下,将 B 放在子查询中比进行联接更具逻辑意义。从实际意义上讲,它也更安全,因为您不必因为与 B 进行多场比赛而担心从 A 获取重复的事实。

然而,实际上,答案通常归结为性能。有些优化器在给定连接和子查询时会很糟糕,而有些优化器则相反,这是特定于优化器、特定于 DBMS 版本和特定于查询的。

从历史上看,显式联接通常会获胜,因此公认的观点是联接更好,但优化器一直在变得更好,因此我更喜欢首先以逻辑连贯的方式编写查询,然后在性能限制允许的情况下进行重组。

Sub-queries are the logically correct way to solve problems of the form, "Get facts from A, conditional on facts from B". In such instances, it makes more logical sense to stick B in a sub-query than to do a join. It is also safer, in a practical sense, since you don't have to be cautious about getting duplicated facts from A due to multiple matches against B.

Practically speaking, however, the answer usually comes down to performance. Some optimisers suck lemons when given a join vs a sub-query, and some suck lemons the other way, and this is optimiser-specific, DBMS-version-specific and query-specific.

Historically, explicit joins usually win, hence the established wisdom that joins are better, but optimisers are getting better all the time, and so I prefer to write queries first in a logically coherent way, and then restructure if performance constraints warrant this.

很糊涂小朋友 2024-09-03 20:55:43

在大多数情况下,JOIN 比子查询更快,而子查询更快的情况很少见。

在 JOIN 中,RDBMS 可以创建一个更适合您的查询的执行计划,并且可以预测应该加载哪些数据进行处理并节省时间,这与子查询不同,子查询将运行所有查询并加载所有数据进行处理。

子查询的好处是它们比 JOIN 更具可读性:这就是大多数 SQL 新人更喜欢子查询的原因;这是最简单的方法;但就性能而言,JOINS 在大多数情况下更好,尽管它们也不难阅读。

In most cases JOINs are faster than sub-queries and it is very rare for a sub-query to be faster.

In JOINs RDBMS can create an execution plan that is better for your query and can predict what data should be loaded to be processed and save time, unlike the sub-query where it will run all the queries and load all their data to do the processing.

The good thing in sub-queries is that they are more readable than JOINs: that's why most new SQL people prefer them; it is the easy way; but when it comes to performance, JOINS are better in most cases even though they are not hard to read too.

蓝礼 2024-09-03 20:55:43

摘自 MySQL 手册 (13.2. 10.11 将子查询重写为连接):

LEFT [OUTER] JOIN 可能比等效的子查询更快,因为服务器可能能够更好地优化它——这一事实不仅仅针对 MySQL 服务器。

因此子查询可能比 LEFT [OUTER] JOIN 慢,但在我看来,它们的优点是可读性稍高。

Taken from the MySQL manual (13.2.10.11 Rewriting Subqueries as Joins):

A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.

So subqueries can be slower than LEFT [OUTER] JOIN, but in my opinion their strength is slightly higher readability.

月隐月明月朦胧 2024-09-03 20:55:43

在 2010 年,我会加入这个问题的作者,并会强烈投票支持 JOIN,但是有了更多的经验(特别是在 MySQL 中),我可以说:是的,子查询可以更好。我在这里阅读了多个答案;一些声明的子查询速度更快,但缺乏很好的解释。我希望我能为一个人提供这个(非常)迟到的答案:

首先,让我说最重要的一点:子查询有不同形式

第二个重要的陈述:大小重要事项

如果您使用子查询,您应该了解数据库服务器如何执行子查询。特别是如果子查询计算一次或每行计算一次!
另一方面,现代数据库服务器能够进行很多优化。在某些情况下,子查询有助于优化查询,但较新版本的 DB-Server 可能会使优化变得过时。

选择字段中的子查询

SELECT moo, (SELECT roger FROM wilco WHERE moo = me) AS bar FROM foo

请注意,对于 foo 中的每个结果行都会执行子查询。
如果可能的话,避免这种情况;它可能会大大减慢您对大型数据集的查询速度。但是,如果子查询没有对 foo 的引用,则它可以由数据库服务器作为静态内容进行优化,并且只能评估一次。

Where 语句中的子查询

SELECT moo FROM foo WHERE bar = (SELECT roger FROM wilco WHERE moo = me)

如果幸运的话,数据库会在内部将其优化为 JOIN。如果没有,您的查询在大型数据集上将变得非常非常慢,因为它将对 foo 中的每一行执行子查询,而不仅仅是像 select-type 中的结果。

Join 语句中的子查询

SELECT moo, bar 
  FROM foo 
    LEFT JOIN (
      SELECT MIN(bar), me FROM wilco GROUP BY me
    ) ON moo = me

这很有趣。我们将 JOIN 与子查询结合起来。在这里我们得到了子查询的真正优势。想象一下 wilco 中有数百万行的数据集,但只有几个不同的 me。我们现在有一个较小的临时表可以连接,而不是连接一个巨大的表。根据数据库大小,这可能会导致查询速度更快。您可以使用 CREATE TEMPORARY TABLE ...INSERT INTO ... SELECT ... 获得相同的效果,这可能会为非常复杂的查询提供更好的可读性(但可以将数据集锁定在可重复读隔离级别)。

嵌套子查询

SELECT VARIANCE(moo)
  FROM (
    SELECT moo, CONCAT(roger, wilco) AS bar
      FROM foo
      HAVING bar LIKE 'SpaceQ%'
  ) AS temp_foo
  GROUP BY moo

您可以在多个级别嵌套子查询。如果您必须对结果进行分组或更改,这可以帮助处理庞大的数据集。通常数据库服务器会为此创建一个临时表,但有时您不需要对整个表进行某些操作,只需对结果集进行操作。根据表的大小,这可能会提供更好的性能。

结论

子查询不能替代 JOIN 并且您不应该像这样使用它们(尽管可能)。以我的拙见,子查询的正确使用是用作 CREATE TEMPORARY TABLE ... 的快速替换。良好的子查询以 JOINON 语句无法完成的方式减少数据集。如果子查询具有关键字 GROUP BY 或 DISTINCT 之一,并且最好不要位于 select 字段或 where 语句中,那么它可能会大大提高性能。

In the year 2010 I would have joined the author of this questions and would have strongly voted for JOIN, but with much more experience (especially in MySQL) I can state: Yes subqueries can be better. I've read multiple answers here; some stated subqueries are faster, but it lacked a good explanation. I hope I can provide one with this (very) late answer:

First of all, let me say the most important: There are different forms of sub-queries

And the second important statement: Size matters

If you use sub-queries, you should be aware of how the DB-Server executes the sub-query. Especially if the sub-query is evaluated once or for every row!
On the other side, a modern DB-Server is able to optimize a lot. In some cases a subquery helps optimizing a query, but a newer version of the DB-Server might make the optimization obsolete.

Sub-queries in Select-Fields

SELECT moo, (SELECT roger FROM wilco WHERE moo = me) AS bar FROM foo

Be aware that a sub-query is executed for every resulting row from foo.
Avoid this if possible; it may drastically slow down your query on huge datasets. However, if the sub-query has no reference to foo it can be optimized by the DB-server as static content and could be evaluated only once.

Sub-queries in the Where-statement

SELECT moo FROM foo WHERE bar = (SELECT roger FROM wilco WHERE moo = me)

If you are lucky, the DB optimizes this internally into a JOIN. If not, your query will become very, very slow on huge datasets because it will execute the sub-query for every row in foo, not just the results like in the select-type.

Sub-queries in the Join-statement

SELECT moo, bar 
  FROM foo 
    LEFT JOIN (
      SELECT MIN(bar), me FROM wilco GROUP BY me
    ) ON moo = me

This is interesting. We combine JOIN with a sub-query. And here we get the real strength of sub-queries. Imagine a dataset with millions of rows in wilco but only a few distinct me. Instead of joining against a huge table, we have now a smaller temporary table to join against. This can result in much faster queries depending on database size. You can have the same effect with CREATE TEMPORARY TABLE ... and INSERT INTO ... SELECT ..., which might provide better readability on very complex queries (but can lock datasets in a repeatable read isolation level).

Nested sub-queries

SELECT VARIANCE(moo)
  FROM (
    SELECT moo, CONCAT(roger, wilco) AS bar
      FROM foo
      HAVING bar LIKE 'SpaceQ%'
  ) AS temp_foo
  GROUP BY moo

You can nest sub-queries in multiple levels. This can help on huge datasets if you have to group or change the results. Usually the DB-Server creates a temporary table for this, but sometimes you do not need some operations on the whole table, only on the resultset. This might provide a much better performance depending on the size of the table.

Conclusion

Sub-queries are no replacement for a JOIN and you should not use them like this (although possible). In my humble opinion, the correct use of a sub-query is the use as a quick replacement of CREATE TEMPORARY TABLE .... A good sub-query reduces a dataset in a way you cannot accomplish in an ON statement of a JOIN. If a sub-query has one of the keywords GROUP BY or DISTINCT and is preferably not situated in the select fields or the where statement, then it might improve performance a lot.

驱逐舰岛风号 2024-09-03 20:55:43

使用 EXPLAIN 查看数据库如何对数据执行查询。这个答案中有一个巨大的“这取决于”...

当 PostgreSQL 认为其中一个比另一个更快时,它可以将子查询重写为联接或联接到子查询。这一切都取决于数据、索引、相关性、数据量、查询等。

Use EXPLAIN to see how your database executes the query on your data. There is a huge "it depends" in this answer...

PostgreSQL can rewrite a subquery to a join or a join to a subquery when it thinks one is faster than the other. It all depends on the data, indexes, correlation, amount of data, query, etc.

深者入戏 2024-09-03 20:55:43

首先,要比较两者,您应该将带有子查询的查询区分为:

  1. 始终具有使用连接编写的相应等效查询的子查询类
  2. 无法使用连接重写的子查询类

对于第一类

如今连 mysql 也这么做了。

尽管如此,有时并非如此,但这并不意味着连接总是会获胜——我在 mysql 中使用子查询提高了性能时遇到过这样的情况。 (例如,如果有一些因素阻止 mysql 规划器正确估计成本,并且如果规划器没有看到相同的连接变体和子查询变体,那么子查询可以通过强制特定路径来超越连接)。

结论是,如果您想确定哪一种性能更好,则应该测试连接查询和子查询变体的查询。

对于第二类,比较没有意义,因为这些查询不能使用联接重写,在这些情况下,子查询是完成所需任务的自然方式,您不应该歧视它们。

First of all, to compare the two first you should distinguish queries with subqueries to:

  1. a class of subqueries that always have corresponding equivalent query written with joins
  2. a class of subqueries that can not be rewritten using joins

For the first class of queries a good RDBMS will see joins and subqueries as equivalent and will produce same query plans.

These days even mysql does that.

Still, sometimes it does not, but this does not mean that joins will always win - I had cases when using subqueries in mysql improved performance. (For example if there is something preventing mysql planner to correctly estimate the cost and if the planner doesn't see the join-variant and subquery-variant as same then subqueries can outperform the joins by forcing a certain path).

Conclusion is that you should test your queries for both join and subquery variants if you want to be sure which one will perform better.

For the second class the comparison makes no sense as those queries can not be rewritten using joins and in these cases subqueries are natural way to do the required tasks and you should not discriminate against them.

白鸥掠海 2024-09-03 20:55:43

我认为引用的答案中没有充分强调的是重复问题以及特定(使用)案例可能产生的有问题的结果。

(尽管 Marcelo Cantos 确实提到过)

我将引用斯坦福大学 Lagunita SQL 课程中的示例。

学生表

+------+--------+------+--------+
| sID  | sName  | GPA  | sizeHS |
+------+--------+------+--------+
|  123 | Amy    |  3.9 |   1000 |
|  234 | Bob    |  3.6 |   1500 |
|  345 | Craig  |  3.5 |    500 |
|  456 | Doris  |  3.9 |   1000 |
|  567 | Edward |  2.9 |   2000 |
|  678 | Fay    |  3.8 |    200 |
|  789 | Gary   |  3.4 |    800 |
|  987 | Helen  |  3.7 |    800 |
|  876 | Irene  |  3.9 |    400 |
|  765 | Jay    |  2.9 |   1500 |
|  654 | Amy    |  3.9 |   1000 |
|  543 | Craig  |  3.4 |   2000 |
+------+--------+------+--------+

申请表

(向特定大学和专业提出的申请)

+------+----------+----------------+----------+
| sID  | cName    | major          | decision |
+------+----------+----------------+----------+
|  123 | Stanford | CS             | Y        |
|  123 | Stanford | EE             | N        |
|  123 | Berkeley | CS             | Y        |
|  123 | Cornell  | EE             | Y        |
|  234 | Berkeley | biology        | N        |
|  345 | MIT      | bioengineering | Y        |
|  345 | Cornell  | bioengineering | N        |
|  345 | Cornell  | CS             | Y        |
|  345 | Cornell  | EE             | N        |
|  678 | Stanford | history        | Y        |
|  987 | Stanford | CS             | Y        |
|  987 | Berkeley | CS             | Y        |
|  876 | Stanford | CS             | N        |
|  876 | MIT      | biology        | Y        |
|  876 | MIT      | marine biology | N        |
|  765 | Stanford | history        | Y        |
|  765 | Cornell  | history        | N        |
|  765 | Cornell  | psychology     | Y        |
|  543 | MIT      | CS             | N        |
+------+----------+----------------+----------+

让我们尝试查找已申请 CS 专业的学生的 GPA 分数(无论是哪所大学)

使用子查询:

select GPA from Student where sID in (select sID from Apply where major = 'CS');

+------+
| GPA  |
+------+
|  3.9 |
|  3.5 |
|  3.7 |
|  3.9 |
|  3.4 |
+------+

此结果集的平均值为:

select avg(GPA) from Student where sID in (select sID from Apply where major = 'CS');

+--------------------+
| avg(GPA)           |
+--------------------+
| 3.6800000000000006 |
+--------------------+

使用联接:

select GPA from Student, Apply where Student.sID = Apply.sID and Apply.major = 'CS';

+------+
| GPA  |
+------+
|  3.9 |
|  3.9 |
|  3.5 |
|  3.7 |
|  3.7 |
|  3.9 |
|  3.4 |
+------+

此结果集的平均值:

select avg(GPA) from Student, Apply where Student.sID = Apply.sID and Apply.major = 'CS';

+-------------------+
| avg(GPA)          |
+-------------------+
| 3.714285714285714 |
+-------------------+

很明显,第二次尝试会产生误导我们的用例中的结果是,它会计算平均值的重复项。
同样明显的是,将 distinct 与基于 join 的语句一起使用不会消除问题,因为它会错误地保留 出现次数的三分之一。 3.9 分。正确的情况是考虑 3.9 分数出现两 (2) 次,因为我们实际上有两 (2) 名学生具有该分数符合我们的查询条件。

在某些情况下,除了任何性能问题之外,子查询似乎是最安全的方法。

I think what has been under-emphasized in the cited answers is the issue of duplicates and problematic results that may arise from specific (use) cases.

(although Marcelo Cantos does mention it)

I will cite the example from Stanford's Lagunita courses on SQL.

Student Table

+------+--------+------+--------+
| sID  | sName  | GPA  | sizeHS |
+------+--------+------+--------+
|  123 | Amy    |  3.9 |   1000 |
|  234 | Bob    |  3.6 |   1500 |
|  345 | Craig  |  3.5 |    500 |
|  456 | Doris  |  3.9 |   1000 |
|  567 | Edward |  2.9 |   2000 |
|  678 | Fay    |  3.8 |    200 |
|  789 | Gary   |  3.4 |    800 |
|  987 | Helen  |  3.7 |    800 |
|  876 | Irene  |  3.9 |    400 |
|  765 | Jay    |  2.9 |   1500 |
|  654 | Amy    |  3.9 |   1000 |
|  543 | Craig  |  3.4 |   2000 |
+------+--------+------+--------+

Apply Table

(applications made to specific universities and majors)

+------+----------+----------------+----------+
| sID  | cName    | major          | decision |
+------+----------+----------------+----------+
|  123 | Stanford | CS             | Y        |
|  123 | Stanford | EE             | N        |
|  123 | Berkeley | CS             | Y        |
|  123 | Cornell  | EE             | Y        |
|  234 | Berkeley | biology        | N        |
|  345 | MIT      | bioengineering | Y        |
|  345 | Cornell  | bioengineering | N        |
|  345 | Cornell  | CS             | Y        |
|  345 | Cornell  | EE             | N        |
|  678 | Stanford | history        | Y        |
|  987 | Stanford | CS             | Y        |
|  987 | Berkeley | CS             | Y        |
|  876 | Stanford | CS             | N        |
|  876 | MIT      | biology        | Y        |
|  876 | MIT      | marine biology | N        |
|  765 | Stanford | history        | Y        |
|  765 | Cornell  | history        | N        |
|  765 | Cornell  | psychology     | Y        |
|  543 | MIT      | CS             | N        |
+------+----------+----------------+----------+

Let's try to find the GPA scores for students that have applied to CS major (regardless of the university)

Using a subquery:

select GPA from Student where sID in (select sID from Apply where major = 'CS');

+------+
| GPA  |
+------+
|  3.9 |
|  3.5 |
|  3.7 |
|  3.9 |
|  3.4 |
+------+

The average value for this resultset is:

select avg(GPA) from Student where sID in (select sID from Apply where major = 'CS');

+--------------------+
| avg(GPA)           |
+--------------------+
| 3.6800000000000006 |
+--------------------+

Using a join:

select GPA from Student, Apply where Student.sID = Apply.sID and Apply.major = 'CS';

+------+
| GPA  |
+------+
|  3.9 |
|  3.9 |
|  3.5 |
|  3.7 |
|  3.7 |
|  3.9 |
|  3.4 |
+------+

average value for this resultset:

select avg(GPA) from Student, Apply where Student.sID = Apply.sID and Apply.major = 'CS';

+-------------------+
| avg(GPA)          |
+-------------------+
| 3.714285714285714 |
+-------------------+

It is obvious that the second attempt yields misleading results in our use case, given that it counts duplicates for the computation of the average value.
It is also evident that usage of distinct with the join - based statement will not eliminate the problem, given that it will erroneously keep one out of three occurrences of the 3.9 score. The correct case is to account for TWO (2) occurrences of the 3.9 score given that we actually have TWO (2) students with that score that comply with our query criteria.

It seems that in some cases a sub-query is the safest way to go, besides any performance issues.

浊酒尽余欢 2024-09-03 20:55:43

MSDN SQL Server 文档显示

许多包含子查询的 Transact-SQL 语句也可以表示为联接。其他问题只能通过子查询提出。在 Transact-SQL 中,包含子查询的语句与不包含子查询的语义等效版本之间通常没有性能差异。然而,在某些必须检查存在性的情况下,联接会产生更好的性能。否则,必须针对外部查询的每个结果处理嵌套查询,以确保消除重复项。在这种情况下,联接方法会产生更好的结果。

因此,如果您需要类似的东西,

select * from t1 where exists select * from t2 where t2.parent=t1.id

请尝试使用 join 代替。在其他情况下,这没有什么区别。

我说:为子查询创建函数可以消除混乱的问题,并允许您为子查询实现额外的逻辑。因此,我建议尽可能为子查询创建函数。

代码混乱是一个大问题,业界几十年来一直在努力避免它。

MSDN Documentation for SQL Server says

Many Transact-SQL statements that include subqueries can be alternatively formulated as joins. Other questions can be posed only with subqueries. In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not. However, in some cases where existence must be checked, a join yields better performance. Otherwise, the nested query must be processed for each result of the outer query to ensure elimination of duplicates. In such cases, a join approach would yield better results.

so if you need something like

select * from t1 where exists select * from t2 where t2.parent=t1.id

try to use join instead. In other cases, it makes no difference.

I say: Creating functions for subqueries eliminate the problem of cluttter and allows you to implement additional logic to subqueries. So I recommend creating functions for subqueries whenever possible.

Clutter in code is a big problem and the industry has been working on avoiding it for decades.

久光 2024-09-03 20:55:43

根据我对两个案例的观察,如果一个表的记录少于 100,000 条,那么连接将会快速运行。

但如果一个表的记录超过 100,000 条,那么子查询是最好的结果。

我有一个表,其中包含我在下面创建的查询中的 500,000 条记录,其结果时间如下

SELECT * 
FROM crv.workorder_details wd 
inner join  crv.workorder wr on wr.workorder_id = wd.workorder_id;

结果:1​​3.3 秒

select * 
from crv.workorder_details 
where workorder_id in (select workorder_id from crv.workorder)

结果:1​​.65 秒

As per my observation like two cases, if a table has less then 100,000 records then the join will work fast.

But in the case that a table has more than 100,000 records then a subquery is best result.

I have one table that has 500,000 records on that I created below query and its result time is like

SELECT * 
FROM crv.workorder_details wd 
inner join  crv.workorder wr on wr.workorder_id = wd.workorder_id;

Result : 13.3 Seconds

select * 
from crv.workorder_details 
where workorder_id in (select workorder_id from crv.workorder)

Result : 1.65 Seconds

岛歌少女 2024-09-03 20:55:43
  • 一般规则是,在大多数情况下 (99%),连接 速度更快。
  • 表的数据越多,子查询速度就越慢。
  • 表的数据越少,子查询的速度与联接相当。
  • 子查询更简单、更容易理解、更容易阅读。
  • 大多数网络和应用程序框架及其“ORM”和“活动记录”都会使用子查询生成查询,因为使用子查询更容易划分责任、维护代码、 使用
  • 对于较小的网站或应用程序,子查询是可以的,但对于较大的网站和应用程序,您通常必须重写生成的查询来加入查询,特别是如果查询 查询中有许多子查询

有人说“某些 RDBMS 可以将子查询重写为连接或将连接重写为子查询,当它认为一个比另一个快。”,但是这一说法适用于简单的情况,当然不适用于带有子查询的复杂查询,这实际上会导致性能问题。

  • A general rule is that joins are faster in most cases (99%).
  • The more data tables have, the subqueries are slower.
  • The less data tables have, the subqueries have equivalent speed as joins.
  • The subqueries are simpler, easier to understand, and easier to read.
  • Most of the web and app frameworks and their "ORM"s and "Active record"s generate queries with subqueries, because with subqueries are easier to split responsibility, maintain code, etc.
  • For smaller web sites or apps subqueries are OK, but for larger web sites and apps you will often have to rewrite generated queries to join queries, especial if a query uses many subqueries in the query.

Some people say "some RDBMS can rewrite a subquery to a join or a join to a subquery when it thinks one is faster than the other.", but this statement applies to simple cases, surely not for complicated queries with subqueries which actually cause a problems in performance.

猥︴琐丶欲为 2024-09-03 20:55:43

从旧的 Mambo CMS 在非常大的数据库上运行:

SELECT id, alias
FROM
  mos_categories
WHERE
  id IN (
    SELECT
      DISTINCT catid
    FROM mos_content
  );

0 秒

SELECT
  DISTINCT mos_content.catid,
  mos_categories.alias
FROM
  mos_content, mos_categories
WHERE
  mos_content.catid = mos_categories.id;

~3 秒

EXPLAIN 显示它们检查完全相同的行数,但一次需要 3 秒,一次几乎是即时的。这个故事的寓意是什么?如果性能很重要(什么时候不是?),请尝试多种方法,看看哪一种最快。

并且...

SELECT
  DISTINCT mos_categories.id,
  mos_categories.alias
FROM
  mos_content, mos_categories
WHERE
  mos_content.catid = mos_categories.id;

0 秒

再次,相同的结果,相同的检查行数。我的猜测是,DISTINCT mos_content.catid 比 DISTINCT mos_categories.id 花费的时间要长得多。

Run on a very large database from an old Mambo CMS:

SELECT id, alias
FROM
  mos_categories
WHERE
  id IN (
    SELECT
      DISTINCT catid
    FROM mos_content
  );

0 seconds

SELECT
  DISTINCT mos_content.catid,
  mos_categories.alias
FROM
  mos_content, mos_categories
WHERE
  mos_content.catid = mos_categories.id;

~3 seconds

An EXPLAIN shows that they examine the exact same number of rows, but one takes 3 seconds and one is near instant. Moral of the story? If performance is important (when isn't it?), try it multiple ways and see which one is fastest.

And...

SELECT
  DISTINCT mos_categories.id,
  mos_categories.alias
FROM
  mos_content, mos_categories
WHERE
  mos_content.catid = mos_categories.id;

0 seconds

Again, same results, same number of rows examined. My guess is that DISTINCT mos_content.catid takes far longer to figure out than DISTINCT mos_categories.id does.

2024-09-03 20:55:43

子查询通常用于返回单行作为原子值,但它们也可用于使用 IN 关键字将值与多行进行比较。它们几乎可以出现在 SQL 语句中任何有意义的位置,包括目标列表、WHERE 子句等。可以使用简单的子查询作为搜索条件。例如,在一对表之间:

SELECT title 
FROM books 
WHERE author_id = (
    SELECT id 
    FROM authors 
    WHERE last_name = 'Bar' AND first_name = 'Foo'
);

请注意,对子查询的结果使用普通值运算符要求只能返回一个字段。如果您有兴趣检查一组其他值中是否存在单个值,请使用 IN:

SELECT title 
FROM books 
WHERE author_id IN (
    SELECT id FROM authors WHERE last_name ~ '^[A-E]'
);

这显然与 LEFT-JOIN 不同,在 LEFT-JOIN 中您只想连接表 A 和 B 中的内容,即使连接-condition 在表 B 等中找不到任何匹配的记录。

如果您只是担心速度,则必须检查数据库并编写一个好的查询,看看性能是否有任何显着差异。

Subqueries are generally used to return a single row as an atomic value, though they may be used to compare values against multiple rows with the IN keyword. They are allowed at nearly any meaningful point in a SQL statement, including the target list, the WHERE clause, and so on. A simple sub-query could be used as a search condition. For example, between a pair of tables:

SELECT title 
FROM books 
WHERE author_id = (
    SELECT id 
    FROM authors 
    WHERE last_name = 'Bar' AND first_name = 'Foo'
);

Note that using a normal value operator on the results of a sub-query requires that only one field must be returned. If you're interested in checking for the existence of a single value within a set of other values, use IN:

SELECT title 
FROM books 
WHERE author_id IN (
    SELECT id FROM authors WHERE last_name ~ '^[A-E]'
);

This is obviously different from say a LEFT-JOIN where you just want to join stuff from table A and B even if the join-condition doesn't find any matching record in table B, etc.

If you're just worried about speed you'll have to check with your database and write a good query and see if there's any significant difference in performance.

萌酱 2024-09-03 20:55:43

MySQL版本:5.5.28-0ubuntu0.12.04.2-log

我也觉得JOIN总是比MySQL中的子查询好,但是EXPLAIN是更好的判断方式。下面是一个子查询比 JOIN 效果更好的示例。

这是我的带有 3 个子查询的查询:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=43) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=55) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

EXPLAIN 显示:

+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
| id | select_type        | table    | type   | possible_keys                                       | key          | key_len | ref                                             | rows | Extra                    |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
|  1 | PRIMARY            | vrl      | index  | PRIMARY                                             | moved_date   | 8       | NULL                                            |  200 | Using where              |
|  1 | PRIMARY            | l        | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  1 | PRIMARY            | vrlih    | eq_ref | PRIMARY                                             | PRIMARY      | 9       | ranker.vrl.list_id,ranker.vrl.ontology_id,const |    1 | Using where              |
|  1 | PRIMARY            | lbs      | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  4 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  3 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+

使用 JOIN 的相同查询是:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
LEFT JOIN list_tag lt1 ON lt1.list_id = vrl.list_id AND lt1.tag_id = 43 
LEFT JOIN list_tag lt2 ON lt2.list_id = vrl.list_id AND lt2.tag_id = 55 
INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
AND lt1.list_id IS NULL AND lt2.tag_id IS NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

输出是:

+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                                       | key          | key_len | ref                                         | rows | Extra                                        |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | lt3   | ref    | list_tag_key,list_id,tag_id                         | tag_id       | 5       | const                                       | 2386 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.lt3.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | vrlih | ref    | PRIMARY                                             | PRIMARY      | 4       | ranker.lt3.list_id                          |  103 | Using where                                  |
|  1 | SIMPLE      | vrl   | ref    | PRIMARY                                             | PRIMARY      | 8       | ranker.lt3.list_id,ranker.vrlih.ontology_id |   65 | Using where                                  |
|  1 | SIMPLE      | lt1   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index; Not exists         |
|  1 | SIMPLE      | lbs   | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | lt2   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index                     |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+

rows 列的比较可以看出差异,并且使用 JOIN 的查询正在使用 使用临时;使用文件排序

当然,当我运行这两个查询时,第一个查询在 0.02 秒内完成,第二个查询甚至在 1 分钟后也没有完成,所以 EXPLAIN 正确解释了这些查询。

如果我在 list_tag 表上没有 INNER JOIN,即如果我

AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL  

从第一个查询中删除并相应地:

INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403

从第二个查询中删除,则 EXPLAIN 为两个查询以及这两个查询返回相同的行数跑得同样快。

MySQL version: 5.5.28-0ubuntu0.12.04.2-log

I was also under the impression that JOIN is always better than a sub-query in MySQL, but EXPLAIN is a better way to make a judgment. Here is an example where sub queries work better than JOINs.

Here is my query with 3 sub-queries:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=43) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=55) IS NULL 
 AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

EXPLAIN shows:

+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
| id | select_type        | table    | type   | possible_keys                                       | key          | key_len | ref                                             | rows | Extra                    |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
|  1 | PRIMARY            | vrl      | index  | PRIMARY                                             | moved_date   | 8       | NULL                                            |  200 | Using where              |
|  1 | PRIMARY            | l        | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  1 | PRIMARY            | vrlih    | eq_ref | PRIMARY                                             | PRIMARY      | 9       | ranker.vrl.list_id,ranker.vrl.ontology_id,const |    1 | Using where              |
|  1 | PRIMARY            | lbs      | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
|  4 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  3 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
+----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+

The same query with JOINs is:

EXPLAIN SELECT vrl.list_id,vrl.ontology_id,vrl.position,l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date 
FROM `vote-ranked-listory` vrl 
INNER JOIN lists l ON l.list_id = vrl.list_id 
INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id=vrlih.ontology_id AND vrlih.type='PREVIOUS_POSITION' 
INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5 
LEFT JOIN list_tag lt1 ON lt1.list_id = vrl.list_id AND lt1.tag_id = 43 
LEFT JOIN list_tag lt2 ON lt2.list_id = vrl.list_id AND lt2.tag_id = 55 
INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403 
WHERE vrl.position <= 15 AND l.status='ACTIVE' AND l.is_public=1 AND vrl.ontology_id < 1000000000 
AND lt1.list_id IS NULL AND lt2.tag_id IS NULL 
ORDER BY vrl.moved_date DESC LIMIT 200;

and the output is:

+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                                       | key          | key_len | ref                                         | rows | Extra                                        |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | lt3   | ref    | list_tag_key,list_id,tag_id                         | tag_id       | 5       | const                                       | 2386 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.lt3.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | vrlih | ref    | PRIMARY                                             | PRIMARY      | 4       | ranker.lt3.list_id                          |  103 | Using where                                  |
|  1 | SIMPLE      | vrl   | ref    | PRIMARY                                             | PRIMARY      | 8       | ranker.lt3.list_id,ranker.vrlih.ontology_id |   65 | Using where                                  |
|  1 | SIMPLE      | lt1   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index; Not exists         |
|  1 | SIMPLE      | lbs   | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                          |    1 | Using where                                  |
|  1 | SIMPLE      | lt2   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index                     |
+----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+

A comparison of the rows column tells the difference and the query with JOINs is using Using temporary; Using filesort.

Of course when I run both the queries, the first one is done in 0.02 secs, the second one does not complete even after 1 min, so EXPLAIN explained these queries properly.

If I do not have the INNER JOIN on the list_tag table i.e. if I remove

AND (SELECT list_id FROM list_tag WHERE list_id=l.list_id AND tag_id=246403) IS NOT NULL  

from the first query and correspondingly:

INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403

from the second query, then EXPLAIN returns the same number of rows for both queries and both these queries run equally fast.

囚你心 2024-09-03 20:55:43

子查询能够动态计算聚合函数。
例如,找到该书的最低价格并获取以此价格出售的所有书籍。
1) 使用子查询:

SELECT titles, price
FROM Books, Orders
WHERE price = 
(SELECT MIN(price)
 FROM Orders) AND (Books.ID=Orders.ID);

2) 使用 JOIN

SELECT MIN(price)
     FROM Orders;
-----------------
2.99

SELECT titles, price
FROM Books b
INNER JOIN  Orders o
ON b.ID = o.ID
WHERE o.price = 2.99;

Subqueries have ability to calculate aggregation functions on a fly.
E.g. Find minimal price of the book and get all books which are sold with this price.
1) Using Subqueries:

SELECT titles, price
FROM Books, Orders
WHERE price = 
(SELECT MIN(price)
 FROM Orders) AND (Books.ID=Orders.ID);

2) using JOINs

SELECT MIN(price)
     FROM Orders;
-----------------
2.99

SELECT titles, price
FROM Books b
INNER JOIN  Orders o
ON b.ID = o.ID
WHERE o.price = 2.99;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文