MySQL GROUP BY and HAVING

Posted on 2024-11-16 21:26:28

I'm new to MySQL queries, so I'm sure this is a question with an obvious answer.

I was looking at these two queries. Will they return different result sets? I understand that the filtering happens at a different stage in each, but I believe they will return the same results, with the first query being slightly more efficient. Is that right?

Query 1: HAVING, then AND

SELECT user_id   
FROM forum_posts  
GROUP BY user_id 
    HAVING COUNT(id) >= 100   
    AND user_id NOT IN (SELECT user_id FROM banned_users)

Query 2: WHERE, then HAVING

SELECT user_id   
FROM forum_posts 
WHERE user_id NOT IN(SELECT user_id FROM banned_users) 
GROUP BY user_id 
    HAVING COUNT(id) >= 100   


Comments (5)

无尽的现实 2024-11-23 21:26:28

Actually, the first query will be less efficient (HAVING is applied after WHERE).
UPDATE

Some pseudocode to illustrate how your queries are executed (a [very] simplified version).
First query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Group, count, etc.
4. Exclude records from the first result set if they are present in the second

Second query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Exclude records from the first result set if they are present in the second
4. Group, count, etc.

The order of steps 1 and 2 is not important; MySQL can choose whichever it thinks is better. The important difference is in steps 3 and 4: HAVING is applied after GROUP BY. Grouping is usually more expensive than joining (excluding records can be considered a join operation in this case), so the fewer records there are to group, the better the performance.
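
To make the difference between the two step orders concrete, here is a minimal Python sketch of the simplified model above. It is not how MySQL actually executes anything; the sample data, the `THRESHOLD` constant, and both function names are made up for illustration. It only shows that both orders give the same users while the second one groups fewer rows.

```python
from collections import Counter

# Hypothetical sample data: one user_id per post, plus a set of banned users.
forum_posts = [1] * 120 + [2] * 150 + [3] * 40 + [4] * 100
banned_users = {2}
THRESHOLD = 100


def group_then_exclude(posts, banned):
    """Query 1 order: group and count every row, then drop banned groups."""
    counts = Counter(posts)  # step 3: every row takes part in the grouping
    users = sorted(u for u, n in counts.items()
                   if n >= THRESHOLD and u not in banned)  # step 4
    return users, len(posts)


def exclude_then_group(posts, banned):
    """Query 2 order: drop banned rows first, then group what is left."""
    kept = [u for u in posts if u not in banned]  # step 3: filter rows
    counts = Counter(kept)  # step 4: only the remaining rows are grouped
    users = sorted(u for u, n in counts.items() if n >= THRESHOLD)
    return users, len(kept)


q1_users, q1_rows_grouped = group_then_exclude(forum_posts, banned_users)
q2_users, q2_rows_grouped = exclude_then_group(forum_posts, banned_users)

print("query 1:", q1_users, "rows grouped:", q1_rows_grouped)
print("query 2:", q2_users, "rows grouped:", q2_rows_grouped)
```

With this data, the banned user's 150 posts are counted in the first order and never reach the grouping step in the second, which is the saving the answer describes.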

心如荒岛 2024-11-23 21:26:28

You already have answers saying that the two queries will return the same results, along with various opinions on which one is more efficient.

In my opinion, there will be a difference in efficiency (speed) only if the optimizer produces different plans for the two queries. I think that in the latest MySQL versions the optimizer is smart enough to find the same plan for either query, so there will be no difference at all. Of course, you can test this, either by looking at the execution plans with EXPLAIN or by running the two queries against some test tables.

I would use the second version in any case, just to play it safe.
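
As a rough sketch of the "run both queries against some test tables" check, the following uses Python's built-in `sqlite3` module with an in-memory database. This is an assumption made so the example is self-contained: SQLite stands in for MySQL, so it can only confirm that the two queries return the same rows, and says nothing about MySQL's plans or timings. For those, run EXPLAIN on your own server. The sample rows and the `run` helper are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
    CREATE TABLE banned_users (user_id INTEGER NOT NULL);
""")

# Hypothetical data: (user_id, number of posts); user 2 is banned.
posts = [(u,) for u, n in ((1, 120), (2, 150), (3, 40), (4, 100))
         for _ in range(n)]
conn.executemany("INSERT INTO forum_posts (user_id) VALUES (?)", posts)
conn.execute("INSERT INTO banned_users (user_id) VALUES (2)")

QUERY_1 = """
    SELECT user_id
    FROM forum_posts
    GROUP BY user_id
    HAVING COUNT(id) >= 100
       AND user_id NOT IN (SELECT user_id FROM banned_users)
"""

QUERY_2 = """
    SELECT user_id
    FROM forum_posts
    WHERE user_id NOT IN (SELECT user_id FROM banned_users)
    GROUP BY user_id
    HAVING COUNT(id) >= 100
"""


def run(sql):
    """Return the user_ids produced by a query, sorted for comparison."""
    return sorted(row[0] for row in conn.execute(sql))


print("query 1:", run(QUERY_1))
print("query 2:", run(QUERY_2))
```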


Let me add that:

  • COUNT(*) is usually more efficient than COUNT(notNullableField) in MySQL. Until that is fixed in future MySQL versions, use COUNT(*) where applicable.

Therefore, you can also use:

SELECT user_id   
FROM forum_posts 
WHERE user_id NOT IN
  ( SELECT user_id FROM banned_users ) 
GROUP BY user_id 
HAVING COUNT(*) >= 100   
  • There are also other ways to get the same sub-result as NOT IN before applying GROUP BY.

Using LEFT JOIN / IS NULL:

SELECT fp.user_id   
FROM forum_posts AS fp
  LEFT JOIN banned_users AS bu
    ON bu.user_id = fp.user_id
WHERE bu.user_id IS NULL 
GROUP BY fp.user_id 
HAVING COUNT(*) >= 100  

Using NOT EXISTS:

SELECT fp.user_id   
FROM forum_posts AS fp 
WHERE NOT EXISTS
  ( SELECT * 
    FROM banned_users AS bu
    WHERE bu.user_id = fp.user_id
  ) 
GROUP BY fp.user_id 
HAVING COUNT(*) >= 100   

Which of the three methods is fastest depends on your table sizes and many other factors, so it is best to test with your own data.
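
A small harness along these lines can be used to compare the three forms. This is only a sketch: it runs against an in-memory SQLite database through Python's `sqlite3` module rather than MySQL, the sample rows are invented, and the timings it prints say nothing about how MySQL would behave on real tables. It also assumes `banned_users.user_id` is declared NOT NULL; if the subquery could return a NULL, NOT IN would match no rows at all, while the other two forms would be unaffected.

```python
import sqlite3
import time

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
    CREATE TABLE banned_users (user_id INTEGER NOT NULL);
""")

# Hypothetical data: (user_id, number of posts).
posts = [(u,) for u, n in ((10, 100), (20, 250), (30, 99), (50, 101))
         for _ in range(n)]
conn.executemany("INSERT INTO forum_posts (user_id) VALUES (?)", posts)
# User 20 is banned; user 40 is banned but has no posts.
conn.executemany("INSERT INTO banned_users (user_id) VALUES (?)", [(20,), (40,)])

QUERIES = {
    "NOT IN": """
        SELECT user_id
        FROM forum_posts
        WHERE user_id NOT IN (SELECT user_id FROM banned_users)
        GROUP BY user_id
        HAVING COUNT(*) >= 100
    """,
    "LEFT JOIN / IS NULL": """
        SELECT fp.user_id
        FROM forum_posts AS fp
          LEFT JOIN banned_users AS bu
            ON bu.user_id = fp.user_id
        WHERE bu.user_id IS NULL
        GROUP BY fp.user_id
        HAVING COUNT(*) >= 100
    """,
    "NOT EXISTS": """
        SELECT fp.user_id
        FROM forum_posts AS fp
        WHERE NOT EXISTS
          ( SELECT *
            FROM banned_users AS bu
            WHERE bu.user_id = fp.user_id
          )
        GROUP BY fp.user_id
        HAVING COUNT(*) >= 100
    """,
}

results = {}
for name, sql in QUERIES.items():
    start = time.perf_counter()
    results[name] = sorted(row[0] for row in conn.execute(sql))
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Timings are illustrative only; measure on your own MySQL data.
    print(f"{name:<20} {results[name]} {elapsed_ms:.2f} ms")
```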

且行且努力 2024-11-23 21:26:28

HAVING conditions are applied to the grouped results. Since you group by user_id, every possible user_id value is present in the grouped result, so the placement of the user_id condition is not important.

北方的韩爷 2024-11-23 21:26:28

To me, the second query is more efficient because it reduces the number of records that GROUP BY and HAVING have to process.

Alternatively, you may try the following query to avoid using IN:

SELECT `fp`.`user_id`
FROM `forum_posts` `fp`
LEFT JOIN `banned_users` `bu` ON `fp`.`user_id` = `bu`.`user_id`
WHERE `bu`.`user_id` IS NULL
GROUP BY `fp`.`user_id`
HAVING COUNT(`fp`.`id`) >= 100

Hope this helps.

第七度阳光i 2024-11-23 21:26:28

No, they do not give the same results.

The first query filters records using the COUNT(id) condition,

while the other query filters the records first and then applies the HAVING clause.

The second query is the correctly written one.
