MySQL Group By and HAVING

Published 2024-11-16 21:26:28

I'm a MySQL query newbie, so I'm sure this is a question with an obvious answer.

But I was looking at these two queries. Will they return different result sets? I understand that the processing would start differently, but I believe they will return the same results, with the first query being slightly more efficient?

Query 1: HAVING, then AND

SELECT user_id   
FROM forum_posts  
GROUP BY user_id 
    HAVING COUNT(id) >= 100   
    AND user_id NOT IN (SELECT user_id FROM banned_users)

Query 2: WHERE, then HAVING

SELECT user_id   
FROM forum_posts 
WHERE user_id NOT IN(SELECT user_id FROM banned_users) 
GROUP BY user_id 
    HAVING COUNT(id) >= 100
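One quick way to check whether the two queries agree is to run both against a small data set. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite is a different engine, but both queries are plain SQL that it accepts); the schema and the post counts are hypothetical.

```python
import sqlite3

# Query 1 and Query 2 from the question.
QUERY_1 = """
SELECT user_id
FROM forum_posts
GROUP BY user_id
    HAVING COUNT(id) >= 100
    AND user_id NOT IN (SELECT user_id FROM banned_users)
"""

QUERY_2 = """
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN (SELECT user_id FROM banned_users)
GROUP BY user_id
    HAVING COUNT(id) >= 100
"""


def build_sample_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)")
    conn.execute("CREATE TABLE banned_users (user_id INTEGER NOT NULL)")
    # Made-up post counts: user 1 has 150 posts, user 2 has 120, user 3 has 50.
    posts_per_user = {1: 150, 2: 120, 3: 50}
    rows = [(uid,) for uid, n in posts_per_user.items() for _ in range(n)]
    conn.executemany("INSERT INTO forum_posts (user_id) VALUES (?)", rows)
    # User 2 is banned, so only user 1 should survive both filters.
    conn.execute("INSERT INTO banned_users (user_id) VALUES (2)")
    return conn


conn = build_sample_db()
result_1 = sorted(row[0] for row in conn.execute(QUERY_1))
result_2 = sorted(row[0] for row in conn.execute(QUERY_2))
print(result_1, result_2)  # [1] [1]
```

Equal results on one sample do not prove equivalence in general, but here the reason is structural: the NOT IN condition only looks at user_id, which is also the grouping key.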

无尽的现实 2024-11-23 21:26:28

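Actually, the first query will be less efficient, because HAVING is applied after WHERE and after GROUP BY, so the banned users' posts are grouped and counted before they are discarded.

Update

Some pseudocode to illustrate how the queries are executed (a [very] simplified version).

First query:
1. Select user_id from forum_posts
2. Select user_id from banned_users
3. Group, count, etc.
4. Exclude records from the first result set if they appear in the second

Second query:
1. Select user_id from forum_posts
2. Select user_id from banned_users
3. Exclude records from the first result set if they appear in the second
4. Group, count, etc.

The order of steps 1 and 2 does not matter; MySQL can choose whichever order it considers better. The important difference is in steps 3 and 4: HAVING is applied after GROUP BY. Grouping is usually more expensive than a join (excluding records can be treated as a join operation here), so the fewer records there are to group, the better the performance.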
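The two execution orders above can be modelled in a few lines of Python. This is a toy model, not MySQL's actual executor: collections.Counter stands in for GROUP BY, a set lookup stands in for the exclusion step, and the post counts are made up. Both orders return the same users; the difference is how many rows reach the grouping step.

```python
from collections import Counter


def group_then_filter(posts, banned, min_posts):
    """First query's order: group and count every row, then drop banned users."""
    counts = Counter(posts)  # the grouping step sees every post
    rows_grouped = len(posts)
    result = {u for u, n in counts.items() if n >= min_posts and u not in banned}
    return result, rows_grouped


def filter_then_group(posts, banned, min_posts):
    """Second query's order: drop banned users' posts first, then group and count."""
    kept = [u for u in posts if u not in banned]  # the exclusion step
    counts = Counter(kept)  # the grouping step sees only the remaining posts
    rows_grouped = len(kept)
    result = {u for u, n in counts.items() if n >= min_posts}
    return result, rows_grouped


# Hypothetical data: one list entry per post, holding that post's user_id.
posts = [1] * 150 + [2] * 120 + [3] * 50
banned = {2}

result_q1, rows_q1 = group_then_filter(posts, banned, 100)
result_q2, rows_q2 = filter_then_group(posts, banned, 100)
print(result_q1, rows_q1)  # {1} 320
print(result_q2, rows_q2)  # {1} 200
```

Whether the real optimizer actually keeps these two orders apart is exactly what the next answer questions; the model only illustrates the cost argument.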

心如荒岛 2024-11-23 21:26:28

You already have answers saying that the two queries will show the same results, and various opinions on which one is more efficient.

My opinion is that there will be a difference in efficiency (speed) only if the optimizer produces different plans for the two queries. I think the optimizers in recent MySQL versions are smart enough to find the same plan for either query, so there will be no difference at all. But of course you can test: either look at the execution plans with EXPLAIN, or run the two queries against some test tables.
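For the "look at the plans" route, the sketch below prints a plan for each query. It uses Python's sqlite3 module as a stand-in, so what it prints is SQLite's EXPLAIN QUERY PLAN output, not MySQL's EXPLAIN format; the schema and the index name idx_forum_posts_user_id are hypothetical. On a MySQL server you would prefix each query with EXPLAIN instead and compare the two plans the same way.

```python
import sqlite3

QUERY_1 = """
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 100
   AND user_id NOT IN (SELECT user_id FROM banned_users)
"""

QUERY_2 = """
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN (SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 100
"""

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the index on user_id gives the planner a choice to make.
conn.executescript("""
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE INDEX idx_forum_posts_user_id ON forum_posts (user_id);
CREATE TABLE banned_users (user_id INTEGER PRIMARY KEY);
""")

plans = {}
for name, sql in (("Query 1", QUERY_1), ("Query 2", QUERY_2)):
    # Each SQLite plan row has four columns; the last one is the readable detail.
    plans[name] = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name)
    for step in plans[name]:
        print("   ", step)
```

The exact wording of the plan steps depends on the SQLite version, so compare the two listings side by side rather than relying on specific strings.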

I would use the second version in any case, just to play safe.

Let me add that:

COUNT(*) is usually more efficient than COUNT(notNullableField) in MySQL. Until that is fixed in future MySQL versions, use COUNT(*) where applicable.

Therefore, you can also use:

SELECT user_id   
FROM forum_posts 
WHERE user_id NOT IN
  ( SELECT user_id FROM banned_users ) 
GROUP BY user_id 
HAVING COUNT(*) >= 100
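The COUNT(*) rewrite is only "applicable" when the counted column can never be NULL, because COUNT(column) skips NULLs while COUNT(*) counts rows. This sketch shows that semantic difference, using SQLite as a stand-in and a hypothetical nullable column edited_at that is not part of the original schema; it says nothing about speed, which has to be measured on MySQL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: id can never be NULL, edited_at (made up) can be.
conn.execute(
    "CREATE TABLE forum_posts ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, edited_at TEXT)"
)
conn.executemany(
    "INSERT INTO forum_posts (user_id, edited_at) VALUES (?, ?)",
    [(1, None), (1, "2024-01-01"), (1, None), (2, "2024-02-02")],
)

rows = conn.execute(
    "SELECT user_id, COUNT(*), COUNT(id), COUNT(edited_at) "
    "FROM forum_posts GROUP BY user_id ORDER BY user_id"
).fetchall()
# COUNT(*) and COUNT(id) agree; COUNT(edited_at) skips the NULLs.
print(rows)  # [(1, 3, 3, 1), (2, 1, 1, 1)]
```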

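There are also other ways to get the same intermediate result as NOT IN before GROUP BY is applied (the same, that is, as long as banned_users.user_id contains no NULLs: if the subquery returns a NULL, NOT IN filters out every row).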

Using LEFT JOIN / IS NULL:

SELECT fp.user_id   
FROM forum_posts AS fp
  LEFT JOIN banned_users AS bu
    ON bu.user_id = fp.user_id
WHERE bu.user_id IS NULL 
GROUP BY fp.user_id 
HAVING COUNT(*) >= 100

Using NOT EXISTS:

SELECT fp.user_id   
FROM forum_posts AS fp 
WHERE NOT EXISTS
  ( SELECT * 
    FROM banned_users AS bu
    WHERE bu.user_id = fp.user_id
  ) 
GROUP BY fp.user_id 
HAVING COUNT(*) >= 100
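Before swapping NOT IN for one of these alternatives, note that they are not interchangeable when the subquery can return NULL: 1 NOT IN (2, NULL) evaluates to unknown rather than true, so the row is dropped, whereas NOT EXISTS simply finds no match. The sketch below uses SQLite as a stand-in (MySQL follows the same standard three-valued logic here) and deliberately makes banned_users.user_id nullable, which is an assumption about a hypothetical schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)")
# Deliberately nullable, to show the pitfall.
conn.execute("CREATE TABLE banned_users (user_id INTEGER)")
conn.executemany("INSERT INTO forum_posts (user_id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO banned_users (user_id) VALUES (?)", [(2,), (None,)])

not_in = conn.execute(
    "SELECT DISTINCT user_id FROM forum_posts "
    "WHERE user_id NOT IN (SELECT user_id FROM banned_users) "
    "ORDER BY user_id"
).fetchall()

not_exists = conn.execute(
    "SELECT DISTINCT fp.user_id FROM forum_posts AS fp "
    "WHERE NOT EXISTS (SELECT * FROM banned_users AS bu WHERE bu.user_id = fp.user_id) "
    "ORDER BY fp.user_id"
).fetchall()

print(not_in)      # [] because the NULL makes every NOT IN test unknown
print(not_exists)  # [(1,), (3,)]
```

If banned_users.user_id is declared NOT NULL (or is the primary key), the three forms return the same rows.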

Which of the 3 methods is faster depends on your table sizes and a lot of other factors, so best is to test with your data.
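A minimal harness for "test with your data", assuming Python's sqlite3 as a stand-in and randomly generated, hypothetical data: it runs the NOT IN, LEFT JOIN / IS NULL and NOT EXISTS variants, checks that they return the same users, and prints a rough timing for each. SQLite timings say nothing reliable about MySQL; to get numbers that matter, run the same loop through a MySQL driver against your real tables.

```python
import random
import sqlite3
import time

QUERIES = {
    "NOT IN": """
        SELECT user_id
        FROM forum_posts
        WHERE user_id NOT IN (SELECT user_id FROM banned_users)
        GROUP BY user_id
        HAVING COUNT(*) >= 100""",
    "LEFT JOIN / IS NULL": """
        SELECT fp.user_id
        FROM forum_posts AS fp
          LEFT JOIN banned_users AS bu ON bu.user_id = fp.user_id
        WHERE bu.user_id IS NULL
        GROUP BY fp.user_id
        HAVING COUNT(*) >= 100""",
    "NOT EXISTS": """
        SELECT fp.user_id
        FROM forum_posts AS fp
        WHERE NOT EXISTS (SELECT * FROM banned_users AS bu
                          WHERE bu.user_id = fp.user_id)
        GROUP BY fp.user_id
        HAVING COUNT(*) >= 100""",
}


def build_db(n_users=500, n_posts=50_000, n_banned=50, seed=42):
    """Fill an in-memory database with random, hypothetical test data."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
        CREATE INDEX idx_forum_posts_user_id ON forum_posts (user_id);
        CREATE TABLE banned_users (user_id INTEGER PRIMARY KEY);
    """)
    conn.executemany(
        "INSERT INTO forum_posts (user_id) VALUES (?)",
        ((rng.randrange(n_users),) for _ in range(n_posts)),
    )
    conn.executemany(
        "INSERT INTO banned_users (user_id) VALUES (?)",
        ((u,) for u in rng.sample(range(n_users), n_banned)),
    )
    return conn


conn = build_db()
results = {}
for name, sql in QUERIES.items():
    start = time.perf_counter()
    results[name] = sorted(row[0] for row in conn.execute(sql))  # fetch all rows
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{name:<20} {len(results[name]):>4} users  {elapsed_ms:8.2f} ms")

# banned_users.user_id cannot be NULL here, so all three variants must agree.
assert results["NOT IN"] == results["LEFT JOIN / IS NULL"] == results["NOT EXISTS"]
```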

且行且努力 2024-11-23 21:26:28

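The HAVING condition is applied to the grouped results, and since you group by user_id, every possible user_id value is still present in the grouped result, so it does not matter where you put the user_id condition.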
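The argument can be checked mechanically: because the banned-user test depends only on user_id, the grouping key, removing one user's rows never changes the count of any other group. The randomized check below is a pure-Python model of the two placements (not MySQL's executor) and asserts that both give the same users on 1,000 random inputs.

```python
import random
from collections import Counter


def having_version(posts, banned, min_posts):
    """Group first, then apply both conditions to each group (like Query 1)."""
    counts = Counter(posts)
    return {u for u, n in counts.items() if n >= min_posts and u not in banned}


def where_version(posts, banned, min_posts):
    """Filter rows on the grouping key first, then group (like Query 2)."""
    counts = Counter(u for u in posts if u not in banned)
    return {u for u, n in counts.items() if n >= min_posts}


rng = random.Random(0)
for _ in range(1_000):
    # Random posts by up to 20 users, a random banned set and threshold.
    posts = [rng.randrange(20) for _ in range(rng.randrange(200))]
    banned = set(rng.sample(range(20), rng.randrange(5)))
    min_posts = rng.randrange(1, 15)
    assert having_version(posts, banned, min_posts) == where_version(posts, banned, min_posts)

print("a key-only condition gives the same result before or after grouping")
```

The same move would not be safe for a condition on a column that is not the grouping key, since HAVING can only see per-group values.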

北方的韩爷 2024-11-23 21:26:28

To me, the second query is more efficient because it reduces the number of records passed to GROUP BY and HAVING.

Alternatively, you can try the following query to avoid using NOT IN:

SELECT `fp`.`user_id`
FROM `forum_posts` `fp`
LEFT JOIN `banned_users` `bu` ON `fp`.`user_id` = `bu`.`user_id`
WHERE `bu`.`user_id` IS NULL
GROUP BY `fp`.`user_id`
HAVING COUNT(`fp`.`id`) >= 100

Hope this helps.
