MySQL GROUP BY and HAVING
I'm a MySQL query noobie so I'm sure this is a question with an obvious answer.
But, I was looking at these two queries. Will they return different result sets? I understand that the sorting process would commence differently, but I believe they will return the same results with the first query being slightly more efficient?
Query 1: HAVING, then AND
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 100
AND user_id NOT IN (SELECT user_id FROM banned_users)
Query 2: WHERE, then HAVING
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN(SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 100
Comments (5)
Actually, the first query will be less efficient (HAVING is applied after WHERE).
UPDATE
Some pseudo code to illustrate how your queries are executed ([very] simplified version).
First query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Group, count, etc.
4. Exclude records from the first result set if they are present in the second
Second query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Exclude records from the first result set if they are present in the second
4. Group, count, etc.
The order of steps 1 and 2 is not important; MySQL can choose whatever it thinks is better. The important difference is in steps 3 and 4: HAVING is applied after GROUP BY. Grouping is usually more expensive than joining (excluding records can be considered a join operation in this case), so the fewer records it has to group, the better the performance.
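To get a rough feel for the difference between steps 3 and 4, one can compare how many rows would reach the grouping step in each case. This is only a sketch under the simplified model above (the `rows_grouped` alias is made up for illustration), and the optimizer is free to execute the real queries differently.

```sql
-- First query: every row of forum_posts is grouped,
-- and banned users are discarded only afterwards.
SELECT COUNT(*) AS rows_grouped
FROM forum_posts;

-- Second query: rows of banned users are removed by WHERE
-- before grouping, so fewer rows reach GROUP BY.
SELECT COUNT(*) AS rows_grouped
FROM forum_posts
WHERE user_id NOT IN (SELECT user_id FROM banned_users);
```

The larger the gap between the two counts, the more work the WHERE version can save in the grouping step.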
You already have answers saying that the two queries will return the same results, and various opinions on which one is more efficient.
My opinion is that there will be a difference in efficiency (speed) only if the optimizer comes up with different plans for the 2 queries. I think that for the latest MySQL versions the optimizer is smart enough to find the same plan for either query, so there will be no difference at all, but of course one can test this, either by looking at the execution plans with EXPLAIN or by running the 2 queries against some test tables.
I would use the second version in any case, just to play safe.
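As a minimal sketch of that check, prefixing each query from the question with EXPLAIN shows the plan MySQL chooses. The output depends on the MySQL version, the indexes, and the table sizes, so no particular plan is assumed here.

```sql
-- Plan for query 1 (condition on user_id inside HAVING)
EXPLAIN
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 100
   AND user_id NOT IN (SELECT user_id FROM banned_users);

-- Plan for query 2 (condition on user_id inside WHERE)
EXPLAIN
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN (SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 100;
```

If the two plans come out identical, any speed difference between the queries should be negligible.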
Let me add that COUNT(*) is usually more efficient than COUNT(notNullableField) in MySQL. Until that is fixed in future MySQL versions, use COUNT(*) where applicable. Therefore, you can also use COUNT(*) in place of COUNT(id) in the second query.
There are also other ways to achieve the same (NOT IN) sub-results before applying GROUP BY: using LEFT JOIN / NULL, or using NOT EXISTS.
Which of the 3 methods is faster depends on your table sizes and a lot of other factors, so best is to test with your data.
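The three variants might look roughly like this. These are sketches that assume the `forum_posts` and `banned_users` tables from the question; the aliases `p` and `b` are introduced only for illustration.

```sql
-- NOT IN, with COUNT(*) instead of COUNT(id)
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN (SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(*) >= 100;

-- LEFT JOIN / NULL: keep only posts with no matching banned user
SELECT p.user_id
FROM forum_posts AS p
LEFT JOIN banned_users AS b
       ON b.user_id = p.user_id
WHERE b.user_id IS NULL
GROUP BY p.user_id
HAVING COUNT(*) >= 100;

-- NOT EXISTS: same idea expressed as a correlated subquery
SELECT p.user_id
FROM forum_posts AS p
WHERE NOT EXISTS (
        SELECT 1
        FROM banned_users AS b
        WHERE b.user_id = p.user_id
      )
GROUP BY p.user_id
HAVING COUNT(*) >= 100;
```

One caveat: the three forms are equivalent only if `banned_users.user_id` never contains NULL. With a NULL in that column, NOT IN returns no rows at all, while the other two simply ignore it.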
HAVING conditions are applied to the grouped results, and since you group by user_id, all of its possible values will be present in the grouped result, so the placement of the user_id condition is not important.
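A small worked example can make this concrete. The schema and data below are hypothetical (only the columns used in the question, with the threshold lowered from 100 to 2 so that a handful of rows is enough).

```sql
-- Hypothetical miniature tables, for illustration only
CREATE TABLE forum_posts (id INT PRIMARY KEY, user_id INT NOT NULL);
CREATE TABLE banned_users (user_id INT PRIMARY KEY);

INSERT INTO forum_posts (id, user_id) VALUES
  (1, 1), (2, 1), (3, 1),   -- user 1: 3 posts
  (4, 2), (5, 2),           -- user 2: 2 posts, banned
  (6, 3);                   -- user 3: 1 post
INSERT INTO banned_users (user_id) VALUES (2);

-- Query 1: groups are 1 (3 posts), 2 (2 posts), 3 (1 post);
-- HAVING keeps 1 and 2 by count, then drops banned user 2.
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 2
   AND user_id NOT IN (SELECT user_id FROM banned_users);

-- Query 2: WHERE drops user 2's rows first; groups are 1 and 3;
-- HAVING keeps only user 1.
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN (SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 2;
```

Both queries return the single row `user_id = 1`; only the point at which the banned user is discarded differs.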
To me, the second query is more efficient because it lowers the number of records for GROUP BY and HAVING.
Alternatively, you may try the following query to avoid using IN:
Hope this helps.
No, it does not give the same results.
The first query filters records by the count(id) condition, while the other query filters records first and then applies the HAVING clause.
The second query is correctly written.