Joining another table after GROUP BY and COUNT
I'm trying to make sense of the right way to use JOIN, COUNT(*), and GROUP BY to do a pretty simple query. I've actually gotten it to work (see below), but from what I've read, I'm using an extra GROUP BY that I shouldn't be.
(Note: The problem below isn't my actual problem (which deals with more complicated tables), but I've tried to come up with an analogous problem)
I have two tables:
Table: Person
-------------
key  name     cityKey
1    Alice    1
2    Bob      2
3    Charles  2
4    David    1

Table: City
-------------
key  name
1    Albany
2    Berkeley
3    Chico
I'd like to do a query on the Person table (with some WHERE clause) that returns
- the number of matching people in each city
- the key for the city
- the name of the city.
If I do
SELECT COUNT(Person.key) AS count, City.key AS cityKey, City.name AS cityName
FROM Person
LEFT JOIN City ON Person.cityKey = City.key
GROUP BY Person.cityKey, City.name
I get the result that I want
count  cityKey  cityName
2      1        Albany
2      2        Berkeley
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
So what's the right way to do this? I've been trying to google for an answer, but I feel like there's something fundamental that I'm just not getting.
I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it so that you join to a sub-select that gets the count of persons per city key, and then join to the city table for the name, but it's debatable whether that'd be better. It's a matter of style and opinion, I guess.
if my SQL isn't too rusty :-)
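A minimal sketch of the sub-select rewrite described above, run against an in-memory SQLite database through Python's sqlite3 module so it can be tried as-is. The alias names pc and cnt are assumptions made for illustration, and the key columns are double-quoted only as a precaution because KEY is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City   ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City   VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES (1, 'Alice', 1), (2, 'Bob', 2),
                              (3, 'Charles', 2), (4, 'David', 1);
""")

# Count persons per city key first, then join City once to pick up the name.
# Only Person.cityKey appears in the GROUP BY; pc and cnt are illustrative aliases.
rows = conn.execute("""
    SELECT pc.cnt AS count, City."key" AS cityKey, City.name AS cityName
    FROM (SELECT cityKey, COUNT(*) AS cnt
          FROM Person
          GROUP BY cityKey) AS pc
    JOIN City ON City."key" = pc.cityKey
    ORDER BY City."key"
""").fetchall()

for row in rows:
    print(row)
```

With the sample data from the question this yields one row per city that has at least one person. Any WHERE clause on Person would go inside the sub-select.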
You misunderstand, you got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could also use analytic/window functions with an OVER clause...
However, MySQL and SQLite provide the "feature" where you can omit these columns from the GROUP BY, which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" questions on Stack Overflow and numerous other sites & forums.
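As a rough illustration of the window-function alternative, the sketch below computes the per-city count with COUNT(...) OVER (PARTITION BY ...) and no GROUP BY at all. It assumes an SQLite build with window-function support (3.25 or later) and uses Python's sqlite3 module purely as a convenient way to run the SQL. SELECT DISTINCT is needed because a window function keeps one output row per person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City   ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City   VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES (1, 'Alice', 1), (2, 'Bob', 2),
                              (3, 'Charles', 2), (4, 'David', 1);
""")

# The window function attaches the per-city count to every person row;
# DISTINCT then collapses the identical rows into one per city.
rows = conn.execute("""
    SELECT DISTINCT
           COUNT(Person."key") OVER (PARTITION BY Person.cityKey) AS count,
           City."key" AS cityKey,
           City.name  AS cityName
    FROM Person
    LEFT JOIN City ON Person.cityKey = City."key"
    ORDER BY 2
""").fetchall()

for row in rows:
    print(row)
```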
It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which the clauses are logically evaluated is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:
FROM (including JOINs)
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite: because the SELECT is evaluated after the GROUP BY, it cannot include any columns not already in the GROUP BY (unless they are wrapped in an aggregate function).
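A small sketch of how that ordering plays out on the question's tables, again using Python's sqlite3 module as a test bed. The WHERE and HAVING conditions are made up for illustration; the numbered SQL comments mark the logical evaluation order, not the order in which the clauses are written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City   ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City   VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES (1, 'Alice', 1), (2, 'Bob', 2),
                              (3, 'Charles', 2), (4, 'David', 1);
""")

rows = conn.execute("""
    SELECT City."key" AS cityKey,              -- 5. SELECT sees only grouped columns
           City.name  AS cityName,             --    and aggregates
           COUNT(Person."key") AS count
    FROM Person
    JOIN City ON Person.cityKey = City."key"   -- 1. FROM / JOIN build the row set
    WHERE Person.name <> 'David'               -- 2. WHERE filters individual rows
    GROUP BY City."key", City.name             -- 3. GROUP BY forms one group per city
    HAVING COUNT(Person."key") >= 2            -- 4. HAVING filters whole groups
    ORDER BY City."key"                        -- 6. ORDER BY sorts the final rows
""").fetchall()

for row in rows:
    print(row)
```

David is removed before grouping, so Albany is left with a single person and is then dropped by the HAVING step; only Berkeley remains.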
Your query would only work on MySQL (or another database that tolerates ungrouped columns, such as SQLite), because you group on Person.cityKey but select City.key. Other databases would require you to use an aggregate like min(City.key), or to add City.key to the group by clause.
Because the combination of city name and city key is unique, the following are equivalent:
Or:
Or:
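A sketch of three such equivalent formulations, assuming the tables from the question and run through Python's sqlite3 module. These are illustrative variants, not necessarily the exact queries this answer had in mind. Each one either lists a selected column in the GROUP BY or wraps it in an aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City   ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City   VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES (1, 'Alice', 1), (2, 'Bob', 2),
                              (3, 'Charles', 2), (4, 'David', 1);
""")

queries = {
    # Both selected city columns are listed in the GROUP BY.
    "group by key and name": """
        SELECT COUNT(Person."key"), City."key", City.name
        FROM Person LEFT JOIN City ON Person.cityKey = City."key"
        GROUP BY City."key", City.name
        ORDER BY 2
    """,
    # Group by the city key only; the name is wrapped in an aggregate.
    "group by key, aggregate name": """
        SELECT COUNT(Person."key"), City."key", MIN(City.name)
        FROM Person LEFT JOIN City ON Person.cityKey = City."key"
        GROUP BY City."key"
        ORDER BY 2
    """,
    # Group by Person.cityKey; both city columns are wrapped in aggregates.
    "group by cityKey, aggregate both": """
        SELECT COUNT(Person."key"), MIN(City."key"), MAX(City.name)
        FROM Person LEFT JOIN City ON Person.cityKey = City."key"
        GROUP BY Person.cityKey
        ORDER BY 2
    """,
}

results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
for label, rows in results.items():
    print(label, rows)
```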
All rows in the group will have the same city name and key, so it doesn't matter whether you use the max or min aggregate.
P.S. If you'd like to count only different persons, even if they have multiple rows, try COUNT(DISTINCT Person.key) instead of COUNT(Person.key).
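To illustrate the difference between counting rows and counting distinct persons, the sketch below adds a hypothetical Visit table (not part of the question) so that a join produces several rows for the same person. It runs on an in-memory SQLite database via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City   ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    -- Visit is a hypothetical table, added only to give Alice two joined rows.
    CREATE TABLE Visit  (personKey INTEGER, day TEXT);
    INSERT INTO City   VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES (1, 'Alice', 1), (2, 'Bob', 2),
                              (3, 'Charles', 2), (4, 'David', 1);
    INSERT INTO Visit  VALUES (1, 'Mon'), (1, 'Tue'), (2, 'Mon'),
                              (3, 'Mon'), (4, 'Mon');
""")

# COUNT(col) counts every joined row; COUNT(DISTINCT col) counts each person once.
rows = conn.execute("""
    SELECT City.name,
           COUNT(Person."key")          AS row_count,
           COUNT(DISTINCT Person."key") AS person_count
    FROM Person
    JOIN Visit ON Visit.personKey = Person."key"
    JOIN City  ON Person.cityKey  = City."key"
    GROUP BY City."key", City.name
    ORDER BY City."key"
""").fetchall()

for row in rows:
    print(row)
```

Alice's two visits make Albany's plain count one higher than its distinct count, while Berkeley's two counts agree.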