Do all columns in a SELECT list have to appear in the GROUP BY clause?

Posted 2024-11-07 02:43:09

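My lecturer stated:

All column names in the SELECT list must appear in the GROUP BY clause unless the name is used only in an aggregate function.

I'm just looking for some confirmation of this, as I can't think of a logical explanation for why it should be true.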

童话里做英雄 2024-11-14 02:43:09

Imagine the following:

    A    B    C
   Cat   10   False
   Dog   25   True
   Dog   20   False
   Cat   5    False

If you select A and B but group by A only, what would your output be? You'd have only two rows (or tuples) because A has two distinct values, but which value of B should each row display?

If you group by A, B, you'd get four rows, no problems there.
If you group by A and apply an aggregate function to B, such as SUM(B), then you get two rows again:

    Cat    15
    Dog    45

But if you select A, B and group by A only, the database doesn't know what to do. Truthfully, I believe some databases will pick an arbitrary value for B in that case, and I believe others will give you an error message.
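
To make the two unambiguous cases concrete, here is a minimal sketch that loads the table above into an in-memory SQLite database through Python's sqlite3 module. The table name t and the lowercase column names are assumptions made for illustration, and the booleans in C are stored as 0/1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("Cat", 10, 0), ("Dog", 25, 1), ("Dog", 20, 0), ("Cat", 5, 0)],
)

# Group by a alone and aggregate b: one row per distinct value of a.
sums = conn.execute(
    "SELECT a, SUM(b) FROM t GROUP BY a ORDER BY a"
).fetchall()

# Group by both columns: every (a, b) pair is distinct here, so four rows.
pairs = conn.execute(
    "SELECT a, b FROM t GROUP BY a, b ORDER BY a, b"
).fetchall()

print(sums)
print(pairs)
```

In both queries every selected column is either grouped or aggregated, so each output cell has exactly one possible value.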

趁微风不噪 2024-11-14 02:43:09

That's historically true. Omitting unaggregated columns leads to indeterminate behavior. SQL aims at fully determinate behavior.

But more recent SQL standards (SQL:1999 onward, as an optional feature) let you omit from the GROUP BY clause columns that are functionally dependent on columns that are in the GROUP BY. PostgreSQL follows the more recent SQL standards. (It's not the only one.) Behavior is still fully determinate.

create table a (
  a_id integer primary key,
  xfr_date date not null
);

create table b (
  a_id integer not null references a (a_id),
  recd_date date not null,
  units_recd integer not null 
    check (units_recd >= 0),
  primary key (a_id, recd_date)
);

select a.a_id, a.xfr_date, sum(b.units_recd)
from a
inner join b on a.a_id = b.a_id
group by a.a_id; -- The column a.xfr_date is functionally dependent 
                 -- on a.a_id; it doesn't have to appear in the 
                 -- GROUP BY clause.

The notable deviation from SQL standards is MySQL. With its ONLY_FULL_GROUP_BY SQL mode disabled, it lets you omit just about everything from the GROUP BY. But that design choice makes its behavior indeterminate when you omit columns that are in the SELECT list.
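
For an engine that does not recognize functional dependencies, two portable rewrites of the query above are to add the dependent column to the GROUP BY or to wrap it in an aggregate. The sketch below runs both against the same schema in an in-memory SQLite database via Python's sqlite3 module. The sample rows are made up for illustration, and SQLite is only a convenient stand-in here, not one of the engines discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table a (
  a_id integer primary key,
  xfr_date date not null
);
create table b (
  a_id integer not null references a (a_id),
  recd_date date not null,
  units_recd integer not null check (units_recd >= 0),
  primary key (a_id, recd_date)
);
-- Made-up sample rows, for illustration only.
insert into a values (1, '2024-01-10'), (2, '2024-02-15');
insert into b values
  (1, '2024-01-11', 5), (1, '2024-01-12', 7), (2, '2024-02-16', 3);
""")

# Option 1: list the dependent column in the GROUP BY as well.
by_both = conn.execute("""
    select a.a_id, a.xfr_date, sum(b.units_recd)
    from a inner join b on a.a_id = b.a_id
    group by a.a_id, a.xfr_date
    order by a.a_id
""").fetchall()

# Option 2: wrap the dependent column in an aggregate.
by_min = conn.execute("""
    select a.a_id, min(a.xfr_date), sum(b.units_recd)
    from a inner join b on a.a_id = b.a_id
    group by a.a_id
    order by a.a_id
""").fetchall()

print(by_both)
print(by_min)
```

Because a_id is the primary key of a, each group holds exactly one xfr_date, so both rewrites return the same rows as the shorter query would on an engine that accepts it.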

墨离汐 2024-11-14 02:43:09

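Actually, in MySQL you don't have to group by all columns. You can group by whichever columns you want. The catch is that it will just pull an arbitrary value (from the set of rows in the group) for any field that isn't in the GROUP BY. If you know you are grouping by a unique key, there's no point in grouping by the rest of the fields as well, since they will all have the same value within each group anyway. It can actually speed things up not to have to group by every field when it is completely unnecessary.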
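
The "arbitrary value from the group" behavior can be observed without a MySQL server: SQLite also accepts non-grouped, non-aggregated columns in an aggregate query. The sketch below uses Python's sqlite3 module with the Cat/Dog rows from the first answer. The table name t is an assumption, and this shows SQLite's behavior only, as an analogy to what is described for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Cat", 10), ("Dog", 25), ("Dog", 20), ("Cat", 5)],
)

# b is neither grouped nor aggregated. SQLite accepts the query and
# fills in b from some row of each group.
rows = conn.execute("SELECT a, b FROM t GROUP BY a ORDER BY a").fetchall()

# The values of b that occur in each group.
candidates = {"Cat": {10, 5}, "Dog": {25, 20}}
for a, b in rows:
    # Only membership in the group can be relied on, not which value.
    assert b in candidates[a]

print(rows)
```

The script deliberately checks only that each b comes from the right group, since which row supplies it is not something a query should depend on.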

淡忘如思 2024-11-14 02:43:09

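If you group by something, you can't see the individual values of the non-grouped columns, because there may be more than one value within each group. All you can do is report aggregate functions (sum, count, min and so on), which are able to combine the multiple values into a single cell in the result.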

半夏半凉 2024-11-14 02:43:09

So the simple answer is: it depends. MySQL allows it, Vertica doesn't.

There is actually a valid use case for omitting a column, and that is when you are already picking out a single row per group, say with MIN().

Here is an actual example for event tracking. Imagine you have credit and purchase events.

For simplicity, say a = credit, b, c and d are some kind of purchase event, and time is tracked with a running number. Now you want to find the date of the first purchase after each credit. We also happen to have only one customer, 0:

create table events (user_id int ,created_at int, event varchar(255));
insert into events values (0,0, 'a');
insert into events values (0,1, 'b');
insert into events values (0,2, 'c');
insert into events values (0,3, 'a');
insert into events values (0,4, 'c');
insert into events values (0,5, 'b');
insert into events values (0,6, 'a');
insert into events values (0,7, 'a');
insert into events values (0,8, 'd');

mysql> SELECT user_id, MAX(purchased) AS purchased, spent, event FROM (SELECT e1.User_ID AS user_id, e1.created_at AS purchased, MIN(e2.created_at) AS spent, e2.event AS event FROM events e1, events e2 WHERE e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND e1.Event = 'a' AND e2.Event != 'a' GROUP BY e1.user_id, e1.created_at) e3 GROUP BY user_id, spent;
+---------+-----------+-------+-------+
| user_id | purchased | spent | event |
+---------+-----------+-------+-------+
|       0 |         0 |     1 | b     |
|       0 |         3 |     4 | c     |
|       0 |         7 |     8 | d     |
+---------+-----------+-------+-------+
3 rows in set (0.00 sec)

This looks good in MySQL but does not work in Vertica:

ERROR 2640: Column "e2.event" must appear in the GROUP BY clause or be used in an aggregate function

If I omit the event column, it works in both, but I do actually want to know what specific value event had for the row that MIN() selected.

So my answer ends with a request for comment :) Any ideas?
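
One possible idea, offered as a sketch rather than a tested Vertica solution: compute the MIN() first, then join the result back to events to look up the event on exactly that row. Every selected column is then either grouped or aggregated, so a strict engine has nothing to reject. The example runs the query on the sample data in an in-memory SQLite database via Python's sqlite3 module, keeps the aliases from the query above, and assumes that created_at is unique per user, so the join back matches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (user_id int, created_at int, event varchar(255))"
)
conn.executemany(
    "insert into events values (?, ?, ?)",
    [(0, 0, "a"), (0, 1, "b"), (0, 2, "c"), (0, 3, "a"), (0, 4, "c"),
     (0, 5, "b"), (0, 6, "a"), (0, 7, "a"), (0, 8, "d")],
)

rows = conn.execute("""
    SELECT g.user_id, MAX(g.purchased) AS purchased, g.spent, e.event
    FROM (
        -- For each 'a' row, the earliest later row that is not an 'a'.
        SELECT e1.user_id AS user_id,
               e1.created_at AS purchased,
               MIN(e2.created_at) AS spent
        FROM events e1
        JOIN events e2
          ON e1.user_id = e2.user_id
         AND e1.created_at <= e2.created_at
        WHERE e1.event = 'a' AND e2.event <> 'a'
        GROUP BY e1.user_id, e1.created_at
    ) g
    -- Join back to fetch the event stored on the row MIN() picked.
    JOIN events e
      ON e.user_id = g.user_id
     AND e.created_at = g.spent
    GROUP BY g.user_id, g.spent, e.event
    ORDER BY g.spent
""").fetchall()

for row in rows:
    print(row)
```

Unlike the original query, this does not rely on the engine happening to take event from the row with the minimum created_at, which the MySQL version above gets no guarantee of.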
