Do all columns in a SELECT list have to appear in the GROUP BY clause?

Posted on 2024-11-07 02:43:09

My lecturer stated:

All column names in SELECT list must appear in GROUP BY clause unless name is used only in an aggregate function.

I'm just wanting some confirmation of this as I cannot think of a logical explanation as to why it should be true.

Comments (6)

童话里做英雄 2024-11-14 02:43:09

Imagine the following:

    A    B    C
   Cat   10   False
   Dog   25   True
   Dog   20   False
   Cat   5    False

If you select A, B and Group By Only A - what would your output be? You'd only have two rows (or tuples) because you have two values for A - but how does it display B?

If you group by A, B, you'd get four rows, no problems there.
If you group by A and perform a function on B - like SUM(B) then you get two rows again:

    Cat    15
    Dog    45

But if you select A, B and only group by A - it doesn't know what to do. Truthfully, I believe there are some databases out there that will select a random value for B in that case and I believe there are some that will give you an error message.
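
The three cases above can be reproduced with a short script. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database; the table name `animals` is made up for illustration, and the True/False column is stored as 1/0. SQLite happens to be one of the permissive engines: it accepts the ambiguous query and returns B from some row of each group, so the last result is only guaranteed to contain one of the group's values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the rows are the ones from the example above.
conn.execute("CREATE TABLE animals (A TEXT, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [("Cat", 10, 0), ("Dog", 25, 1), ("Dog", 20, 0), ("Cat", 5, 0)],
)

# Grouping by A and B: every (A, B) pair is distinct, so four rows come back.
by_a_b = conn.execute(
    "SELECT A, B FROM animals GROUP BY A, B ORDER BY A, B"
).fetchall()

# Grouping by A and aggregating B: one well-defined row per value of A.
sums = conn.execute(
    "SELECT A, SUM(B) FROM animals GROUP BY A ORDER BY A"
).fetchall()

# Selecting a bare B while grouping only by A: SQLite accepts this and
# returns B from one row of each group, without promising which one.
ambiguous = conn.execute(
    "SELECT A, B FROM animals GROUP BY A ORDER BY A"
).fetchall()

print(by_a_b)
print(sums)
print(ambiguous)
```

A stricter engine would reject the third query outright instead of choosing a value.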

趁微风不噪 2024-11-14 02:43:09

That's historically true. Omitting unaggregated columns leads to indeterminate behavior. SQL aims at fully determinate behavior.

But the SQL standard has since changed (from SQL:1999 onward) to let you omit from the GROUP BY clause columns that are functionally dependent on columns that are in the GROUP BY. PostgreSQL follows the newer standard. (It's not the only one.) Behavior is still fully determinate.

create table a (
  a_id integer primary key,
  xfr_date date not null
);

create table b (
  a_id integer not null references a (a_id),
  recd_date date not null,
  units_recd integer not null 
    check (units_recd >= 0),
  primary key (a_id, recd_date)
);

select a.a_id, a.xfr_date, sum(b.units_recd)
from a
inner join b on a.a_id = b.a_id
group by a.a_id; -- The column a.xfr_date is functionally dependent 
                 -- on a.a_id; it doesn't have to appear in the 
                 -- GROUP BY clause.

The notable deviation from the SQL standard is MySQL. With the ONLY_FULL_GROUP_BY SQL mode disabled (the default before MySQL 5.7.5), it lets you omit just about everything from the GROUP BY. But that design choice makes its behavior indeterminate when you omit columns that are in the SELECT list.
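
To see why the functionally dependent column is safe to leave out, it helps to count the distinct xfr_date values per group. The following is a rough sketch using Python's sqlite3 module with the schema from this answer; the sample rows are invented for illustration. Note that SQLite accepts bare columns whether or not they are functionally dependent, so this does not demonstrate the standard's check itself, only that the result is unambiguous here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table a (
  a_id integer primary key,
  xfr_date date not null
);
create table b (
  a_id integer not null references a (a_id),
  recd_date date not null,
  units_recd integer not null check (units_recd >= 0),
  primary key (a_id, recd_date)
);
-- Made-up sample rows, not from the original answer.
insert into a values (1, '2024-01-01'), (2, '2024-02-01');
insert into b values
  (1, '2024-01-05', 10),
  (1, '2024-01-09', 15),
  (2, '2024-02-03', 7);
""")

# Each a_id identifies exactly one row of a, hence exactly one xfr_date.
distinct_dates = conn.execute("""
    select a.a_id, count(distinct a.xfr_date)
    from a inner join b on a.a_id = b.a_id
    group by a.a_id
    order by a.a_id
""").fetchall()

# Because of that, the query from the answer has one possible result per group.
totals = conn.execute("""
    select a.a_id, a.xfr_date, sum(b.units_recd)
    from a inner join b on a.a_id = b.a_id
    group by a.a_id
    order by a.a_id
""").fetchall()

print(distinct_dates)
print(totals)
```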

墨离汐 2024-11-14 02:43:09

Actually, in MySQL you don't have to group by all columns. You can just group by whatever columns you want. The problem is, it will just pull a random value (from the set of available rows in the group) for the fields which aren't in the group by. If you know that you are grouping by something that is a unique key, there's no point in grouping by the rest of the fields, as they will already all have the same value anyway. It can actually speed it up to not have to group by every field when it is completely unnecessary.

淡忘如思 2024-11-14 02:43:09

If you are grouping on something, you cannot see the individual values of non-grouped columns, because there may be more than one value within each group. All you can do is report on aggregate functions (sum, count, min, etc.); these combine the multiple values into a single cell in the result.

半夏半凉 2024-11-14 02:43:09

So the simple answer is: it depends. MySQL allows it, Vertica doesn't.

There is actually a valid use case for omitting a column: when you are already picking out a single row per group, say with MIN().

Here is an actual example from event tracking. Imagine you have credit and purchase events.

For simplicity we say a=credit, b,c,d are some kind of purchase event, and time is tracked with a running number. Now you want to find the date of the first purchase after each credit. We also happen to have only one customer 0:

create table events (user_id int ,created_at int, event varchar(255));
insert into events values (0,0, 'a');
insert into events values (0,1, 'b');
insert into events values (0,2, 'c');
insert into events values (0,3, 'a');
insert into events values (0,4, 'c');
insert into events values (0,5, 'b');
insert into events values (0,6, 'a');
insert into events values (0,7, 'a');
insert into events values (0,8, 'd');

mysql> SELECT user_id, MAX(purchased) AS purchased, spent, event
    -> FROM (
    ->   SELECT e1.user_id AS user_id,
    ->          e1.created_at AS purchased,
    ->          MIN(e2.created_at) AS spent,
    ->          e2.event AS event
    ->   FROM events e1, events e2
    ->   WHERE e1.user_id = e2.user_id
    ->     AND e1.created_at <= e2.created_at
    ->     AND e1.event = 'a'
    ->     AND e2.event != 'a'
    ->   GROUP BY e1.user_id, e1.created_at
    -> ) e3
    -> GROUP BY user_id, spent;
+---------+-----------+-------+-------+
| user_id | purchased | spent | event |
+---------+-----------+-------+-------+
|       0 |         0 |     1 | b     |
|       0 |         3 |     4 | c     |
|       0 |         7 |     8 | d     |
+---------+-----------+-------+-------+
3 rows in set (0.00 sec)

This looks good in MySQL but does not work in Vertica:

ERROR 2640: Column "e2.event" must appear in the GROUP BY clause or be used in an aggregate function

If I omit the event column, it works in both, but I do actually want to know what specific value event had for the row that MIN() selected.

So my answer ends with a request for comment :) Any ideas?
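
One possible way to get the event of the row that MIN() picked, without relying on a bare column, is to compute the minimum first and then join back to the table on that value. The sketch below runs the idea through Python's sqlite3 module as a stand-in; the CTE name `first_purchase` is made up, and the approach assumes that (user_id, created_at) identifies a single row, which holds for the sample data but is not declared in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INT, created_at INT, event VARCHAR(255))"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (0, 0, "a"), (0, 1, "b"), (0, 2, "c"),
        (0, 3, "a"), (0, 4, "c"), (0, 5, "b"),
        (0, 6, "a"), (0, 7, "a"), (0, 8, "d"),
    ],
)

rows = conn.execute("""
    WITH first_purchase AS (
        -- For every credit ('a'), the time of the first later purchase.
        SELECT e1.user_id,
               e1.created_at      AS purchased,
               MIN(e2.created_at) AS spent
        FROM events e1
        JOIN events e2
          ON e1.user_id = e2.user_id
         AND e1.created_at <= e2.created_at
        WHERE e1.event = 'a' AND e2.event <> 'a'
        GROUP BY e1.user_id, e1.created_at
    )
    -- Join back on the minimum to fetch the event of that exact row.
    SELECT fp.user_id, MAX(fp.purchased) AS purchased, fp.spent, e.event
    FROM first_purchase fp
    JOIN events e
      ON e.user_id = fp.user_id
     AND e.created_at = fp.spent
    GROUP BY fp.user_id, fp.spent, e.event
    ORDER BY fp.spent
""").fetchall()

print(rows)
```

Every selected column is now either grouped or aggregated, so a strict engine should have nothing to object to, although this was only tried against SQLite here.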

安穩 2024-11-14 02:43:09

There are exceptions as noted by Sam Saffron but generally what your lecturer said is true.

If I select 3 columns and group by 2 what should the RDBMS do with the 3rd column?

The developers of the RDBMS may decide how to handle the extra column (as it appears MySQL's developers have), but is it the decision I would have made, or the one I want when writing the select? Will the decision always be valid? I certainly prefer the Oracle-like approach of forcing me to state explicitly what should happen.

If I select 3 columns and group by 2, should the RDBMS group by all 3, or pick a value from the 3rd: a random one, the biggest or littlest, the most common?
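
Stating the choice explicitly might look like the following sketch, which reuses the Cat/Dog rows from the first answer in a hypothetical `animals` table (True/False stored as 1/0) and runs them through Python's sqlite3 module. It groups by A and C and treats B as the third column, showing three of the possible intentions side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (A TEXT, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [("Cat", 10, 0), ("Dog", 25, 1), ("Dog", 20, 0), ("Cat", 5, 0)],
)

# Intention 1: keep the smallest B in each (A, C) group.
smallest = conn.execute(
    "SELECT A, C, MIN(B) FROM animals GROUP BY A, C ORDER BY A, C"
).fetchall()

# Intention 2: keep the largest B in each (A, C) group.
largest = conn.execute(
    "SELECT A, C, MAX(B) FROM animals GROUP BY A, C ORDER BY A, C"
).fetchall()

# Intention 3: treat B as part of the grouping key after all.
all_three = conn.execute(
    "SELECT A, C, B FROM animals GROUP BY A, C, B ORDER BY A, C, B"
).fetchall()

print(smallest)
print(largest)
print(all_three)
```

The three queries return different results from the same rows, which is the reason a strict engine asks the author to pick one.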
