How do I (or can I) SELECT DISTINCT on multiple columns?
I need to retrieve all rows from a table where 2 columns combined are all different. So I want all the sales that do not have any other sales that happened on the same day for the same price. The sales that are unique based on day and price will get updated to an active status.
So I'm thinking:
UPDATE sales
SET status = 'ACTIVE'
WHERE id IN (SELECT DISTINCT (saleprice, saledate), id, count(id)
FROM sales
HAVING count = 1)
But my brain hurts going any farther than that.
A SELECT DISTINCT over a list of columns is roughly equivalent to a GROUP BY over those same columns.
It's a good idea to get used to the GROUP BY syntax, as it's more powerful.
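As a rough illustration of that equivalence, here is a minimal sketch. The table `t` and its columns `a`, `b`, `c` are hypothetical names that do not come from the question:

```sql
-- Hypothetical table t(a, b, c), used only for illustration.
-- Both statements return each distinct (a, b, c) combination once.
SELECT DISTINCT a, b, c
FROM   t;

SELECT a, b, c
FROM   t
GROUP  BY a, b, c;
```

The GROUP BY form additionally allows aggregates and a HAVING filter, which is what counting rows per combination needs.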
For your query, I'd do it like this:
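A sketch of one way to write this with GROUP BY, assuming PostgreSQL and the `sales` table from the question (columns `id`, `saleprice`, `saledate`, `status`). It is not necessarily the exact statement this answer had in mind:

```sql
-- Sketch: activate sales whose (saleprice, saledate) pair occurs exactly once.
UPDATE sales
SET    status = 'ACTIVE'
WHERE  id IN (
    SELECT s.id
    FROM   sales s
    JOIN  (
        -- pairs that appear in exactly one row
        SELECT saleprice, saledate
        FROM   sales
        GROUP  BY saleprice, saledate
        HAVING COUNT(*) = 1
    ) t ON  s.saleprice = t.saleprice
        AND s.saledate  = t.saledate
);
```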
If you put together the answers so far, clean up and improve, you would arrive at this superior query:
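A sketch of what such a combined query could look like, assuming PostgreSQL (which supports row-value comparison with IN) and the `sales` table from the question:

```sql
-- Sketch: compare the (saleprice, saledate) pair directly, no join needed.
UPDATE sales
SET    status = 'ACTIVE'
WHERE  (saleprice, saledate) IN (
    SELECT saleprice, saledate
    FROM   sales
    GROUP  BY saleprice, saledate
    HAVING count(*) = 1          -- pairs that occur in exactly one row
);
```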
This is much faster than either of them. It beats the currently accepted answer by a factor of 10 to 15 (in my tests on PostgreSQL 8.4 and 9.1).
But this is still far from optimal. Use a `NOT EXISTS` (anti-)semi-join for even better performance. `EXISTS` is standard SQL, has been around forever (at least since PostgreSQL 7.2, long before this question was asked) and fits the presented requirements perfectly:
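A minimal sketch of the NOT EXISTS approach, assuming PostgreSQL and the `sales` table from the question. The aliases `s` and `s1` are chosen here for illustration:

```sql
-- Sketch: a sale is unique if no OTHER row shares its price and date.
UPDATE sales s
SET    status = 'ACTIVE'
WHERE  NOT EXISTS (
    SELECT 1
    FROM   sales s1
    WHERE  s1.saleprice = s.saleprice
    AND    s1.saledate  = s.saledate
    AND    s1.id <> s.id                    -- any row except the row itself
)
AND    s.status IS DISTINCT FROM 'ACTIVE';  -- skip rows that are already active
```

The last condition is optional for correctness. It only keeps the statement from rewriting rows that would not change.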
Unique key to identify row
If you don't have a primary or unique key for the table (`id` in the example), you can substitute the system column `ctid` for the purpose of this query (but not for some other purposes).
Every table should have a primary key. Add one if you don't have one yet. I suggest a `serial` column or, in Postgres 10+, an `IDENTITY` column.
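A sketch of the `ctid` substitution, assuming PostgreSQL and a `sales` table that has no usable key. The second statement shows one possible way to add an identity key in Postgres 10+; the column name `id` and the type `bigint` are assumptions:

```sql
-- Sketch: ctid (the physical row location) stands in for the missing key.
UPDATE sales s
SET    status = 'ACTIVE'
WHERE  NOT EXISTS (
    SELECT 1
    FROM   sales s1
    WHERE  s1.saleprice = s.saleprice
    AND    s1.saledate  = s.saledate
    AND    s1.ctid <> s.ctid     -- any row except the row itself
);

-- Better long term: give the table a real key (Postgres 10+ syntax).
ALTER TABLE sales
    ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
```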
How is this faster?
The subquery in the `EXISTS` anti-semi-join can stop evaluating as soon as the first dupe is found (no point in looking further). For a base table with few duplicates this is only mildly more efficient. With lots of duplicates it becomes far more efficient.
Exclude empty updates
For rows that already have `status = 'ACTIVE'`, this update would not change anything, but it would still insert a new row version at full cost (minor exceptions apply). Normally, you do not want this. Add another `WHERE` condition such as `status IS DISTINCT FROM 'ACTIVE'` to avoid this and make it even faster.
If `status` is defined `NOT NULL`, you can simplify to `status <> 'ACTIVE'`. The data type of the column must support the `<>` operator. Some types like `json` don't.
Subtle difference in NULL handling
This query (unlike the currently accepted answer by Joel) does not treat NULL values as equal. Two rows for `(saleprice, saledate)` that share the same non-NULL value in one column and hold NULL in the other would both qualify as "distinct" (though they look identical to the human eye). Such rows also pass a unique index and almost anything else, since NULL values do not compare equal according to the SQL standard.
OTOH, `GROUP BY`, `DISTINCT` or `DISTINCT ON ()` treat NULL values as equal. Use an appropriate query style depending on what you want to achieve. You can still use this faster query with `IS NOT DISTINCT FROM` instead of `=` for any or all comparisons to make NULL compare equal.
If all columns being compared are defined `NOT NULL`, there is no room for disagreement.
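To make the NULL difference concrete, here is a sketch assuming PostgreSQL and the `sales` table from the question. The sample values in the comments are hypothetical:

```sql
-- Hypothetical rows: (saleprice, saledate) = (123, NULL) and (123, NULL).
-- With "=" these two rows never match each other, so both count as unique.
-- With IS NOT DISTINCT FROM, NULL matches NULL and they count as duplicates.
UPDATE sales s
SET    status = 'ACTIVE'
WHERE  NOT EXISTS (
    SELECT 1
    FROM   sales s1
    WHERE  s1.saleprice IS NOT DISTINCT FROM s.saleprice
    AND    s1.saledate  IS NOT DISTINCT FROM s.saledate
    AND    s1.id <> s.id         -- any row except the row itself
);
```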
The problem with your query is that when using a GROUP BY clause (which you essentially do by using distinct) you can only use columns that you group by or aggregate functions. You cannot use the column id because there are potentially different values. In your case there is always only one value because of the HAVING clause, but most RDBMS are not smart enough to recognize that.
This should work however (and doesn't need a join):
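A sketch of such a join-free query, assuming the `sales` table from the question. Because HAVING keeps only groups with a single row, `MIN(id)` is simply that row's `id`:

```sql
-- Sketch: every surviving group has exactly one row, so MIN(id) is its id.
UPDATE sales
SET    status = 'ACTIVE'
WHERE  id IN (
    SELECT MIN(id)
    FROM   sales
    GROUP  BY saleprice, saledate
    HAVING COUNT(id) = 1
);
```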
You could also use MAX or AVG instead of MIN. All that matters is using a function that returns the column's value when there is only one matching row.
If your DBMS doesn't support DISTINCT with multiple columns written as a parenthesized list, a multi-column select can in general be executed safely by selecting the columns in a subquery and applying DISTINCT to its result, as follows:
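A sketch of the two forms, using a hypothetical table `tbl` with columns `col1` and `col2` (names chosen here for illustration only):

```sql
-- Hypothetical table tbl(col1, col2).
-- Form that some DBMSs reject or interpret differently:
--   SELECT DISTINCT (col1, col2) FROM tbl;

-- Derived-table form: pick the columns first, then de-duplicate the result.
SELECT DISTINCT *
FROM  (SELECT col1, col2 FROM tbl) AS x;
```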
This works on most DBMSs, and it is expected to be faster than the GROUP BY solution because it avoids the grouping step.
I want to select the distinct values from one column 'GrondOfLucht', but they should be sorted in the order given in the column 'sortering'. I cannot get the distinct values of just one column using a plain SELECT DISTINCT ordered by 'sortering'.
It also returns the column 'sortering', and because the combination of 'GrondOfLucht' and 'sortering' is not unique, the result is all rows.
Use GROUP BY to select the values of 'GrondOfLucht' in the order given by 'sortering':
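A sketch of that idea. The table name `aanleg` is hypothetical, and using MIN to pick one sort value per group is an assumption, since a group can contain several 'sortering' values:

```sql
-- Sketch: one row per GrondOfLucht, ordered by its smallest sortering value.
SELECT GrondOfLucht
FROM   aanleg               -- hypothetical table name
GROUP  BY GrondOfLucht
ORDER  BY MIN(sortering);
```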