Eliminating duplicate rows in a PostgreSQL SELECT statement

Posted 2024-12-19 07:54:06

This is my query:

SELECT autor.entwickler,anwendung.name
  FROM autor 
  left join anwendung
    on anwendung.name = autor.anwendung;

 entwickler |    name     
------------+-------------
 Benutzer 1 | Anwendung 1
 Benutzer 2 | Anwendung 1
 Benutzer 2 | Anwendung 2
 Benutzer 1 | Anwendung 3
 Benutzer 1 | Anwendung 4
 Benutzer 2 | Anwendung 4
(6 rows)

I want to keep one row for each distinct value in the field name, and discard the others like this:

 entwickler |    name     
------------+-------------
 Benutzer 1 | Anwendung 1
 Benutzer 2 | Anwendung 2
 Benutzer 1 | Anwendung 3
 Benutzer 1 | Anwendung 4

In MySQL I would just do:

SELECT autor.entwickler,anwendung.name
  FROM autor
  left join anwendung
    on anwendung.name = autor.anwendung
 GROUP BY anwendung.name;

But PostgreSQL gives me this error:

ERROR:  column "autor.entwickler" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT autor.entwickler FROM autor left join anwendung on an ...

I understand the error and assume that the MySQL implementation is less SQL-conformant than the PostgreSQL implementation. But how can I get the desired result?

Comments (2)

暗地喜欢 2024-12-26 07:54:06

PostgreSQL doesn't currently allow ambiguous GROUP BY statements where the results are dependent on the order the table is scanned, the plan used, etc. That's how the standard says it should work AFAIK, but some databases (like MySQL versions prior to 5.7) permit looser queries that just pick the first value encountered for elements appearing in the SELECT list but not in GROUP BY.

In PostgreSQL, you should use DISTINCT ON for this kind of query.

You want to write something like:

SELECT DISTINCT ON (anwendung.name) anwendung.name, autor.entwickler
FROM autor
left join anwendung on anwendung.name = autor.anwendung;

(Syntax corrected based on follow-up comment)

This is a bit like MySQL 5.7's ANY_VALUE(...) pseudo-function for group by, but in reverse - it says that the values in the distinct on clause must be unique, and any value is acceptable for the columns not specified.

Unless there's an ORDER BY, there is no guarantee as to which values are selected. You should usually have an ORDER BY for predictability.
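
As a minimal sketch of that point: the inline VALUES lists below stand in for the real tables (the question does not show their definitions, so the column layout is an assumption based on the output in the question). The leftmost ORDER BY expression has to match the DISTINCT ON expression; the remaining ORDER BY expressions decide which row of each group is kept. Sorting entwickler in descending order keeps the alphabetically last developer per application.

```sql
-- Assumed stand-ins for the tables in the question (definitions are not shown there).
WITH autor (entwickler, anwendung) AS (
    VALUES ('Benutzer 1', 'Anwendung 1'),
           ('Benutzer 2', 'Anwendung 1'),
           ('Benutzer 2', 'Anwendung 2'),
           ('Benutzer 1', 'Anwendung 3'),
           ('Benutzer 1', 'Anwendung 4'),
           ('Benutzer 2', 'Anwendung 4')
), anwendung (name) AS (
    VALUES ('Anwendung 1'), ('Anwendung 2'), ('Anwendung 3'), ('Anwendung 4')
)
SELECT DISTINCT ON (an.name) an.name, au.entwickler
FROM   anwendung an
LEFT   JOIN autor au ON au.anwendung = an.name
ORDER  BY an.name,              -- must lead with the DISTINCT ON expression
          au.entwickler DESC;   -- tie-breaker: keep the last developer per app
-- Anwendung 1 | Benutzer 2
-- Anwendung 2 | Benutzer 2
-- Anwendung 3 | Benutzer 1
-- Anwendung 4 | Benutzer 2
```

Dropping DESC keeps the alphabetically first developer instead.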

It's also been noted that using an aggregate like min() or max() would work. While this is true - and will lead to reliable and predictable results, unlike using DISTINCT ON or an ambiguous GROUP BY - it has a performance cost due to the need for extra sorting or aggregation, and it only works for ordinal data types.

怀念你的温柔 2024-12-26 07:54:06

Craig's answer and your resulting query in the comments share the same flaw: The table anwendung is at the right side of a LEFT JOIN, which contradicts your obvious intent. You care about anwendung.name and pick autor.entwickler arbitrarily. I'll come back to that further down.

It should be:

SELECT DISTINCT ON (1) an.name, au.entwickler
FROM   anwendung an
LEFT   JOIN autor au ON an.name = au.anwendung;

DISTINCT ON (1) is just a syntactical shorthand for DISTINCT ON (an.name). Positional references are allowed here.
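
To illustrate why the side of the LEFT JOIN matters, here is a small sketch with inline VALUES lists standing in for the real tables. The column layout is assumed from the question, and 'Anwendung 5' is a hypothetical application without any developer, added only for this example. With anwendung on the left, that application still gets a row:

```sql
-- Assumed stand-ins for the question's tables; 'Anwendung 5' is hypothetical.
WITH autor (entwickler, anwendung) AS (
    VALUES ('Benutzer 1', 'Anwendung 1'),
           ('Benutzer 2', 'Anwendung 1'),
           ('Benutzer 2', 'Anwendung 2')
), anwendung (name) AS (
    VALUES ('Anwendung 1'), ('Anwendung 2'), ('Anwendung 5')
)
SELECT DISTINCT ON (an.name) an.name, au.entwickler
FROM   anwendung an                        -- every application is preserved
LEFT   JOIN autor au ON au.anwendung = an.name
ORDER  BY an.name, au.entwickler;
-- Anwendung 1 | Benutzer 1
-- Anwendung 2 | Benutzer 2
-- Anwendung 5 |              (entwickler is NULL)
```

With the tables swapped (FROM autor au LEFT JOIN anwendung an), no autor row references 'Anwendung 5', so that application would be missing from the result.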

If there are multiple developers (entwickler) for an app (anwendung), one developer is picked arbitrarily. You have to add an ORDER BY clause if you want the "first" (alphabetically according to your locale):

SELECT DISTINCT ON (1) an.name, au.entwickler
FROM   anwendung an
LEFT   JOIN autor au ON an.name = au.anwendung
ORDER  BY 1, 2;

As @mdahlman implied, a more canonical way would be:

SELECT an.name, min(au.entwickler) AS entwickler
FROM   anwendung an
LEFT   JOIN autor au ON an.name = au.anwendung
GROUP  BY an.name;

Or, better yet, clean up your data model: implement the n:m relationship between anwendung and autor properly, add surrogate primary keys (names of applications and authors are hardly unique), enforce relational integrity with foreign key constraints, and adapt your query accordingly:

The proper way

CREATE TABLE autor (
   autor_id serial PRIMARY KEY -- surrogate primary key
 , autor    text NOT NULL);

INSERT INTO autor  VALUES
   (1, 'mike')
 , (2, 'joe')
 , (3, 'jane')   -- worked on two apps
 , (4, 'susi');  -- has no part in any apps (yet)

CREATE TABLE anwendung (
   anwendung_id serial PRIMARY KEY -- surrogate primary key
 , anwendung    text  UNIQUE);     -- disallow duplicate names

INSERT INTO anwendung  VALUES
   (1, 'foo')    -- has 3 authors linked to it
 , (2, 'bar')
 , (3, 'shark')
 , (4, 'bait');  -- has no authors attached to it (yet).

CREATE TABLE autor_anwendung (  -- you might name this table "entwickler"
   autor_id     integer REFERENCES autor     ON UPDATE CASCADE ON DELETE CASCADE
 , anwendung_id integer REFERENCES anwendung ON UPDATE CASCADE ON DELETE CASCADE
 , PRIMARY KEY (autor_id, anwendung_id)
);

INSERT INTO autor_anwendung VALUES
 (1, 1)
,(2, 1)
,(3, 1)
,(2, 2)
,(3, 3);

This query retrieves one row per app with one associated author (the 1st one alphabetically) or NULL if there are none:

SELECT DISTINCT ON (1) an.anwendung, au.autor
FROM   anwendung an
LEFT   JOIN autor_anwendung au_au USING (anwendung_id)
LEFT   JOIN autor au USING (autor_id)
ORDER  BY 1, 2;

Result:

 anwendung | autor
-----------+-------
 bait      |
 bar       | joe
 foo       | jane
 shark     | jane
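
For comparison, the aggregate form mentioned earlier can be written against the same normalized model. This is only a sketch: the inline VALUES lists mirror the sample rows above so that the statement runs on its own, without creating any tables.

```sql
-- Inline copies of the sample data from the three tables above.
WITH autor (autor_id, autor) AS (
    VALUES (1, 'mike'), (2, 'joe'), (3, 'jane'), (4, 'susi')
), anwendung (anwendung_id, anwendung) AS (
    VALUES (1, 'foo'), (2, 'bar'), (3, 'shark'), (4, 'bait')
), autor_anwendung (autor_id, anwendung_id) AS (
    VALUES (1, 1), (2, 1), (3, 1), (2, 2), (3, 3)
)
SELECT an.anwendung, min(au.autor) AS autor   -- min() ignores NULLs; apps without authors yield NULL
FROM   anwendung an
LEFT   JOIN autor_anwendung au_an USING (anwendung_id)
LEFT   JOIN autor au USING (autor_id)
GROUP  BY an.anwendung
ORDER  BY an.anwendung;
```

For this sample data it returns the same four rows as the DISTINCT ON query.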