GROUP BY and aggregate sequential numeric values
Using PostgreSQL 9.0.
Let's say I have a table containing the fields company, profession and year. I want to return a result which contains unique companies and professions, but aggregates years (into an array is fine) based on numeric sequence:
Example Table:
+---------+------------+------+
| company | profession | year |
+---------+------------+------+
| Google  | Programmer | 2000 |
| Google  | Sales      | 2000 |
| Google  | Sales      | 2001 |
| Google  | Sales      | 2002 |
| Google  | Sales      | 2004 |
| Mozilla | Sales      | 2002 |
+---------+------------+------+
I'm interested in a query which would output rows similar to the following:
+---------+------------+------------------+
| company | profession | year             |
+---------+------------+------------------+
| Google  | Programmer | [2000]           |
| Google  | Sales      | [2000,2001,2002] |
| Google  | Sales      | [2004]           |
| Mozilla | Sales      | [2002]           |
+---------+------------+------------------+
The essential feature is that only consecutive years shall be grouped together.
There's much value to @a_horse_with_no_name's answer, both as a correct solution and, like I already said in a comment, as good material for learning how to use different kinds of window functions in PostgreSQL.
And yet I cannot help feeling that the approach taken in that answer is a bit too much of an effort for a problem like this one. Basically, what you need is an additional criterion for grouping before you go on aggregating years in arrays. You've already got company and profession, now you only need something to distinguish years that belong to different sequences. That is just what the above-mentioned answer provides, and that is precisely what I think can be done in a simpler way. Here's how:
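One common way to get such an extra grouping criterion is to subtract a per-(company, profession) row number from the year: within a run of consecutive years the difference stays constant, and it changes at every gap. The query below is only a sketch of that idea, not necessarily the query originally posted. The table name tbl and the alias grp are assumptions for illustration, and it assumes each (company, profession, year) combination occurs at most once.

```sql
-- Sketch only. Assumed (hypothetical) table: tbl(company text, profession text, year int).
SELECT company,
       profession,
       array_agg(year ORDER BY year) AS years
FROM  (
   SELECT company,
          profession,
          year,
          -- Constant within a run of consecutive years, different across gaps:
          -- 2000-1, 2001-2, 2002-3 all give 1999, while 2004-4 gives 2000.
          year - row_number() OVER (PARTITION BY company, profession
                                    ORDER BY year) AS grp
   FROM   tbl
   ) sub
GROUP  BY company, profession, grp
ORDER  BY company, profession, min(year);
```

If duplicate years are possible, row_number() would break runs apart; deduplicating first (or using dense_rank() instead) is one way around that.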
Procedural solution with PL/pgSQL
The problem is rather unwieldy for plain SQL with aggregate / window functions. While looping is typically slower than set-based solutions with plain SQL, a procedural solution with PL/pgSQL can make do with a single sequential scan over the table (implicit cursor of a FOR loop) and should be substantially faster in this particular case:
Test table:
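A test table matching the example in the question could be set up as follows. The table name tbl and the column types are assumptions, since the question only names the three fields.

```sql
-- Hypothetical test table; name and column types are assumed.
CREATE TABLE tbl (
   company    text
 , profession text
 , year       int
);

-- The six rows from the question's example table.
INSERT INTO tbl (company, profession, year) VALUES
   ('Google',  'Programmer', 2000)
 , ('Google',  'Sales',      2000)
 , ('Google',  'Sales',      2001)
 , ('Google',  'Sales',      2002)
 , ('Google',  'Sales',      2004)
 , ('Mozilla', 'Sales',      2002);
```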
Function:
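A function along the lines described above might look like the following sketch. It is one possible implementation, not necessarily the original one. The function name f_consecutive_years and the table tbl(company text, profession text, year int) are assumptions, and rows are assumed to be unique per (company, profession, year).

```sql
-- Sketch only: walks the sorted table once and emits one row per run of consecutive years.
CREATE OR REPLACE FUNCTION f_consecutive_years()
  RETURNS TABLE (company text, profession text, years int[])
  LANGUAGE plpgsql AS
$func$
DECLARE
   r         record;            -- current row
   prev      record;            -- previous row
   first_row boolean := true;   -- true until the first row has been read
BEGIN
   FOR r IN
      SELECT t.company, t.profession, t.year
      FROM   tbl t
      ORDER  BY t.company, t.profession, t.year
   LOOP
      IF first_row THEN
         years     := ARRAY[r.year];     -- start the very first run
         first_row := false;
      ELSIF r.company    = prev.company
        AND r.profession = prev.profession
        AND r.year       = prev.year + 1 THEN
         years := years || r.year;       -- consecutive year: extend the current run
      ELSE
         company    := prev.company;     -- run ended: output it
         profession := prev.profession;
         RETURN NEXT;
         years := ARRAY[r.year];         -- start a new run
      END IF;

      prev := r;                         -- remember the last row
   END LOOP;

   IF NOT first_row THEN                 -- output the last run, unless the table was empty
      company    := prev.company;
      profession := prev.profession;
      RETURN NEXT;
   END IF;
END
$func$;
```

The table columns are table-qualified inside the query (t.company and so on) so that they do not conflict with the output parameters of the same name.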
Call:
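Assuming a set-returning function of the kind described above has been created under the hypothetical name f_consecutive_years(), it can be queried like a table:

```sql
-- f_consecutive_years() is an assumed name; substitute the name of your own function.
SELECT * FROM f_consecutive_years();
```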
db<>fiddle here
Produces the requested result.
Identifying non-consecutive values is always a bit tricky and involves several nested sub-queries (at least I cannot come up with a better solution).
The first step is to identify non-consecutive values for the year:
Step 1) Identify non-consecutive values
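A query for this step could look like the sketch below. It flags each row that starts a new run of consecutive years with 1 and every other row with 0. The table name tbl is an assumption, and the use of lag() is one possible way to compute the flag, not necessarily the original one.

```sql
-- Sketch only. Assumed (hypothetical) table: tbl(company text, profession text, year int).
SELECT company,
       profession,
       year,
       CASE
          WHEN year - lag(year) OVER (PARTITION BY company, profession
                                      ORDER BY year) = 1
          THEN 0   -- directly follows the previous year: same run
          ELSE 1   -- first row of its partition (lag is NULL) or a gap: new run
       END AS group_cnt
FROM   tbl;
```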
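This returns the following result:
+---------+------------+------+-----------+
| company | profession | year | group_cnt |
+---------+------------+------+-----------+
| Google  | Programmer | 2000 |         1 |
| Google  | Sales      | 2000 |         1 |
| Google  | Sales      | 2001 |         0 |
| Google  | Sales      | 2002 |         0 |
| Google  | Sales      | 2004 |         1 |
| Mozilla | Sales      | 2002 |         1 |
+---------+------------+------+-----------+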
Now with the group_cnt value we can create "group IDs" for each group that has consecutive years:
Step 2) Define group IDs
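Turning the flags into group IDs can be done with a running sum over the flagged rows. The following is a sketch under the same assumptions as before (hypothetical table tbl, unique rows), with the flagging query repeated as a derived table so that the statement stands on its own.

```sql
-- Sketch only: running sum of the group_cnt flag yields one number per run.
SELECT company,
       profession,
       year,
       sum(group_cnt) OVER (ORDER BY company, profession, year) AS group_nr
FROM  (
   SELECT company,
          profession,
          year,
          CASE
             WHEN year - lag(year) OVER (PARTITION BY company, profession
                                         ORDER BY year) = 1
             THEN 0
             ELSE 1
          END AS group_cnt
   FROM   tbl   -- assumed table name
   ) t1;
```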
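This returns the following result:
+---------+------------+------+----------+
| company | profession | year | group_nr |
+---------+------------+------+----------+
| Google  | Programmer | 2000 |        1 |
| Google  | Sales      | 2000 |        2 |
| Google  | Sales      | 2001 |        2 |
| Google  | Sales      | 2002 |        2 |
| Google  | Sales      | 2004 |        3 |
| Mozilla | Sales      | 2002 |        4 |
+---------+------------+------+----------+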
As you can see, each "group" got its own group_nr, which we can finally use for the aggregation by adding yet another derived table:
Step 3) Final query
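Putting the pieces together, the final query might be sketched as follows. As before, the table name tbl and the aliases t1 and t2 are assumptions. The ORDER BY inside array_agg() requires PostgreSQL 9.0 or later.

```sql
-- Sketch only: flag run starts, number the runs, then aggregate per run.
SELECT company,
       profession,
       array_agg(year ORDER BY year) AS years
FROM  (
   SELECT company,
          profession,
          year,
          sum(group_cnt) OVER (ORDER BY company, profession, year) AS group_nr
   FROM  (
      SELECT company,
             profession,
             year,
             CASE
                WHEN year - lag(year) OVER (PARTITION BY company, profession
                                            ORDER BY year) = 1
                THEN 0
                ELSE 1
             END AS group_cnt
      FROM   tbl   -- assumed table name
      ) t1
   ) t2
GROUP  BY company, profession, group_nr
ORDER  BY company, profession, group_nr;
```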
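This returns the following result:
+---------+------------+------------------+
| company | profession | years            |
+---------+------------+------------------+
| Google  | Programmer | {2000}           |
| Google  | Sales      | {2000,2001,2002} |
| Google  | Sales      | {2004}           |
| Mozilla | Sales      | {2002}           |
+---------+------------+------------------+
Which is exactly what you wanted, if I'm not mistaken.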