PostgreSQL 窗口函数:按列分组而不排序(~ Python itertools.groupby)
我需要根据列对 PostgreSQL 中的表进行分区,而不进行排序 &使结果独一无二;基本上我想要实现的是在 PostgreSQL 中重现 Python 的 itertools.groupby()
行为。
给定包含两列的表:
1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 C 9 C 10 S 11 E 12 E 13 E 14 E 15 E 16 C 17 C 18 C 19 C 20 E 21 E 22 E 23 E 24 E
我想按第二列中的值对其进行分区(同时保留现有顺序),最终得到以下结果:
1 C 1 2 C 1 3 C 1 4 C 1 5 C 1 6 C 1 7 C 1 8 C 1 9 C 1 10 S 2 11 E 3 12 E 3 13 E 3 14 E 3 15 E 3 16 C 4 17 C 4 18 C 4 19 C 4 20 E 5 21 E 5 22 E 5 23 E 5 24 E 5
我尝试使用窗口函数来实现这一点,使用 ROW_NUMBER( ) 和 LAG()
将当前行与前一行进行比较,看看它是否已更改。在这种情况下的问题是我还需要一个每次值变化时都会递增的变量。
I need to partition a table in PostgreSQL based on a column without sorting & making the result unique; Basically what I am trying to achieve is to reproduce the itertools.groupby()
behavior from Python in PostgreSQL.
Given the table containing two columns:
1 C 2 C 3 C 4 C 5 C 6 C 7 C 8 C 9 C 10 S 11 E 12 E 13 E 14 E 15 E 16 C 17 C 18 C 19 C 20 E 21 E 22 E 23 E 24 E
I want to partition it by the value in the second (whilst preserving the existing order), to end up with this:
1 C 1 2 C 1 3 C 1 4 C 1 5 C 1 6 C 1 7 C 1 8 C 1 9 C 1 10 S 2 11 E 3 12 E 3 13 E 3 14 E 3 15 E 3 16 C 4 17 C 4 18 C 4 19 C 4 20 E 5 21 E 5 22 E 5 23 E 5 24 E 5
I tried to achieve that with window functions, using a combination of ROW_NUMBER()
and LAG()
to compare the current row with the previous to see if it has changed. The problem in that case was that I would also need a variable that increments every time the value changes.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
试试这个:
Try this:
尽管您可以提出完全基于窗口函数的解决方案,但这种结果集的状态处理似乎最适合迭代方法。这是一个之前提出过类似问题的问题: PostgreSQL 函数迭代/作用于具有状态的许多行
基本上,仅窗口方法归结为在开始处生成一个带有“1”(或任何其他正整数)的整数列每个分区(因此其中
col2>>LAG(col2)
),然后使用另一个窗口聚合对从结果集开始到当前行的这些组指标进行求和。它有效,只是在复杂性和(我相信)性能方面都很丑陋。Although you can come up with a solution based entirely on window functions, this kind of stateful processing of the result set seems best suited to an iterative approach. Here's a question that asked something a bit similar before: PostgreSQL function to iterate through/act on many rows with state
Basically the window-only approach comes down to producing an integer column with a "1" (or any other positive integer) at the start of each partition (so where
col2<>LAG(col2)
) and then using another window aggregation to sum those group indicators from the start of the result set up to the current row. It works, it's just ugly both in complexity and (I believe) in performance.