PostgreSQL 窗口函数:按列分组而不排序(~ Python itertools.groupby)

发布于 2024-10-18 13:28:32 字数 713 浏览 1 评论 0原文

我需要根据列对 PostgreSQL 中的表进行分区,而不进行排序 &使结果独一无二;基本上我想要实现的是在 PostgreSQL 中重现 Python 的 itertools.groupby() 行为。

给定包含两列的表:

1   C
2   C
3   C
4   C
5   C
6   C
7   C
8   C
9   C
10  S
11  E
12  E
13  E
14  E
15  E
16  C
17  C
18  C
19  C
20  E
21  E
22  E
23  E
24  E

我想按第二列中的值对其进行分区(同时保留现有顺序),最终得到以下结果:

1   C   1
2   C   1
3   C   1
4   C   1
5   C   1
6   C   1
7   C   1
8   C   1
9   C   1
10  S   2
11  E   3
12  E   3
13  E   3
14  E   3
15  E   3
16  C   4
17  C   4
18  C   4
19  C   4
20  E   5
21  E   5
22  E   5
23  E   5
24  E   5

我尝试使用窗口函数来实现这一点,使用 ROW_NUMBER( ) 和 LAG() 将当前行与前一行进行比较,看看它是否已更改。在这种情况下的问题是我还需要一个每次值变化时都会递增的变量。

I need to partition a table in PostgreSQL based on a column without sorting & making the result unique; Basically what I am trying to achieve is to reproduce the itertools.groupby() behavior from Python in PostgreSQL.

Given the table containing two columns:

1   C
2   C
3   C
4   C
5   C
6   C
7   C
8   C
9   C
10  S
11  E
12  E
13  E
14  E
15  E
16  C
17  C
18  C
19  C
20  E
21  E
22  E
23  E
24  E

I want to partition it by the value in the second (whilst preserving the existing order), to end up with this:

1   C   1
2   C   1
3   C   1
4   C   1
5   C   1
6   C   1
7   C   1
8   C   1
9   C   1
10  S   2
11  E   3
12  E   3
13  E   3
14  E   3
15  E   3
16  C   4
17  C   4
18  C   4
19  C   4
20  E   5
21  E   5
22  E   5
23  E   5
24  E   5

I tried to achieve that with window functions, using a combination of ROW_NUMBER() and LAG() to compare the current row with the previous to see if it has changed. The problem in that case was that I would also need a variable that increments every time the value changes.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

雨轻弹 2024-10-25 13:28:32

试试这个:

WITH T1 AS
(
    SELECT
        id,
        grp,
        LAG(grp) OVER (ORDER BY id) IS DISTINCT FROM grp AS changes
    FROM yourtable
)
SELECT id, grp, SUM(changes::int) OVER (ORDER BY id) FROM T1

Try this:

WITH T1 AS
(
    SELECT
        id,
        grp,
        LAG(grp) OVER (ORDER BY id) IS DISTINCT FROM grp AS changes
    FROM yourtable
)
SELECT id, grp, SUM(changes::int) OVER (ORDER BY id) FROM T1
迎风吟唱 2024-10-25 13:28:32

尽管您可以提出完全基于窗口函数的解决方案,但这种结果集的状态处理似乎最适合迭代方法。这是一个之前提出过类似问题的问题: PostgreSQL 函数迭代/作用于具有状态的许多行

基本上,仅窗口方法归结为在开始处生成一个带有“1”(或任何其他正整数)的整数列每个分区(因此其中 col2>>LAG(col2)),然后使用另一个窗口聚合对从结果集开始到当前行的这些组指标进行求和。它有效,只是在复杂性和(我相信)性能方面都很丑陋。

Although you can come up with a solution based entirely on window functions, this kind of stateful processing of the result set seems best suited to an iterative approach. Here's a question that asked something a bit similar before: PostgreSQL function to iterate through/act on many rows with state

Basically the window-only approach comes down to producing an integer column with a "1" (or any other positive integer) at the start of each partition (so where col2<>LAG(col2)) and then using another window aggregation to sum those group indicators from the start of the result set up to the current row. It works, it's just ugly both in complexity and (I believe) in performance.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文