Applying multiple window functions on the same partition

Posted on 2024-08-14 18:02:08

Is it possible to apply multiple window functions to the same partition? (Correct me if I'm not using the right vocabulary)

For example you can do

SELECT name, first_value() over (partition by name order by date) from table1

But is there a way to do something like:

SELECT name, (first_value() as f, last_value() as l (partition by name order by date)) from table1

Where we are applying two functions onto the same window?

Reference:
http://postgresql.ro/docs/8.4/static/tutorial-window.html

灯角 2024-08-21 18:02:08

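Can you not just specify the window for each selected expression?

Something like

-- date as the function argument is an assumption; the question left it out
SELECT  name,
        first_value(date) OVER (partition by name order by date) as f,
        last_value(date) OVER (partition by name order by date) as l
from table1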

Also, as your reference shows, you can define the window once and reuse it:

SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)
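
To make the named-window form concrete for the question's own case, here is a minimal PostgreSQL sketch. The table table1 and its columns name and date follow the question; the sample rows, the date argument to both functions, and the explicit frame clause are my assumptions. The frame clause is there because, with ORDER BY, the default frame ends at the current row, so last_value() would otherwise not see the end of the partition.

```sql
-- Hypothetical sample data; only the table and column names come from the question.
CREATE TEMP TABLE table1 (name text, date date);
INSERT INTO table1 VALUES
  ('a', '2024-01-01'),
  ('a', '2024-01-02'),
  ('b', '2024-01-05');

-- One named window shared by both functions.
-- The explicit frame lets last_value() look at the whole partition;
-- without it the frame stops at the current row (and its peers).
SELECT name,
       first_value(date) OVER w AS f,
       last_value(date)  OVER w AS l
FROM table1
WINDOW w AS (PARTITION BY name ORDER BY date
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
```

With these rows, both 'a' rows should get f = 2024-01-01 and l = 2024-01-02, and the single 'b' row gets 2024-01-05 for both.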

昇り龍 2024-08-21 18:02:08

Warning: I am not deleting this answer, since it seems technically correct and may therefore be helpful, but beware that PARTITION BY bar ORDER BY foo is probably not what you want anyway. With an ORDER BY in the window definition, the default frame runs only from the start of the partition to the current row (and its peers), so an aggregate is computed as a running value rather than over the partition as a whole. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar) (see the proof at the end of the answer).

Although it does not improve performance per se, if you use the same partition several times you probably want the second syntax proposed by astander (the WINDOW clause), and not only because it is shorter to write. Here is why.

Consider the following query:

SELECT 
  array_agg(foo)
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar ORDER BY foo) 
FROM 
  foobar;

Since the ordering looks irrelevant to computing an average, you might be tempted to use the following query instead (no ordering in the second window):

SELECT 
  array_agg(foo) 
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar) 
FROM 
  foobar;

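This is a mistake, as it takes longer (about 3.06 s versus 2.46 s in the runs below): the plan needs a second WindowAgg node for the second window definition. Proof: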

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
                                                           QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
   ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
         Sort Key: bar, foo
         Sort Method: quicksort  Memory: 130006kB
         ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
 Total runtime: 2458.969 ms
(6 lignes)

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
                                                              QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
   ->  WindowAgg  (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
         ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
               Sort Key: bar, foo
               Sort Method: quicksort  Memory: 130006kB
               ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
 Total runtime: 3060.041 ms
(7 lignes)

Now, if you are aware of this issue, you will of course use the same window definition everywhere. But when the same partition appears ten or more times in a query that you keep editing over several days, it is quite easy to forget the ORDER BY clause on a window that does not need it by itself.

This is where the WINDOW syntax comes in: it protects you from such careless mistakes (provided, of course, that you know it is better to minimize the number of different window definitions). The following is strictly equivalent (as far as I can tell from EXPLAIN ANALYZE) to the first query:

SELECT
  array_agg(foo)
    OVER qux,
  avg(baz)
    OVER qux
FROM
  foobar
WINDOW
  qux AS (PARTITION BY bar ORDER BY foo);
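
If you want to check the plan shape yourself, the following is a rough sketch for PostgreSQL. The table foobar_demo and its generated rows are hypothetical stand-ins for the answer's foobar table; only the column names foo, bar and baz are taken from the queries above. EXPLAIN (COSTS OFF) prints the plan without running the query, so the thing to compare is how many WindowAgg nodes each plan contains.

```sql
-- Hypothetical stand-in for the answer's table; column names follow the queries above.
CREATE TEMP TABLE foobar_demo AS
SELECT g AS foo, g % 10 AS bar, g * 2 AS baz
FROM generate_series(1, 1000) AS g;

-- Shared named window: the plan should contain a single WindowAgg node.
EXPLAIN (COSTS OFF)
SELECT array_agg(foo) OVER qux, avg(baz) OVER qux
FROM foobar_demo
WINDOW qux AS (PARTITION BY bar ORDER BY foo);

-- Two different window definitions: expect a second WindowAgg node,
-- as in the slower plan shown earlier.
EXPLAIN (COSTS OFF)
SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo),
       avg(baz)       OVER (PARTITION BY bar)
FROM foobar_demo;
```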

Post-warning update:

I understand that the statement "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)" seems questionable, so here is an example:

# SELECT * FROM foobar;
 foo | bar 
-----+-----
   1 |   1
   2 |   2
   3 |   1
   4 |   2
(4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
 array_agg | avg 
-----------+-----
 {1,3}     |   2
 {1,3}     |   2
 {2,4}     |   3
 {2,4}     |   3
 (4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
 array_agg | avg 
-----------+-----
 {1}       |   1
 {1,3}     |   2
 {2}       |   2
 {2,4}     |   3
(4 lines)
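
If you need the ORDER BY (for example so that array_agg returns its elements in order) but still want the aggregate computed over the whole partition, one option is to spell out the frame instead of relying on the default. This is a sketch, not something from the original answer: the frame clause is my addition, and foobar_small is a hypothetical table holding the same four rows as the example above.

```sql
-- Hypothetical table with the same four rows as the example above.
CREATE TEMP TABLE foobar_small (foo int, bar int);
INSERT INTO foobar_small VALUES (1, 1), (2, 2), (3, 1), (4, 2);

-- Ordered window, but the frame explicitly covers the whole partition,
-- so avg() is no longer a running average.
SELECT array_agg(foo) OVER qux, avg(foo) OVER qux
FROM foobar_small
WINDOW qux AS (PARTITION BY bar ORDER BY foo
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
```

With this frame every row of a partition sees all of its rows, so both rows with bar = 1 get {1,3} and an average of 2, and both rows with bar = 2 get {2,4} and an average of 3.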
