PostgreSQL: running count of rows for a query "by minute"

Posted on 2024-12-16 19:27:02

I need to query for each minute the total count of rows up to that minute.

The best I could achieve so far doesn't do the trick. It returns the count per minute, not the total count up to each minute:

SELECT COUNT(id) AS count
     , EXTRACT(hour from "when") AS hour
     , EXTRACT(minute from "when") AS minute
  FROM mytable
 GROUP BY hour, minute


揽清风入怀 2024-12-23 19:27:02

Return only minutes with activity

Shortest

SELECT DISTINCT
       date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY 1;

Use date_trunc(); it returns exactly what you need.

Don't include id in the query, since you want to GROUP BY minute slices.

count() is typically used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition - you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. The manual:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the
same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY,
this sets the frame to be all rows from the partition start up
through the current row's last ORDER BY peer.

And that happens to be exactly what you need.
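To see the peer behavior in isolation, here is a minimal sketch on made-up data. The inline table sample and its timestamps are assumptions for illustration only; they are not from the question. Without DISTINCT, every base row is returned, and rows that fall into the same minute are peers and share one running count:

```sql
-- Hypothetical sample data standing in for the real table.
WITH sample(id, "when") AS (
   VALUES (1, timestamp '2024-01-01 10:00:05')
        , (2, timestamp '2024-01-01 10:00:40')
        , (3, timestamp '2024-01-01 10:01:10')
        , (4, timestamp '2024-01-01 10:03:59')
   )
SELECT id
     , date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   sample
ORDER  BY 2, 1;
-- id 1 and id 2 (both in minute 10:00) get running_ct = 2,
-- id 3 (10:01) gets 3, id 4 (10:03) gets 4.
```

SELECT DISTINCT on minute and running_ct then folds the two identical 10:00 rows into one.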

Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).
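The difference only shows up when id can be NULL. A small sketch with a hypothetical inline table t (not part of the question) makes it visible:

```sql
-- Hypothetical data: three rows, one of them with a NULL id.
WITH t(id) AS (VALUES (1), (NULL), (3))
SELECT count(*)  AS ct_rows          -- counts every row: 3
     , count(id) AS ct_non_null_ids  -- skips the NULL: 2
FROM   t;
```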

You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
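As a sketch of what goes wrong, the following runs the window count on top of GROUP BY at the same level. The inline table sample is made up for illustration. Because grouping happens first, the window function counts groups (minutes), not base rows:

```sql
-- Hypothetical sample data: 2 rows in 10:00, 1 in 10:01, 1 in 10:03.
WITH sample(id, "when") AS (
   VALUES (1, timestamp '2024-01-01 10:00:05')
        , (2, timestamp '2024-01-01 10:00:40')
        , (3, timestamp '2024-01-01 10:01:10')
        , (4, timestamp '2024-01-01 10:03:59')
   )
SELECT date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   sample
GROUP  BY 1
ORDER  BY 1;
-- Yields 1, 2, 3 (a running count of minutes),
-- not the wanted 2, 3, 4 (a running count of rows).
```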

ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
1 is a positional reference to the 1st expression in the SELECT list.

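Use to_char() if you need to format the result. SELECT DISTINCT only accepts ORDER BY expressions that appear in the SELECT list, so format in an outer query and sort by the underlying timestamp. Like:

SELECT to_char(minute, 'DD.MM.YYYY HH24:MI') AS minute, running_ct
FROM  (
   SELECT DISTINCT
          date_trunc('minute', "when") AS minute
        , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
   FROM   mytable
   ) sub
ORDER  BY sub.minute;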

Fastest

SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   mytable
   GROUP  BY 1
   ) sub
ORDER  BY 1;

Much like the above, but:

I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.

Now use sum() as a window aggregate function to add up the counts from the subquery.

I found this to be substantially faster with many rows per minute.
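Whether this holds for your data is easy to check. A sketch, assuming the table is called mytable as in the question: prefix each variant with EXPLAIN (ANALYZE, BUFFERS) and compare the reported execution times. Note that EXPLAIN ANALYZE actually runs the query.

```sql
-- Assumes a table mytable with a timestamp column "when", as in the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   mytable
   GROUP  BY 1
   ) sub
ORDER  BY 1;
```

Run each variant a few times so that caching effects do not skew the comparison.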

Include minutes without activity

Shortest

@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in the base table):

SELECT DISTINCT
       minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   mytable
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM mytable) c(minute) USING (minute)
ORDER  BY 1;

Generate a row for every minute in the time frame between the first and the last event with generate_series() - here directly based on aggregated values from the subquery.
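On its own, the generate_series() step looks like this. The two literal timestamps are made up and stand in for date_trunc('minute', min("when")) and max("when"):

```sql
-- Hypothetical bounds: first event truncated to the minute, last event as is.
SELECT generate_series(timestamp '2024-01-01 10:00:00'
                     , timestamp '2024-01-01 10:03:59'
                     , interval '1 min') AS minute;
-- Returns 4 rows: 10:00, 10:01, 10:02 and 10:03.
```

The upper bound does not need to be truncated, since the series simply stops at the last step that does not exceed it.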

LEFT JOIN to all timestamps truncated to the minute, and count them. NULL values (where no row exists) do not add to the running count.
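To make the NULL handling concrete, this sketch shows the joined rows before any counting. The inline table sample is hypothetical, and matched is just an illustrative output name:

```sql
-- Hypothetical sample data with a gap at 10:02.
WITH sample(id, "when") AS (
   VALUES (1, timestamp '2024-01-01 10:00:05')
        , (2, timestamp '2024-01-01 10:00:40')
        , (3, timestamp '2024-01-01 10:01:10')
        , (4, timestamp '2024-01-01 10:03:59')
   )
SELECT minute, c.minute AS matched
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   sample
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM sample) c(minute) USING (minute)
ORDER  BY 1;
-- 5 rows: two for 10:00, one for 10:01, one for 10:02 with matched = NULL,
-- and one for 10:03.
```

count(c.minute) skips the NULL in the 10:02 row, so the running count stays at 3 for that minute and moves to 4 at 10:03.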

Fastest

With CTE:

WITH cte AS (
   SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
   FROM   mytable
   GROUP  BY 1
   )
SELECT m.minute
     , COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(min(minute), max(minute), interval '1 min')
   FROM   cte
   ) m(minute)
LEFT   JOIN cte USING (minute)
ORDER  BY 1;

Again, aggregate and count rows per minute in the first step, which removes the need for a later DISTINCT.

Unlike count(), sum() can return NULL. Default to 0 with COALESCE.
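A minimal sketch of that difference, using a hypothetical empty input (the name empty_input is an assumption for illustration):

```sql
-- Hypothetical relation with zero rows.
WITH empty_input(n) AS (SELECT 1 WHERE false)
SELECT count(n)              AS ct        -- 0
     , sum(n)                AS total     -- NULL
     , COALESCE(sum(n), 0)   AS total_0   -- 0
FROM   empty_input;
```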

With many rows and an index on "when", this version with a subquery was the fastest among a couple of variants I tested with Postgres 9.1 - 9.4:

SELECT m.minute
     , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   mytable
   ) m(minute)
LEFT   JOIN (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   mytable
   GROUP  BY 1
   ) c USING (minute)
ORDER  BY 1;