Group by date, returning 0 when count() yields no rows

Posted on 2025-01-08 08:04:44

I'm using PostgreSQL 9 and I'm struggling with counting and grouping when some groups contain no rows.

Let's assume the following schema:

create table views (
    date_event timestamp with time zone,
    event_id   integer
);

Let's imagine the following content:

2012-01-01 00:00:05    2
2012-01-01 01:00:05    5
2012-01-01 03:00:05    8
2012-01-01 03:00:15    20

I want to group by hour and count the rows in each group. I would like to retrieve the following:

2012-01-01 00:00:00    1
2012-01-01 01:00:00    1
2012-01-01 02:00:00    0
2012-01-01 03:00:00    2
2012-01-01 04:00:00    0
2012-01-01 05:00:00    0
.
.

2012-01-07 23:00:00    0

In other words, for each time slot I want the number of rows in my table whose date falls within that slot, and if there are none, a row with a count of zero.

The following does not work, because it only yields rows for slots whose count is greater than 0:

SELECT extract(hour from date_event), count(*)
FROM views
WHERE date_event > '2012-01-01' AND date_event < '2012-01-07'
GROUP BY extract(hour from date_event);

Please note that I may also need to group by minute, hour, day, month, or year (running multiple queries is fine, of course).

I can only use plain SQL, and since my views table can be very large (more than 100M rows), I am trying to keep performance in mind.

How can this be achieved?

Thank you!

Comments (2)

俏︾媚 2025-01-15 08:04:44

Given that you don't have the dates in the table, you need a way to generate them. You can use the generate_series function:

SELECT * FROM generate_series('2012-01-01'::timestamp, '2012-01-07 23:00', '1 hour') AS ts;

This will produce results like this:

         ts          
---------------------
 2012-01-01 00:00:00
 2012-01-01 01:00:00
 2012-01-01 02:00:00
 2012-01-01 03:00:00
...
 2012-01-07 21:00:00
 2012-01-07 22:00:00
 2012-01-07 23:00:00
(168 rows)
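
The same call can generate slots at other granularities, which may help with the per-minute, per-day, or per-month groupings mentioned in the question. This is only a sketch: the date ranges below are illustrative assumptions, not values taken from the original post.

```sql
-- One slot per minute for the first hour of 2012-01-01 (60 rows)
SELECT ts
FROM generate_series('2012-01-01 00:00'::timestamp, '2012-01-01 00:59', '1 minute') AS ts;

-- One slot per day for the same week as above (7 rows)
SELECT ts
FROM generate_series('2012-01-01'::timestamp, '2012-01-07', '1 day') AS ts;

-- One slot per month for the year 2012 (12 rows)
SELECT ts
FROM generate_series('2012-01-01'::timestamp, '2012-12-01', '1 month') AS ts;
```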

The remaining task is to join the two selects using an outer join, like this:

select extract(day from ts) as day,
       extract(hour from ts) as hour,
       coalesce(cnt.count, 0) as count
from
(
    -- per (day, hour) counts; only slots with at least one row appear here
    select extract(day from date_event) as day,
           extract(hour from date_event) as hr,
           count(*) as count
    from   views
    -- half-open range so the whole of 2012-01-07 is covered, matching the generated series
    where  date_event >= '2012-01-01' and date_event < '2012-01-08'
    group by extract(day from date_event), extract(hour from date_event)
) as cnt
right outer join generate_series('2012-01-01'::timestamp, '2012-01-07 23:00', '1 hour') as ts
    on extract(hour from ts) = cnt.hr and extract(day from ts) = cnt.day
order by day, hour;
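
Joining on the day and hour numbers only works while the range stays within a single month. One possible variation, sketched here and not part of the original answer, truncates each timestamp with date_trunc and joins on the truncated value. The aliases slots, cnt and n and the index name views_date_event_idx are hypothetical names chosen for illustration.

```sql
-- Optional: an index on the timestamp column lets the range filter avoid a full scan.
-- The index name is a hypothetical choice.
CREATE INDEX views_date_event_idx ON views (date_event);

SELECT slots.ts            AS hour_start,
       coalesce(cnt.n, 0)  AS count          -- slots with no matching rows become 0
FROM generate_series('2012-01-01'::timestamptz,
                     '2012-01-07 23:00',
                     '1 hour') AS slots(ts)
LEFT JOIN (
    -- Aggregate first, so the join only handles one row per non-empty hour
    SELECT date_trunc('hour', date_event) AS ts,
           count(*)                       AS n
    FROM views
    WHERE date_event >= '2012-01-01'
      AND date_event <  '2012-01-08'
    GROUP BY 1
) AS cnt ON cnt.ts = slots.ts
ORDER BY slots.ts;
```

To switch granularity under the same assumptions, change both the interval passed to generate_series and the first argument of date_trunc together, for example '1 day' with 'day'. Note that date_trunc on a timestamp with time zone truncates in the session's time zone.
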
热情消退 2025-01-15 08:04:44

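This query produces the counts in the timestamp format you are looking for, although hours with no rows will still be missing from the result:

select to_char(date_event, 'YYYY-MM-DD HH24:00') as hour_slot,
    count(*) as count
    from views
    -- half-open range on the raw column covers 2012-01-01 through 2012-01-07 and can use an index on date_event
    where date_event >= '2012-01-01' and date_event < '2012-01-08'
    group by hour_slot
    order by hour_slot;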
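
To also get the zero rows while keeping this output format, one option is to start from a generated series of hours and left join the table onto it. This is a sketch rather than the answerer's method, and the aliases slots, v and hour_slot are hypothetical. It relies on count(column) ignoring NULLs, so a slot with no matching rows counts as 0.

```sql
SELECT to_char(slots.ts, 'YYYY-MM-DD HH24:00') AS hour_slot,
       count(v.date_event)                     AS count   -- NULLs from unmatched slots are not counted
FROM generate_series('2012-01-01'::timestamptz,
                     '2012-01-07 23:00',
                     '1 hour') AS slots(ts)
LEFT JOIN views AS v
       ON v.date_event >= slots.ts
      AND v.date_event <  slots.ts + interval '1 hour'
GROUP BY slots.ts
ORDER BY slots.ts;
```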