How to return rows with a count of 0 for missing data?
I have unevenly distributed data (with respect to date) for a few years (2003-2008). I want to query data for a given start and end date, grouping the data by any of the supported intervals (day, week, month, quarter, year) in PostgreSQL 8.3.
The problem is that some queries return a row for every interval in the required period, like this one:
select to_char(date_trunc('month',date), 'YYYY-MM-DD'), count(distinct post_id)
from some_table
where category_id = 1
and entity_id = 77
and entity2_id = 115
and date <= '2008-12-06'
and date >= '2007-12-01'
group by date_trunc('month',date)
order by date_trunc('month',date);
to_char | count
------------+-------
2007-12-01 | 64
2008-01-01 | 31
2008-02-01 | 14
2008-03-01 | 21
2008-04-01 | 28
2008-05-01 | 44
2008-06-01 | 100
2008-07-01 | 72
2008-08-01 | 91
2008-09-01 | 92
2008-10-01 | 79
2008-11-01 | 65
(12 rows)
But others skip intervals for which no data is present, like this one:
select to_char(date_trunc('month',date), 'YYYY-MM-DD'), count(distinct post_id)
from some_table
where category_id=1
and entity_id = 75
and entity2_id = 115
and date <= '2008-12-06'
and date >= '2007-12-01'
group by date_trunc('month',date)
order by date_trunc('month',date);
to_char | count
------------+-------
2007-12-01 | 2
2008-01-01 | 2
2008-03-01 | 1
2008-04-01 | 2
2008-06-01 | 1
2008-08-01 | 3
2008-10-01 | 2
(7 rows)
whereas the required result set is:
to_char | count
------------+-------
2007-12-01 | 2
2008-01-01 | 2
2008-02-01 | 0
2008-03-01 | 1
2008-04-01 | 2
2008-05-01 | 0
2008-06-01 | 1
2008-07-01 | 0
2008-08-01 | 3
2008-09-01 | 0
2008-10-01 | 2
2008-11-01 | 0
(12 rows)
That is, a count of 0 for intervals with no entries.
I have seen earlier discussions on Stack Overflow, but they don't seem to solve my problem, since my grouping period is one of (day, week, month, quarter, year) and is decided at runtime by the application. So I guess an approach like a left join with a calendar table or sequence table will not help.
My current solution is to fill in these gaps in Python (in a TurboGears app) using the calendar module.
Is there a better way to do this?
Comments (3)
This question is old. But since fellow users picked it as master for a new duplicate I am adding a proper answer.
Proper solution
Use LEFT JOIN, of course.
generate_series() can produce a table of timestamps on the fly, and very fast. See:
It's typically faster to aggregate before you join. Related answer with test case in a fiddle:
Cast the timestamp to date (::date) for a basic format. For more use to_char().
GROUP BY 1 is syntax shorthand to reference the first output column. Could be GROUP BY day as well, but that might conflict with an existing column of the same name. Or GROUP BY date_trunc('month', date_col)::date but that's too long for my taste.
Works with the available interval arguments for date_trunc().
count() never returns NULL - 0 for no rows - but the LEFT JOIN does.
To return 0 instead of NULL in the outer SELECT, use COALESCE(some_count, 0) AS some_count. The manual.
For a more generic solution or arbitrary time intervals see:
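A minimal sketch of the approach these notes describe, not necessarily the answer's exact query. It assumes the table and columns from the question (some_table, date, post_id, and the three id filters); the aliases day and some_count follow the notes, and date_col in the notes corresponds to the question's date column. Note that the timestamp form of generate_series() requires PostgreSQL 8.4 or later, so on 8.3 an integer series would be needed instead.

```sql
-- Sketch only: names come from the question; aliases follow the notes above.
-- Requires PostgreSQL 8.4+ for generate_series(timestamp, timestamp, interval).
SELECT to_char(day, 'YYYY-MM-DD') AS month
     , COALESCE(t.some_count, 0)  AS some_count   -- 0 instead of NULL for empty months
FROM  (
   -- one row per month in the requested range
   SELECT day::date
   FROM   generate_series(timestamp '2007-12-01'
                        , timestamp '2008-12-01'
                        , interval  '1 month') AS day
   ) d
LEFT   JOIN (
   -- aggregate first, then join
   SELECT date_trunc('month', date)::date AS day
        , count(DISTINCT post_id)         AS some_count
   FROM   some_table
   WHERE  category_id = 1
   AND    entity_id   = 75
   AND    entity2_id  = 115
   AND    date >= '2007-12-01'
   AND    date <= '2008-12-06'
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
```

The series here runs to 2008-12-01 so that it covers every month touched by the date filter; end it at 2008-11-01 to reproduce the 12 rows shown in the question. For a grouping unit chosen at runtime, the step and the date_trunc() unit would change together ('1 day' with 'day', '1 week' with 'week', '3 months' with 'quarter', '1 year' with 'year'), and the series start must itself be truncated to that unit. For 'week' that means starting on a Monday, because date_trunc('week', ...) truncates to Monday.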
You can create the list of all first days of the last year (say) with
Then you can join with that series.
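One way this could look, as a sketch rather than the answer's own statement: it uses the integer form of generate_series(), which is available in PostgreSQL 8.3, and the alias names (offs, month_start) are illustrative assumptions.

```sql
-- Sketch only: first day of each of the last 12 months, newest first.
-- Integer generate_series() works on PostgreSQL 8.3.
SELECT (date_trunc('month', current_date::timestamp)
        - offs * interval '1 month')::date AS month_start
FROM   generate_series(0, 11) AS offs
ORDER  BY 1 DESC;
```

Used as a subquery on the left side of a LEFT JOIN, this series supplies a row for every month, including months for which some_table has no matching rows.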
You could create a temporary table at runtime and left join on that. That seems to make the most sense.
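A rough sketch of that idea, under stated assumptions: tmp_periods and period_start are hypothetical names, the table and filter columns are those from the question, and the application would be the one inserting the period starts for whichever unit it picked at runtime (months here).

```sql
-- Sketch only: tmp_periods / period_start are hypothetical names.
CREATE TEMPORARY TABLE tmp_periods (period_start date PRIMARY KEY);

-- Fill it for the requested range; here 12 months starting 2007-12-01.
INSERT INTO tmp_periods (period_start)
SELECT (date '2007-12-01' + offs * interval '1 month')::date
FROM   generate_series(0, 11) AS offs;

-- Filters go in the ON clause so that periods without data are kept.
SELECT to_char(p.period_start, 'YYYY-MM-DD') AS period
     , count(DISTINCT s.post_id)             AS count   -- 0 when nothing matches
FROM   tmp_periods p
LEFT   JOIN some_table s
       ON  date_trunc('month', s.date)::date = p.period_start
       AND s.category_id = 1
       AND s.entity_id   = 75
       AND s.entity2_id  = 115
       AND s.date >= '2007-12-01'
       AND s.date <= '2008-12-06'
GROUP  BY p.period_start
ORDER  BY p.period_start;
```

Because count() ignores the NULLs produced by the outer join, empty periods come back as 0 without needing COALESCE. Putting the filters in a WHERE clause instead would discard those periods again.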