PostgreSQL GROUP BY including zeros
I have a SQL query (postgresql) that looks something like this:
SELECT
my_timestamp::timestamp::date as the_date,
count(*) as count
FROM my_table
WHERE ...
GROUP BY the_date
ORDER BY the_date
The result is a table of YYYY-MM-DD, count pairs.
Now I've been asked to fill in the empty dates with zero. So if I was previously providing
2022-03-15 3
2022-03-17 1
I'd now want to return
2022-03-15 3
2022-03-16 0
2022-03-17 1
Now I can easily do this client-side (relative to the database) and let my program compute and return the zero-augmented list to its clients based on the original list from postgres. But perhaps it would be better if I could just tell postgresql to include zeros.
I suspect this isn't easy at all, because postgres has no obvious way of knowing what I'm up to. But in the interests of learning more about postgres and SQL, I thought I'd have a try. The try isn't too promising thus far...
Any pointers before I conclude that I was right to leave this to my (postgres client) program?
Update
This is an interesting case where my simplification of the problem led to a correct answer that didn't work for me. For those who come after, I thought it worth documenting what followed, because it takes some fun twists through constructing SQL queries.
@a_horse_with_no_name responded with a query that I've verified works if I simplify my own query to match. Unfortunately, my query had some extra baggage that I didn't think pertinent, and so had trimmed out when posting the original question.
Here's my real (original) query, with all names preserved (if shortened):
-- current query
SELECT
LEAST(time1, time2, time3, time4)::timestamp::date as the_date,
count(*) as count
FROM reading_group_reader rgr
INNER JOIN ( SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
WHERE LEAST(time1, time2, time3, time4) > current_date - 30
GROUP BY the_date
ORDER BY the_date;
If I translate that directly into the proposed solution, however, the inner join between reading_group_reader and the temporary table TT causes the left join to become inner (I think) and the date sequence drops its zeros again. Fwiw, the table TT is a table because sometimes it actually is a subselect.
So I transformed my query into this:
SELECT
g.dt::date as the_date,
count(*) as count
FROM generate_series(date '2022-03-06', date '2022-04-06', interval '1 day') as g(dt)
LEFT JOIN (
SELECT
LEAST(rgr.time1, rgr.time2, rgr.time3, rgr.time4)::timestamp::date as the_date
FROM reading_group_reader rgr
INNER JOIN (
SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
) rgrt
ON rgrt.the_date = g.dt::date
GROUP BY g.dt
ORDER BY the_date;
But this outputs 1s instead of 0s for the dates that should be 0.
The reason is that the left join now produces a row for every date, so count(*) finds one row even for dates with no matching readings. I need to include an additional field from the joined subquery (which will be NULL for those dates) and count that instead.
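To make the difference concrete, here is a minimal self-contained sketch. The inline `readings` data is hypothetical and only stands in for the real tables; it reuses the sample counts from the start of the question. It puts count(*) and count(column) side by side over the same left join:

```sql
-- Hypothetical inline data standing in for the real tables.
WITH readings(the_date, device_id) AS (
    VALUES (date '2022-03-15', 101),
           (date '2022-03-15', 102),
           (date '2022-03-15', 103),
           (date '2022-03-17', 104)
)
SELECT
    g.dt::date         AS the_date,
    count(*)           AS count_star,    -- counts every joined row, including the NULL-padded one
    count(r.device_id) AS count_device   -- counts only rows where device_id is not NULL
FROM generate_series(date '2022-03-15', date '2022-03-17', interval '1 day') AS g(dt)
LEFT JOIN readings r
       ON r.the_date = g.dt::date
GROUP BY g.dt
ORDER BY g.dt;
```

For 2022-03-16 the left join still emits one row (with NULLs on the readings side), so count_star is 1 while count_device is 0. The other two dates agree in both columns.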
So this query finally does what I want:
SELECT
g.dt::date as the_date,
count(rgrt.device_id) as count
FROM generate_series(date '2022-03-06', date '2022-04-06', interval '1 day') as g(dt)
LEFT JOIN (
SELECT
LEAST(rgr.time1, rgr.time2, rgr.time3, rgr.time4)::timestamp::date as the_date,
rgr.device_id
FROM reading_group_reader rgr
INNER JOIN (
SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)
) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
) rgrt(the_date)
ON rgrt.the_date = g.dt::date
GROUP BY g.dt
ORDER BY g.dt;
And, of course, on re-reading the accepted answer, I eventually saw that he did count an unrelated field, which I'd simply missed on my first several readings.
You will need to join to a list of dates. This can be done using generate_series(), for example.
Note that the original WHERE conditions need to go into the join condition of the LEFT JOIN. You can't put them into a WHERE clause, because that would turn the outer join back into an inner join (which means the missing dates wouldn't be returned).
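The advice about putting the filter in the join condition can be sketched as follows. This is an illustration under assumptions, not the answerer's original query: the inline `readings` data and its two-column shape are made up, and only the group id 31198 is borrowed from the question.

```sql
-- Hypothetical inline data; one row belongs to a different group.
WITH readings(the_date, group_id) AS (
    VALUES (date '2022-03-15', 31198),
           (date '2022-03-15', 31198),
           (date '2022-03-16', 7),
           (date '2022-03-17', 31198)
)
SELECT
    g.dt::date        AS the_date,
    count(r.group_id) AS count
FROM generate_series(date '2022-03-15', date '2022-03-18', interval '1 day') AS g(dt)
LEFT JOIN readings r
       ON r.the_date = g.dt::date
      AND r.group_id = 31198   -- filter lives in the join condition, so every date survives
GROUP BY g.dt
ORDER BY g.dt;
```

This returns all four dates, with a count of 0 for 2022-03-16 and 2022-03-18. Moving `r.group_id = 31198` into a WHERE clause instead would discard the NULL-padded rows after the join, leaving only 2022-03-15 and 2022-03-17.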
Casting the integer field that carries the zeros to text solved a similar problem for me.