高性能聚合 where 子句
我正在尝试查找自最后一个用户创建以来三个月内创建的用户数量。全部按州分组。
这是一个有效的查询:
select count(u.id) as numberOfUsers,
s.state
from users u
join states s on u.state_id = s.id
where u.creationdate > (
select max(u2.creationdate)
from users u2
where u2.state_id = s.id
) - interval '3 months'
group by s.state
但是,它需要 100 秒。有人能给我一个性能更好的吗?
我希望这能起作用:
select count(u.id) as numberOfUsers,
s.state, max(u2.creationdate) as lastCreated
from users u
join states s on u.state_id = s.id
where u.creationdate > lastCreated - interval '3 months'
group by s.state
I'm trying to find the number of users that were created in the three months since the last user was created. All grouped by state.
Here's a query that works:
select count(u.id) as numberOfUsers,
s.state
from users u
join states s on u.state_id = s.id
where u.creationdate > (
select max(u2.creationdate)
from users u2
where u2.state_id = s.id
) - interval '3 months'
group by s.state
However, it takes 100 seconds. Can someone get me a more performant one?
I wish that this worked:
select count(u.id) as numberOfUsers,
s.state, max(u2.creationdate) as lastCreated
from users u
join states s on u.state_id = s.id
where u.creationdate > lastCreated - interval '3 months'
group by s.state
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
由于只进行一次扫描,这可能会表现得更好:
但是,在进行初始窗口聚合时,它将消耗大量工作内存。
嗯,也许更像是:
这是试图逗弄 PostgreSQL 进行正确的分区扫描,但它并不是很理想。它仍然进行全表扫描,但至少只有一次。迭代 CTE 的输出并在循环内发出外部查询结果的集合返回函数可能效果最好,因为它将能够为每个状态使用创建日期索引。
This might perform better due to only doing one scan:
However, it will chew through a lot of work memory doing the initial windowing aggregation.
Hmm, maybe something more like:
This is trying to tease PostgreSQL into doing a properly partitioned scan, but it's not really ideal. It still does a full table scan, but at least only one. A set-returning function that iterated through the output of the CTE and emitted the result of the outer query inside the loop would probably work best, since that would be able to utilise a
creationdate
index for each state.只是出于兴趣,以下查询的执行情况如何?我对 Postgresql 如何处理最里面的查询(状态表+标量子查询)特别感兴趣。
必须有用户的复合索引(state_id,creation_date)才能使其工作。
编辑这是为针对 3 个状态的 100,000 个用户行的查询生成的计划:
Just out of interest, how does the following query perform? I'm particularly interested in how Postgresql processes the innermost query (states table + scalar sub query).
There has to a compound index on users(state_id, creation_date) for this to work.
edit this is the plan produced for that query with 100,000 user rows against 3 states:
这是我用来将时间减少到 82 毫秒的查询:
谢谢 araqnid。
This is the query that I used to reduce the time to 82ms:
Thank you to araqnid.
您是否已确定查询的哪一部分速度较慢?可以添加索引吗?我不是 Postgres 专家,但我怀疑如果 users 没有在 users.creationdate 上建立索引,则 MAX() 函数将必须执行全表扫描。嗯,无论如何它可能必须做一件事......
也就是说,这里什么也没有!
Have you determined which part of the query is slow? Can you add indexing? I'm no Postgres guru, but I suspect that if users is not indexed on users.creationdate, the MAX() function will have to do a full table scan. Hmmm, it may have to do one anyway...
That said, here goes nothing!