SQL: Fast cumulative frequency query (Postgres)
I'm looking to get cumulative frequency data out of our database. I've created a simple temp table with every unique status update count we've seen and the number of users who have that many status updates.
      Table "pg_temp_4.statuses_count_tmp"
     Column     |  Type   | Modifiers
----------------+---------+-----------
 statuses_count | integer |
 frequency      | bigint  |
Indexes:
    "statuses_count_idx" UNIQUE, btree (statuses_count)
My current query is:
SELECT statuses_count,
       frequency / (SELECT * FROM total_statuses)::float,
       (SELECT sum(frequency) / (SELECT * FROM total_statuses)::float AS percentage
          FROM statuses_count_tmp
         WHERE statuses_count <= SCT.statuses_count) AS cumulative_percent
  FROM statuses_count_tmp AS SCT
 ORDER BY statuses_count DESC;
But this takes quite a while, and the amount of work grows quickly with the table size. The correlated subquery rescans every row at or below the current one, so with the ~50,000 rows I have, I'm looking at roughly 50k squared over 2 (about 1.25 billion) row reads. Sitting here watching the query grind away, I'm hoping there's a better solution that I haven't thought of yet.
Hoping to get something like this:
0 0.26975161 0.26975161
1 0.15306534 0.42281695
2 0.05513516 0.47795211
3 0.03050646 0.50845857
4 0.02064444 0.52910301
Comments (1)
Should be solvable with window functions, assuming you have PostgreSQL 8.4 or later. I am guessing that total_statuses is a view or temp table along the lines of select sum(frequency) from statuses_count_tmp? I would write it as a CTE, which should make it calculate the result just once for the duration of the statement.
Without 8.4's window functions, your best bet is simply to process the data iteratively.
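For the window-function route, a minimal sketch could look like the following. It assumes total_statuses is nothing more than sum(frequency) over statuses_count_tmp (a guess, as noted above), and the aliases total, frequency_pct and cumulative_pct are illustrative names, not taken from the original query:

```sql
-- Assumption: the grand total is simply the sum of all frequencies.
-- The CTE shadows any existing total_statuses relation for this statement.
WITH total_statuses AS (
    SELECT sum(frequency) AS total
    FROM statuses_count_tmp
)
SELECT statuses_count,
       -- share of users with exactly this many status updates
       frequency / (SELECT total FROM total_statuses)::float AS frequency_pct,
       -- running total of frequency up to and including this row,
       -- taken in ascending statuses_count order, as a share of the total
       sum(frequency) OVER (ORDER BY statuses_count)
           / (SELECT total FROM total_statuses)::float AS cumulative_pct
FROM statuses_count_tmp
ORDER BY statuses_count DESC;
```

The running sum is accumulated over the rows in statuses_count order instead of being recomputed by a subquery for every row, which is where the saving over the correlated-subquery version should come from. Because statuses_count is unique, the default window frame does not lump tied rows together.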