Query optimization
This is my current query; it works, but it is slow:
SELECT row, MIN(flg) || ' to ' || MAX(flg) AS xyz, AVG(amt_won), COUNT(*)
FROM (
    SELECT (ROW_NUMBER() OVER (ORDER BY flg)) * 100 /
           (SELECT COUNT(*) + 100 AS temprow FROM temporary_six_max) AS row,
           flg, amt_won
    FROM temporary_six_max
    JOIN (
        SELECT id_player AS pid, AVG(flg_vpip::int) AS flg
        FROM temporary_six_max
        GROUP BY id_player
    ) AS auxtable
        ON pid = id_player
) AS auxtable2
GROUP BY 1
ORDER BY 1;
I am grouping the rows into roughly 100 fixed-size ranges, ordered by avg(flg_vpip) computed per id_player.
Here I've pasted the results in case they help with understanding:
https://spreadsheets0.google.com/ccc?key=tFVsxkWVn4fMWYBxxGYokwQ&authkey=CNDvuOcG&authkey=CNDvuOcG#gid=0
I wonder if there is a better function to use than ROW_NUMBER() in this case. I feel like I am doing too many subselects, but I don't know how to optimize it.
I'll very much appreciate any help.
If something is not clear just let me know.
Thank you.
EDIT:
The reason I created auxtable2 is that when I use ROW_NUMBER() OVER (ORDER BY flg) together with other aggregate functions such as avg(amt_won) and count(*), which are essential, I get an error saying that flg should be in an aggregate function, but I can't order by an aggregate of flg.
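One function that maps more directly onto fixed-count bucketing than dividing ROW_NUMBER() by the table size is NTILE(), which assigns rows to a given number of equal-sized buckets. A sketch of the same query rewritten with it, assuming 100 buckets and the same table and columns as above (this is an illustration, not a tested replacement):

```sql
-- Sketch: bucket rows into 100 groups with NTILE() instead of
-- dividing ROW_NUMBER() by the table size (assumes the same
-- temporary_six_max table and columns as in the question).
SELECT bucket,
       MIN(flg) || ' to ' || MAX(flg) AS xyz,
       AVG(amt_won),
       COUNT(*)
FROM (
    SELECT NTILE(100) OVER (ORDER BY flg) AS bucket, flg, amt_won
    FROM temporary_six_max
    JOIN (
        SELECT id_player AS pid, AVG(flg_vpip::int) AS flg
        FROM temporary_six_max
        GROUP BY id_player
    ) AS auxtable ON pid = id_player
) AS auxtable2
GROUP BY 1
ORDER BY 1;
```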
1 Answer
I generated some data to test with like this:
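The exact script isn't shown; a hedged sketch of a comparable generator, assuming the column names from the question (id_player, flg_vpip, amt_won) and PostgreSQL's generate_series():

```sql
-- Hypothetical test-data generator (not necessarily the answerer's
-- original): 10E6 rows, ~10,000 players, a boolean flg_vpip, and
-- random winnings in [-100, 100).
CREATE TABLE temporary_six_max AS
SELECT (random() * 10000)::int               AS id_player,
       random() < 0.25                       AS flg_vpip,
       (random() * 200 - 100)::numeric(10,2) AS amt_won
FROM generate_series(1, 10000000);
```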
Your query runs successfully against that, but doesn't quite generate the same plan: I get a nested loop in the lower arm rather than a merge, and a seq scan in the init plan. You haven't turned off enable_seqscan, I hope?
A solution just using a single scan of the table:
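A sketch of what such a single-scan query can look like, assuming the per-player average is computed with a window function instead of the self-join (an illustration of the idea, not necessarily the answerer's original query):

```sql
-- Compute each player's avg(flg_vpip) with a window function rather
-- than a GROUP BY self-join, so temporary_six_max is scanned once.
SELECT bucket,
       MIN(flg) || ' to ' || MAX(flg) AS xyz,
       AVG(amt_won),
       COUNT(*)
FROM (
    SELECT NTILE(100) OVER (ORDER BY flg) AS bucket, flg, amt_won
    FROM (
        SELECT AVG(flg_vpip::int) OVER (PARTITION BY id_player) AS flg,
               amt_won
        FROM temporary_six_max
    ) AS player_stats
) AS bucketed
GROUP BY 1
ORDER BY 1;
```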
The bad news is that this actually performs worse on my machine, especially if I bump work_mem up enough to avoid the first disk sort (making player_stats, sorting by flg). Increasing work_mem did halve the query time, though, so I guess that is at least a start?
Having said that, my queries run for about 5 seconds to process 10E6 input rows in temporary_six_max, which is an order of magnitude faster than what you posted. Does your table fit into your buffer cache? If not, a single-scan solution may be much better for you. (Which version of PostgreSQL are you using? "EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT ..." will show you buffer hit/miss rates in 9.0; or just look at your shared_buffers setting and compare it with the table size.)