Query optimization

Posted 2024-10-11 02:13:21 · 1,139 characters · 1 view · 0 comments

That's my current query, it works but it is slow:

    SELECT row, MIN(flg) || ' to ' || MAX(flg) AS xyz, AVG(amt_won), COUNT(*)
    FROM (
        SELECT (ROW_NUMBER() OVER (ORDER BY flg)) * 100 /
               (SELECT COUNT(*) + 100 AS temprow FROM temporary_six_max) AS row,
               flg, amt_won
        FROM temporary_six_max
        JOIN (
            SELECT id_player AS pid, AVG(flg_vpip::int) AS flg
            FROM temporary_six_max
            GROUP BY id_player
        ) AS auxtable ON pid = id_player
    ) AS auxtable2
    GROUP BY 1
    ORDER BY 1;

I am grouping the rows into roughly 100 fixed-count (or almost fixed-count) ranges, ordered by avg(flg_vpip) computed per id_player.
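The bucketing arithmetic in the query can be sketched in isolation to see why the ranges are only "almost" fixed. This is plain Python with a hypothetical row count of 1000 (not the poster's actual table size); integer division mirrors what SQL does for integer operands:

```python
from collections import Counter

def bucket(row_number: int, total_rows: int) -> int:
    # Bucket index assigned by the query: row_number * 100 / (count + 100),
    # with integer (truncating) division as in SQL.
    return row_number * 100 // (total_rows + 100)

total = 1000  # hypothetical row count
buckets = [bucket(rn, total) for rn in range(1, total + 1)]
sizes = Counter(buckets)

# The +100 in the divisor deflates the index, so we get 91 buckets
# (0..90) rather than exactly 100, each holding 10 or 11 rows.
assert len(sizes) == 91
assert set(sizes.values()) <= {10, 11}
```

This shows the bucket count and bucket sizes both drift slightly with the divisor, which is part of why ntile (mentioned in the answer below) is a more direct fit.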

Here I've pasted the results in case it may help to understand:
https://spreadsheets0.google.com/ccc?key=tFVsxkWVn4fMWYBxxGYokwQ&authkey=CNDvuOcG&authkey=CNDvuOcG#gid=0

I wonder whether there is a better function than ROW_NUMBER() for this case; I feel like I am doing too many subselects, but I don't know how to optimize it.

I'd very much appreciate any help.

If something is not clear just let me know.

Thank you.

EDIT:

The reason I created auxtable2 is that when I use ROW_NUMBER() OVER (ORDER BY flg) together with other aggregate functions such as avg(amt_won) and count(*), which are essential, I get an error saying that flg should be in an aggregate function, but I can't order by an aggregate of flg.


Comments (1)

遇见了你 2024-10-18 02:13:21

I generated some data to test with like this:

    create table temporary_six_max as
    select id_player, flg_vpip,
           random()*100 * (case flg_vpip when 0 then 1 else -1 end) as amt_won
    from (select (random()*1000)::int as id_player, random()::int as flg_vpip
          from generate_series(1,1000000)) source;
    create index on temporary_six_max(id_player);
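Not part of the answer, but the same fixture can be approximated portably, e.g. with Python's stdlib sqlite3 (the schema mirrors the answer's; the 10,000-row size and the index name are arbitrary assumptions, scaled down from the answer's 10E6 for speed):

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table temporary_six_max (id_player int, flg_vpip int, amt_won real)"
)

rows = []
for _ in range(10_000):  # much smaller than the answer's 10E6 rows
    flg_vpip = random.randint(0, 1)
    # amt_won is non-negative when flg_vpip = 0 and non-positive otherwise,
    # mirroring the case expression in the SQL above
    amt_won = random.random() * 100 * (1 if flg_vpip == 0 else -1)
    rows.append((random.randint(0, 1000), flg_vpip, amt_won))

con.executemany("insert into temporary_six_max values (?, ?, ?)", rows)
con.execute("create index idx_player on temporary_six_max(id_player)")

count, = con.execute("select count(*) from temporary_six_max").fetchone()
```

A scaled-down copy like this is handy for checking that a rewritten query returns the same shape of result before running it against the full table.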

Your query runs successfully against that, but it doesn't generate quite the same plan: I get a nested loop in the lower arm rather than a merge, and a seq scan in the init plan. You haven't turned off enable_seqscan, I hope?

A solution just using a single scan of the table:

    select row, min(flg) || ' to ' || max(flg) as xyz, avg(amt_won), count(*)
    from (select flg, amt_won, ntile(100) over (order by flg) as row
          from (select id_player as pid, amt_won,
                       avg(flg_vpip::int) over (partition by id_player) as flg
                from temporary_six_max
               ) player_stats
         ) chunks
    group by 1
    order by 1;

The bad news is that this actually performs worse on my machine, especially if I bump work_mem up enough to avoid the first disk sort (making player_stats, sorting by flg). Increasing work_mem did halve the query time, though, so I guess that is at least a start?

Having said that, my queries run for about 5 seconds to process the 10E6 input rows in temporary_six_max, which is an order of magnitude faster than what you posted. Does your table fit into your buffer cache? If not, a single-scan solution may be much better for you. (Which version of PostgreSQL are you using? "explain (analyze on, buffers on) select..." will show you buffer hit/miss rates in 9.0; or just look at your "shared_buffers" setting and compare it with the table size.)
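ntile's even-bucket behavior is easy to sandbox outside Postgres. This sketch uses Python's bundled SQLite (window functions need SQLite 3.25+, which any recent Python ships) with made-up data, not the poster's table; with 1000 rows, ntile(100) yields exactly 10 rows per bucket:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (flg real, amt_won real)")
# 1000 synthetic rows: flg ascends, amt_won is an arbitrary placeholder
con.executemany("insert into t values (?, ?)",
                [(i / 10.0, float(i % 7)) for i in range(1000)])

rows = con.execute("""
    select bucket, min(flg) || ' to ' || max(flg), avg(amt_won), count(*)
    from (select flg, amt_won, ntile(100) over (order by flg) as bucket
          from t)
    group by 1
    order by 1
""").fetchall()

assert len(rows) == 100           # exactly 100 buckets
assert all(r[3] == 10 for r in rows)  # each with exactly 10 rows
```

Unlike the ROW_NUMBER()-and-divide arithmetic, ntile(n) guarantees bucket sizes differ by at most one row, with no extra scalar subquery for the row count.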
