Query optimization
This is my current query; it works, but it is slow:
SELECT row, MIN(flg) || ' to ' || MAX(flg) AS xyz, AVG(amt_won), COUNT(*)
FROM (
    SELECT (ROW_NUMBER() OVER (ORDER BY flg)) * 100 /
           (SELECT COUNT(*) + 100 AS temprow FROM temporary_six_max) AS row,
           flg, amt_won
    FROM temporary_six_max
    JOIN (
        SELECT id_player AS pid, AVG(flg_vpip::int) AS flg
        FROM temporary_six_max
        GROUP BY id_player
    ) AS auxtable
        ON pid = id_player
) AS auxtable2
GROUP BY 1
ORDER BY 1;
I am grouping the rows into roughly 100 fixed-size ranges, ordered by avg(flg_vpip) computed per id_player.
Here I've pasted the results in case they help with understanding:
https://spreadsheets0.google.com/ccc?key=tFVsxkWVn4fMWYBxxGYokwQ&authkey=CNDvuOcG&authkey=CNDvuOcG#gid=0
I wonder if there is a better function to use than ROW_NUMBER() in this case. I feel like I am doing too many subselects, but I don't know how to optimize it.
I'll very much appreciate any help.
If something is not clear just let me know.
Thank you.
EDIT:
The reason I created auxtable2 is that when I use ROW_NUMBER() OVER (ORDER BY flg) together with other aggregate functions such as avg(amt_won) and count(*), which are essential, I get an error saying that flg should be in an aggregate function, but I can't order by an aggregate of flg.
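One function that maps more directly onto fixed-count bucketing than dividing ROW_NUMBER() by the table size is NTILE(), which assigns rows to a given number of equal-sized buckets. A sketch of the same query rewritten with it, assuming 100 buckets and the same table and columns as above (this is an illustration, not a tested replacement):

```sql
-- Sketch: bucket rows into 100 groups with NTILE() instead of
-- dividing ROW_NUMBER() by the table size (assumes the same
-- temporary_six_max table and columns as in the question).
SELECT bucket,
       MIN(flg) || ' to ' || MAX(flg) AS xyz,
       AVG(amt_won),
       COUNT(*)
FROM (
    SELECT NTILE(100) OVER (ORDER BY flg) AS bucket, flg, amt_won
    FROM temporary_six_max
    JOIN (
        SELECT id_player AS pid, AVG(flg_vpip::int) AS flg
        FROM temporary_six_max
        GROUP BY id_player
    ) AS auxtable ON pid = id_player
) AS auxtable2
GROUP BY 1
ORDER BY 1;
```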
1 Answer
I generated some data to test with like this:
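The exact script isn't shown; a hedged sketch of a comparable generator, assuming the column names from the question (id_player, flg_vpip, amt_won) and PostgreSQL's generate_series():

```sql
-- Hypothetical test-data generator (not necessarily the answerer's
-- original): 10E6 rows, ~10,000 players, a boolean flg_vpip, and
-- random winnings in [-100, 100).
CREATE TABLE temporary_six_max AS
SELECT (random() * 10000)::int               AS id_player,
       random() < 0.25                       AS flg_vpip,
       (random() * 200 - 100)::numeric(10,2) AS amt_won
FROM generate_series(1, 10000000);
```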
Your query runs successfully against that, but doesn't quite generate the same plan: I get a nested loop in the lower arm rather than a merge, and a seq scan in the init plan. You haven't turned off enable_seqscan, I hope?
A solution just using a single scan of the table:
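A sketch of what such a single-scan query can look like, assuming the per-player average is computed with a window function instead of the self-join (an illustration of the idea, not necessarily the answerer's original query):

```sql
-- Compute each player's avg(flg_vpip) with a window function rather
-- than a GROUP BY self-join, so temporary_six_max is scanned once.
SELECT bucket,
       MIN(flg) || ' to ' || MAX(flg) AS xyz,
       AVG(amt_won),
       COUNT(*)
FROM (
    SELECT NTILE(100) OVER (ORDER BY flg) AS bucket, flg, amt_won
    FROM (
        SELECT AVG(flg_vpip::int) OVER (PARTITION BY id_player) AS flg,
               amt_won
        FROM temporary_six_max
    ) AS player_stats
) AS bucketed
GROUP BY 1
ORDER BY 1;
```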
The bad news is that this actually performs worse on my machine, especially if I bump work_mem up enough to avoid the first disk sort (making player_stats, sorting by flg). Increasing work_mem did halve the query time, though, so I guess that is at least a start?
Having said that, my queries run for about 5 seconds to process 10E6 input rows in temporary_six_max, which is an order of magnitude faster than what you posted. Does your table fit into your buffer cache? If not, a single-scan solution may be much better for you. (Which version of PostgreSQL are you using? "EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT ..." will show you buffer hit/miss rates in 9.0; or just look at your shared_buffers setting and compare it with the table size.)