SQL: Fast cumulative frequency query (Postgres)

Posted on 2024-10-11 16:42:20

I'm looking to get Cumulative Frequency Data out of our database. I've created a simple temp table with all unique status update counts that we've seen, and the number of users that have that amount of status updates.

     Table "pg_temp_4.statuses_count_tmp"
     Column     |  Type   | Modifiers 
----------------+---------+-----------
 statuses_count | integer | 
 frequency      | bigint  | 
Indexes:
    "statuses_count_idx" UNIQUE, btree (statuses_count)

My current query is:

select statuses_count,
       frequency / (select * from total_statuses)::float,
       (select sum(frequency) / (select * from total_statuses)::float as percentage
          from statuses_count_tmp
         where statuses_count <= SCT.statuses_count) as cumulative_percent
  from statuses_count_tmp as SCT
 order by statuses_count desc;

But this takes quite a while, and the work grows quickly: the correlated subquery re-scans every row at or below the current one, so the cost is quadratic. With the ~50,000 rows I have, that is roughly 50,000 × 50,000 / 2 ≈ 1.25 billion rows to be read. Sitting here watching the query grind away, I'm hoping there's a better solution that I haven't thought of yet.

Hoping to get something like this:

0       0.26975161      0.26975161
1       0.15306534      0.42281695
2       0.05513516      0.47795211
3       0.03050646      0.50845857
4       0.02064444      0.52910301

Comments (1)

第七度阳光i 2024-10-18 16:42:20

Should be solvable with windowing functions, assuming you have PostgreSQL 8.4 or later. I am guessing that total_statuses is a view or temp table along the lines of select sum(frequency) from statuses_count_tmp? I wrote it as a CTE here which should make it calculate the result just once for the duration of the statement:

with total_statuses as (select sum(frequency) from statuses_count_tmp)
select statuses_count,
       frequency / (select * from total_statuses) as frequency,
       sum(frequency) over (order by statuses_count)
           / (select * from total_statuses) as cumulative_frequency
from statuses_count_tmp
order by statuses_count;
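
As a variation on the query above, the grand total can also be taken with an empty window, sum(frequency) over (), which removes the need for the CTE. This is only a sketch under the same assumptions (PostgreSQL 8.4 or later, and the statuses_count_tmp table from the question); it is not part of the original answer. Because sum() over a bigint column returns numeric in PostgreSQL, the divisions are not truncated to integers.

```sql
-- Sketch: same result shape as the CTE version, using only window functions.
-- Assumes the statuses_count_tmp table described in the question.
select statuses_count,
       -- share of users with exactly this many status updates
       frequency / sum(frequency) over () as frequency,
       -- running total up to and including this row, divided by the grand total
       sum(frequency) over (order by statuses_count)
           / sum(frequency) over () as cumulative_frequency
from statuses_count_tmp
order by statuses_count;
```

Since statuses_count is unique in this table, the default window frame gives a plain row-by-row running total.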

Without 8.4's window functions your best bet is simply to process the data iteratively:

-- Row type returned by the function.
create type cumulative_sum_type as ( statuses_count int, frequency numeric, cumulative_frequency numeric );
create or replace function cumulative_sum() returns setof cumulative_sum_type strict stable language plpgsql as $$
declare
  running_total bigint := 0;
  total bigint;
  data_in record;
  data_out cumulative_sum_type;
begin
  -- Grand total, computed once.
  select sum(frequency) into total from statuses_count_tmp;
  -- Single ordered pass, carrying the running total forward.
  for data_in in select statuses_count, frequency from statuses_count_tmp order by statuses_count
  loop
    data_out.statuses_count := data_in.statuses_count;
    running_total := running_total + data_in.frequency;
    data_out.frequency := data_in.frequency::numeric / total;
    data_out.cumulative_frequency := running_total::numeric / total;
    return next data_out;
  end loop;
end;
$$;
select * from cumulative_sum();