使用ratio_to_report分析

发布于 2024-10-21 12:29:09 字数 717 浏览 0 评论 0原文

我试图获取一组特定值所具有的行的百分比。最好通过例子来解释。我可以使用ratio-to-report函数和over()非常简单地通过每一列来完成此操作,但是我遇到了多个分组的问题

假设表有2列:

column a         column b
1000             some data
1100             some data
2000             some data
1400             some data
1500             some data

通过以下查询,我可以得到这个域集,每个都是总行数的 20%

select columna, count(*), trunc(ratio_to_report(count(columna)) over() * 100, 2) as perc
from table
group by columna
order by perc desc;

但是,我需要的是例如确定百分比和包含 1000、1400 或 2000 的行数;通过查看它,您可以看出它是 60%,但需要一个查询来返回该值。这需要高效,因为查询将针对数百万行运行。就像我之前说的,我对单个值及其百分比进行了处理,但倍数才是让我困惑的。

似乎我需要能够在某处放置 IN 子句,但值不会每次都是这些特定值。如果有意义的话,我需要从另一个表中获取“IN”部分的值。我想我需要某种多重分组。

I am trying to get the percentage of rows that a set of particular value has. Best explained by example. I can do this by each column very simply using ratio-to-report function and over(), but am having issues with multiple groupings

Assume table has 2 columns:

column a         column b
1000             some data
1100             some data
2000             some data
1400             some data
1500             some data

With the following query, I can get for this domain set, each one is 20% of the total rows

select columna, count(*), trunc(ratio_to_report(count(columna)) over() * 100, 2) as perc
from table
group by columna
order by perc desc;

However, what I need is for example to determine the percentage & count of the rows that contain 1000, 1400 or 2000; From looking at it, you can tell its 60%, but need a query to return that. This needs to be efficient, as the query will be running against millions of rows. Like I said before, I have this working on a single value and its percentage, but the multiple is what is throwing me.

Seems like I need to be able to put an IN clause somewhere, but the values will not be these specific values each time. I will need to get the values for the "IN" part of it from another table, if that makes sense. guess I need some kind of multiple grouping.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

空气里的味道 2024-10-28 12:29:09

您可能正在寻找类似的东西

SQL> ed
Wrote file afiedt.buf

  1  with x as (
  2    select 1000 a from dual
  3    union all
  4    select 1100 from dual
  5    union all
  6    select 1400 from dual
  7    union all
  8    select 1500 from dual
  9    union all
 10    select 2000 from dual
 11  )
 12  select (case when a in (1000,1400,1500)
 13               then 1
 14               else 0
 15           end) bucket,
 16         count(*),
 17         ratio_to_report(count(*)) over ()
 18    from x
 19   group by (case when a in (1000,1400,1500)
 20               then 1
 21               else 0
 22*          end)
SQL> /

    BUCKET   COUNT(*) RATIO_TO_REPORT(COUNT(*))OVER()
---------- ---------- -------------------------------
         1          3                              .6
         0          2                              .4

Potentially, you're looking for something like

SQL> ed
Wrote file afiedt.buf

  1  with x as (
  2    select 1000 a from dual
  3    union all
  4    select 1100 from dual
  5    union all
  6    select 1400 from dual
  7    union all
  8    select 1500 from dual
  9    union all
 10    select 2000 from dual
 11  )
 12  select (case when a in (1000,1400,1500)
 13               then 1
 14               else 0
 15           end) bucket,
 16         count(*),
 17         ratio_to_report(count(*)) over ()
 18    from x
 19   group by (case when a in (1000,1400,1500)
 20               then 1
 21               else 0
 22*          end)
SQL> /

    BUCKET   COUNT(*) RATIO_TO_REPORT(COUNT(*))OVER()
---------- ---------- -------------------------------
         1          3                              .6
         0          2                              .4
晒暮凉 2024-10-28 12:29:09

我不确定我是否完全理解该要求,但您是否需要 ratio_to_report ?看看下面的内容,让我知道这与您想要的有多接近,我们可以从那里开始工作!

T1 是包含示例数据的表

create table t1(a primary key) as
   select 1000 as a from dual union all
   select 1100 as a from dual union all
   select 1400 as a from dual union all
   select 1500 as a from dual union all
   select 2000 as a from dual;

T2 是您提到的查找表(在其中获取 ID 列表)

create table t2(a primary key) as
   select 1000 as a from dual union all
   select 1400 as a from dual union all
   select 2000 as a from dual;

T1->T2 的左连接将返回 T1 中的所有行与 T2 中的所有匹配行配对。对于 T1 中不存在于集合 (T2) 中的每个 A,结果将用 NULL 填充。我们可以利用 COUNT() 不计算(呵呵)空值的事实。

select count(t1.a) as num_rows
      ,count(t2.a) as in_set
      ,count(t2.a) / count(t1.a) as shr_in_set
  from t1
  left 
  join t2 on(t1.a = t2.a);

运行查询的结果是:

  NUM_ROWS     IN_SET SHR_IN_SET
---------- ---------- ----------
         5          3         ,6

I'm not sure I entirely understand the requirement, but do you need ratio_to_report at all? Have a look at the following, and let me know how close this is to what you want, and we can work from there!

T1 is the table containing your sample data

create table t1(a primary key) as
   select 1000 as a from dual union all
   select 1100 as a from dual union all
   select 1400 as a from dual union all
   select 1500 as a from dual union all
   select 2000 as a from dual;

T2 is the lookup table you mentioned (where you get the list of IDs)

create table t2(a primary key) as
   select 1000 as a from dual union all
   select 1400 as a from dual union all
   select 2000 as a from dual;

A left join from T1->T2 will return all rows in T1 paired with all matching rows in T2. For each A in T1 that does not exist in your set (T2), the result will be padded with NULL. We can exploit the fact that COUNT() doesn't count (hehe) nulls.

select count(t1.a) as num_rows
      ,count(t2.a) as in_set
      ,count(t2.a) / count(t1.a) as shr_in_set
  from t1
  left 
  join t2 on(t1.a = t2.a);

The result of running the query is:

  NUM_ROWS     IN_SET SHR_IN_SET
---------- ---------- ----------
         5          3         ,6
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文