Calculating percentages with a GROUP BY query

Posted on 2024-11-11 13:47:33

I have a table with 3 columns which looks like this:

File    User     Rating (1-5)
------------------------------
00001    1        3
00002    1        4
00003    2        2
00004    3        5
00005    4        3
00005    3        2
00006    2        3
Etc.

I want to generate a query that outputs the following (for each user and rating, display the number of files as well as percentage of files):

User    Rating   Count   Percentage
-----------------------------------
1       1         3      .18
1       2         6      .35
1       3         8      .47
2       5         12     .75
2       3         4      .25

With PostgreSQL, I know how to write a query that produces the first 3 columns (shown below), but I can't figure out how to calculate the percentage within the GROUP BY:

SELECT
    User,
    Rating,
    Count(*)
FROM
    Results
GROUP BY
    User, Rating
ORDER BY
    User, Rating

Here I want the percentage calculation to apply to each user/rating group.

耶耶耶 2024-11-18 13:47:34

You can use analytic functions on top of aggregate functions:

select
    user,
    rating,
    count(*) as total_per_user_rating_tuple,
    sum(count(*)) over (partition by user) as total_per_user,
    count(*) * 1.0 / sum(count(*)) over (partition by user) as subtotal_upon_total_per_user  -- * 1.0 guards against integer division on engines where the sum stays an integer
from results
group by user, rating
order by user, rating

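The query works in any RDBMS that implements analytic (window) functions. Notice that the aggregate function is nested inside the analytic function. This works because of the order of operations in SQL: window functions are evaluated after GROUP BY, so sum(count(*)) over (partition by user) adds up the per-group counts for each user.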
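To see the nesting in action without a database server, here is a minimal sketch that runs the same pattern through Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, the sample rows are made up, and the column is named usr (an assumption, not the original schema) to sidestep the reserved word USER.

```python
import sqlite3

# Hypothetical sample data; "usr" replaces "User" to avoid the reserved word.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (file TEXT, usr INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("00001", 1, 3), ("00002", 1, 4), ("00003", 1, 3),
     ("00004", 2, 5), ("00005", 2, 5), ("00006", 2, 2), ("00007", 2, 5)],
)

# GROUP BY runs first and yields one row per (usr, rating);
# the window SUM then totals those per-group counts for each usr.
query = """
SELECT usr,
       rating,
       COUNT(*) AS n,
       SUM(COUNT(*)) OVER (PARTITION BY usr) AS total_per_user,
       ROUND(COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY usr), 2) AS pct
FROM results
GROUP BY usr, rating
ORDER BY usr, rating
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each output row carries the group count, the user's total, and the group's share of that total, so the shares for one user add up to 1.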

Demo on DB<>Fiddle

最美的太阳 2024-11-18 13:47:33

WITH t1 AS 
 (SELECT User, Rating, Count(*) AS n 
  FROM your_table
  GROUP BY User, Rating)
SELECT User, Rating, n, 
       (0.0+n)/(SUM(n) OVER (PARTITION BY User)) AS pct -- no integer divide! SUM(n) is the user's total file count
FROM t1;

Or, without the CTE (DISTINCT collapses the one-row-per-file result to one row per user/rating):

SELECT DISTINCT User, Rating, Count(*) OVER w_user_rating AS n, 
        (0.0+Count(*) OVER w_user_rating)/(Count(*) OVER (PARTITION BY User)) AS pct
FROM your_table
WINDOW w_user_rating AS (PARTITION BY User, Rating);

I would check whether one of these yields a better query plan than the other, using the appropriate tool for your RDBMS (EXPLAIN in PostgreSQL, for example).

旧夏天 2024-11-18 13:47:33

Alternatively, you can do it the old-school way, which is arguably easier to grok:

select usr.User                   as User   ,
       usr.Rating                 as Rating ,
       usr.N                      as N      ,
       (100.0 * usr.N) / total.N as Pct
from ( select User, Rating , count(*) as N
       from Results
       group by User , Rating
     ) usr
join ( select User , count(*) as N
       from Results
       group by User
     ) total on total.User = usr.User
order by usr.User, usr.Rating

Cheers!

所谓喜欢 2024-11-18 13:47:33

In T-SQL this should work. The window is partitioned by [User] only, so Total is the user's overall file count, and multiplying by 1.0 avoids integer division:

SELECT
    [User],
    Rating,
    COUNT(*) AS [Count],
    SUM(COUNT(*)) OVER (PARTITION BY [User]) AS Total,
    COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY [User]) AS Percentage
FROM
    Results
GROUP BY
    [User], Rating
ORDER BY
    [User], Rating

诗笺 2024-11-18 13:47:33

The best approach is to use window functions.

很酷又爱笑 2024-11-18 13:47:33

WITH data AS 
 (SELECT User, Rating, Count(*) AS Count 
  FROM Results
  GROUP BY User, Rating)
SELECT User, Rating, Count, 
       (0.0+Count)/(SUM(Count) OVER (PARTITION BY User)) AS Percentage
FROM data;

星 2024-11-18 13:47:33

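If the data is not too large, you might consider a subquery for ease of use. Make sure the subquery selects from the same rows as the outer query. The example below divides by the row count of the whole table; filter the subquery by user to get a per-user percentage instead.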

SELECT [User], [Rating], COUNT(*) AS [Count],
100.0 * COUNT(*) / (SELECT COUNT(*) FROM Results) AS [Percentage]
FROM Results
GROUP BY [User], [Rating]

I tested this on a similar table

select OrderStatus, Count(*) as Antall,
CASE 
    WHEN OrderStatus = 0 THEN 'Bestilt (Ordered)'
    WHEN OrderStatus = 1 THEN 'Besvart (Completed)'
    WHEN OrderStatus = 2 THEN 'Utløpt (Expired)'
    WHEN OrderStatus = 3 THEN 'Feilet (Error)'
    WHEN OrderStatus = 4 THEN 'Ikke bestilt (NotOrdered)'
    WHEN OrderStatus = 5 THEN 'Fysisk post returnert uåpnet (PhysicalMailReturned)'
    WHEN OrderStatus = 6 THEN 'Prosesserer bestilling (Processing)'
    WHEN OrderStatus = 7 THEN 'Deleted (Slettet)'
END AS OrderStatusText,
100 * (COUNT(*) *1.0 / (SELECT COUNT(*) FROM PromsFormOrder where created >= '2023-11-29')) as 'Percentage rate'
FROM PromsFormOrder
where Created >= '2023-11-29'
GROUP BY OrderStatus

Multiplying by 1.0 here switches the division to non-integer arithmetic, so the percentages do not collapse to 0 or 1 because of integer division.
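The effect is easy to reproduce in isolation. The following is a small sketch using Python's built-in sqlite3 module as a stand-in engine (an assumption; PostgreSQL and SQL Server also truncate when both operands are integers), with 3 and 17 chosen arbitrarily as a group count and a total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three ways of dividing 3 by 17: plain integers, multiplying by 1.0 first,
# and casting the numerator explicitly.
int_div, dec_div, cast_div = conn.execute(
    "SELECT 3 / 17, 3 * 1.0 / 17, CAST(3 AS REAL) / 17"
).fetchone()

print(int_div)             # integer / integer truncates to 0
print(round(dec_div, 2))   # 0.18
print(round(cast_div, 2))  # 0.18
```

Either the multiplication or the explicit cast is enough; what matters is that at least one operand is non-integer before the division happens.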

Window functions (OVER with PARTITION BY) are probably faster than subqueries, but the subquery syntax uses more widely known SQL constructs. I wrote [Rating] in brackets for T-SQL (SQL Server); in Postgres this is not required. Note that USER is a reserved word in both SQL Server and PostgreSQL, so a column with that name has to be quoted: [User] in T-SQL, "User" in Postgres.
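As a rough check that the two styles agree, here is a sketch that computes the per-user percentage both ways and compares the results. It uses Python's built-in sqlite3 module rather than SQL Server or PostgreSQL, made-up sample rows, and a column named usr in place of User; all of these are assumptions for illustration. The subquery is correlated on usr so that it yields a per-user share rather than a share of the whole table.

```python
import sqlite3

# Hypothetical sample data; "usr" replaces "User" to avoid the reserved word.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (file TEXT, usr INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("00001", 1, 3), ("00002", 1, 4), ("00003", 1, 3),
     ("00004", 2, 5), ("00005", 2, 5), ("00006", 2, 2), ("00007", 2, 5)],
)

# Window-function version: the total per user comes from SUM(COUNT(*)) OVER.
window_sql = """
SELECT usr, rating, COUNT(*) AS n,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY usr) AS pct
FROM results
GROUP BY usr, rating
ORDER BY usr, rating
"""

# Subquery version: the total per user comes from a correlated COUNT(*).
subquery_sql = """
SELECT r.usr, r.rating, COUNT(*) AS n,
       100.0 * COUNT(*) / (SELECT COUNT(*) FROM results t WHERE t.usr = r.usr) AS pct
FROM results r
GROUP BY r.usr, r.rating
ORDER BY r.usr, r.rating
"""

def rounded(sql):
    """Run a query and round the percentage column to two decimals."""
    return [(u, r, n, round(p, 2)) for u, r, n, p in conn.execute(sql)]

window_rows = rounded(window_sql)
subquery_rows = rounded(subquery_sql)
print(window_rows == subquery_rows)
for row in window_rows:
    print(row)
```

Which version performs better depends on the engine and the data, so it is worth comparing the query plans on the real table.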
