如何在不使用子查询的情况下从连接表中获取计数直方图?

发布于 2024-10-24 19:48:06 字数 869 浏览 3 评论 0原文

我有很多看起来像这样的表:(id,user_id,object_id)。我经常对“有多少用户拥有一个对象?有多少用户有两个?等等”这一问题感兴趣。并想查看分布。

对此的明显答案如下所示:

select x.ucount, count(*) 
from (select count(*) as ucount from objects_users group by user_id) as x 
group by x.ucount 
order by x.ucount;

这会产生如下结果:

ucount | count
-------|-------
1      | 15
2      | 17
3      | 23
4      | 104
5      | 76
7      | 12

在此处使用子查询对我来说感觉不优雅,我想弄清楚如何在不使用子查询的情况下产生相同的结果。此外,如果您要问的问题稍微复杂一些,那么从子查询中传递更多信息就会变得很混乱。例如,如果您希望数据按用户的创建日期进一步分组:

select 
    x.ucount, 
    (select cdate from users where id = x.user_id) as cdate, 
    count(*) 
from (
    select user_id, count(*) as ucount 
    from objects_users group by user_id
) as x 
group by cdate, x.ucount,  
order by cdate, x.ucount;

是否有某种方法可以避免子查询的爆炸?我想最终我的反对是出于美学目的,但这使得查询难以阅读和编写。

I have a lot of tables that look like this: (id, user_id, object_id). I am often interested in the question "how many users have one object? how many have two? etc." and would like to see the distribution.

The obvious answer to this looks like:

select x.ucount, count(*) 
from (select count(*) as ucount from objects_users group by user_id) as x 
group by x.ucount 
order by x.ucount;

This produces results like:

ucount | count
-------|-------
1      | 15
2      | 17
3      | 23
4      | 104
5      | 76
7      | 12

Using a subquery here feels inelegant to me and I'd like to figure out how to produce the same result without. Further, if the question you're trying to ask is slightly more complicated it gets messy passing more information out of the subquery. For example, if you want the data further grouped by the user's creation date:

select 
    x.ucount, 
    (select cdate from users where id = x.user_id) as cdate, 
    count(*) 
from (
    select user_id, count(*) as ucount 
    from objects_users group by user_id
) as x 
group by cdate, x.ucount,  
order by cdate, x.ucount;

Is there some way to avoid the explosion of subqueries? I suppose in the end my objection is aesthetic, but it makes the queries hard to read and hard to write.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

半暖夏伤 2024-10-31 19:48:06

我认为无论您的 RDBMS 是什么,子查询都是执行此操作的正确方法。为什么会显得不优雅呢?

对于第二个查询,只需像这样加入 users 表:

SELECT
 x.ucount,
 u.cdate,
 COUNT(*)
FROM (
 SELECT
  user_id,
  COUNT(*) AS ucount
 FROM objects_users
 GROUP BY user_id
) AS x
LEFT JOIN users AS u
 ON x.user_id = u.id
GROUP BY u.cdate, x.ucount
ORDER BY u.cdate, x.ucount

I think a subquery is exactly the appropriate way to do this, regardless of your RDBMS. Why would it be inelegant?

For the second query, just join the users table like this:

SELECT
 x.ucount,
 u.cdate,
 COUNT(*)
FROM (
 SELECT
  user_id,
  COUNT(*) AS ucount
 FROM objects_users
 GROUP BY user_id
) AS x
LEFT JOIN users AS u
 ON x.user_id = u.id
GROUP BY u.cdate, x.ucount
ORDER BY u.cdate, x.ucount
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文