在 postgresql 的子查询中使用外部查询结果

发布于 2024-12-16 17:38:22 字数 1589 浏览 4 评论 0原文

我有两个表pointscontacts,我试图获取按月分组的每个联系人的平均points.score请注意,积分和联系人没有关系,我只想将一个月内创建的积分总和除以该月存在的联系人数量。

因此,我需要对按created_at月份分组的积分进行求和,并且我只需要计算该月的联系人数量。正是最后一部分欺骗了我。我不确定如何在子查询中使用外部查询中的列。我尝试了这样的事情:

SELECT SUM(score) AS points_sum,
  EXTRACT(month FROM created_at) AS month,
  date_trunc('MONTH', created_at) + INTERVAL '1 month' AS next_month,
  (SELECT COUNT(id) FROM contacts WHERE contacts.created_at <= next_month) as contact_count
FROM points
GROUP BY month, next_month
ORDER BY month

所以,我正在提取我的积分被汇总的实际月份,同时获取下一个月的开始,这样我就可以说“获取他们在is < next_month"

但它抱怨列 next_month 不存在 这是可以理解的,因为子查询对外部查询一无所知。使用 points.next_month 进行资格认证也不起作用。

那么有人可以指出我如何实现这一目标的正确方向吗?

表格:

分数

score | created_at
10    | "2011-11-15 21:44:00.363423"
11    | "2011-10-15 21:44:00.69667" 
12    | "2011-09-15 21:44:00.773289"
13    | "2011-08-15 21:44:00.848838"
14    | "2011-07-15 21:44:00.924152"

联系人

id | created_at
6  | "2011-07-15 21:43:17.534777"
5  | "2011-08-15 21:43:17.520828"
4  | "2011-09-15 21:43:17.506452"
3  | "2011-10-15 21:43:17.491848"
1  | "2011-11-15 21:42:54.759225"

总和、月份和下一个月(不带子选择)

sum | month | next_month
14  | 7     | "2011-08-01 00:00:00"
13  | 8     | "2011-09-01 00:00:00"
12  | 9     | "2011-10-01 00:00:00"
11  | 10    | "2011-11-01 00:00:00"
10  | 11    | "2011-12-01 00:00:00"

I have two tables points and contacts and I'm trying to get the average points.score per contact grouped on a monthly basis. Note that points and contacts aren't related, I just want the sum of points created in a month divided by the number of contacts that existed in that month.

So, I need to sum points grouped by the created_at month, and I need to take the count of contacts FOR THAT MONTH ONLY. It's that last part that's tricking me up. I'm not sure how I can use a column from an outer query in the subquery. I tried something like this:

SELECT SUM(score) AS points_sum,
  EXTRACT(month FROM created_at) AS month,
  date_trunc('MONTH', created_at) + INTERVAL '1 month' AS next_month,
  (SELECT COUNT(id) FROM contacts WHERE contacts.created_at <= next_month) as contact_count
FROM points
GROUP BY month, next_month
ORDER BY month

So, I'm extracting the actual month that my points are being summed, and at the same time, getting the beginning of the next_month so that I can say "Get me the count of contacts where their created at is < next_month"

But it complains that column next_month doesn't exist This is understandable as the subquery knows nothing about the outer query. Qualifying with points.next_month doesn't work either.

So can someone point me in the right direction of how to achieve this?

Tables:

Points

score | created_at
10    | "2011-11-15 21:44:00.363423"
11    | "2011-10-15 21:44:00.69667" 
12    | "2011-09-15 21:44:00.773289"
13    | "2011-08-15 21:44:00.848838"
14    | "2011-07-15 21:44:00.924152"

Contacts

id | created_at
6  | "2011-07-15 21:43:17.534777"
5  | "2011-08-15 21:43:17.520828"
4  | "2011-09-15 21:43:17.506452"
3  | "2011-10-15 21:43:17.491848"
1  | "2011-11-15 21:42:54.759225"

sum, month and next_month (without the subselect)

sum | month | next_month
14  | 7     | "2011-08-01 00:00:00"
13  | 8     | "2011-09-01 00:00:00"
12  | 9     | "2011-10-01 00:00:00"
11  | 10    | "2011-11-01 00:00:00"
10  | 11    | "2011-12-01 00:00:00"

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

执着的年纪 2024-12-23 17:38:23

立即编辑

联系人的运行总数。我的初稿每月使用新联系人,这显然不是 OP 想要的。

WITH c AS (
    SELECT created_at
          ,count(id) OVER (order BY created_at) AS ct
    FROM   contacts
    ), p AS (
    SELECT date_trunc('month', created_at) AS month
          ,sum(score) AS points_sum
    FROM   points
    GROUP  BY 1
    )
SELECT p.month
      ,EXTRACT(month FROM p.month) AS month_nr
      ,p.points_sum
      ,( SELECT c.ct
         FROM   c
         WHERE  c.created_at < (p.month + interval '1 month')
         ORDER  BY c.created_at DESC
         LIMIT  1) AS contacts
FROM   p
ORDER  BY 1
  • 这适用于多年来的任意月份。
  • 假设points 表中没有缺少任何月份。如果您想要所有月份,包括中缺少的月份,请使用generate_series() 并 LEFT JOIN 到它。
  • 使用窗口函数在 CTE 中构建运行总和。
  • 这两个 CTE 并不是绝对必要的 - 仅出于性能和简化目的。
  • 在子选择中获取contacts_count。

您的原始查询形式可能会像这样工作:

SELECT month
      ,EXTRACT(month FROM month) AS month_nr
      ,points_sum
      ,(SELECT count(*)
        FROM   contacts c
        WHERE  c.created_at < (p.month + interval '1 month')) AS contact_count
FROM   (
    SELECT date_trunc('MONTH', created_at) AS month
          ,sum(score) AS points_sum
    FROM   points p
    GROUP  BY 1
    ) p
ORDER  BY 1

解决错误的直接原因是将聚合放入子查询中。你以一种不可能的方式混合关卡。
我希望我的变体在大表上会稍微快一些。不确定较小的桌子。如果您能报告测试结果,那就太好了。
加上一个小修复:< 而不是 <=

Edit

Now with running sum of contacts. My first draft used new contacts per month, which is obviously not what OP wants.

WITH c AS (
    SELECT created_at
          ,count(id) OVER (order BY created_at) AS ct
    FROM   contacts
    ), p AS (
    SELECT date_trunc('month', created_at) AS month
          ,sum(score) AS points_sum
    FROM   points
    GROUP  BY 1
    )
SELECT p.month
      ,EXTRACT(month FROM p.month) AS month_nr
      ,p.points_sum
      ,( SELECT c.ct
         FROM   c
         WHERE  c.created_at < (p.month + interval '1 month')
         ORDER  BY c.created_at DESC
         LIMIT  1) AS contacts
FROM   p
ORDER  BY 1
  • This works for any number of months across the years.
  • Assumes that no month is missing in the table points. If you want all months, including missing ones in points, generate a list of months with generate_series() and LEFT JOIN to it.
  • Build a running sum in a CTE with a window function.
  • Both CTE are not strictly necessary - for performance and simplification only.
  • Get contacts_count in a subselect.

Your original form of the query could work like this:

SELECT month
      ,EXTRACT(month FROM month) AS month_nr
      ,points_sum
      ,(SELECT count(*)
        FROM   contacts c
        WHERE  c.created_at < (p.month + interval '1 month')) AS contact_count
FROM   (
    SELECT date_trunc('MONTH', created_at) AS month
          ,sum(score) AS points_sum
    FROM   points p
    GROUP  BY 1
    ) p
ORDER  BY 1

The fix for the immediate cause of your error is to put the aggregate into a subquery. You were mixing levels in a way that is impossible.
I expect my variant to be slightly faster with big tables. Not sure about smaller tables. Would be great if you'd report back with test results.
Plus a minor fix: < instead of <=.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文