Using an outer query result in a subquery in PostgreSQL

Posted 2024-12-16 17:38:22

I have two tables, points and contacts, and I'm trying to get the average points.score per contact, grouped by month. Note that points and contacts aren't related. I just want the sum of the points created in a month divided by the number of contacts that existed in that month.

So I need to sum points grouped by the created_at month, and I need the count of contacts FOR THAT MONTH ONLY. It's that last part that's tripping me up. I'm not sure how to use a column from an outer query in the subquery. I tried something like this:

SELECT SUM(score) AS points_sum,
  EXTRACT(month FROM created_at) AS month,
  date_trunc('MONTH', created_at) + INTERVAL '1 month' AS next_month,
  (SELECT COUNT(id) FROM contacts WHERE contacts.created_at <= next_month) as contact_count
FROM points
GROUP BY month, next_month
ORDER BY month

So I'm extracting the month over which my points are summed and, at the same time, computing the beginning of the next month, so that I can say "get me the count of contacts whose created_at is < next_month".

But it complains that column next_month doesn't exist. This is understandable, as the subquery knows nothing about the outer query. Qualifying it as points.next_month doesn't work either.

So can someone point me in the right direction of how to achieve this?

Tables:

Points

score | created_at
10    | "2011-11-15 21:44:00.363423"
11    | "2011-10-15 21:44:00.69667" 
12    | "2011-09-15 21:44:00.773289"
13    | "2011-08-15 21:44:00.848838"
14    | "2011-07-15 21:44:00.924152"

Contacts

id | created_at
6  | "2011-07-15 21:43:17.534777"
5  | "2011-08-15 21:43:17.520828"
4  | "2011-09-15 21:43:17.506452"
3  | "2011-10-15 21:43:17.491848"
1  | "2011-11-15 21:42:54.759225"

sum, month and next_month (without the subselect)

sum | month | next_month
14  | 7     | "2011-08-01 00:00:00"
13  | 8     | "2011-09-01 00:00:00"
12  | 9     | "2011-10-01 00:00:00"
11  | 10    | "2011-11-01 00:00:00"
10  | 11    | "2011-12-01 00:00:00"

执着的年纪 2024-12-23 17:38:23

Edit

Now with running sum of contacts. My first draft used new contacts per month, which is obviously not what OP wants.

WITH c AS (
    SELECT created_at
          ,count(id) OVER (ORDER BY created_at) AS ct
    FROM   contacts
    ), p AS (
    SELECT date_trunc('month', created_at) AS month
          ,sum(score) AS points_sum
    FROM   points
    GROUP  BY 1
    )
SELECT p.month
      ,EXTRACT(month FROM p.month) AS month_nr
      ,p.points_sum
      ,( SELECT c.ct
         FROM   c
         WHERE  c.created_at < (p.month + interval '1 month')
         ORDER  BY c.created_at DESC
         LIMIT  1) AS contacts
FROM   p
ORDER  BY 1

This works for any number of months, including ranges that span several years.
It assumes that no month is missing from the points table. If you want every month, including those with no rows in points, generate a list of months with generate_series() and LEFT JOIN points to it.
The running count of contacts is built in a CTE with a window function.
Neither CTE is strictly necessary; they are there for performance and simplification only.
The contact count for each month is fetched in a correlated subselect.
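
The generate_series() suggestion above could look roughly like the following. This is a sketch, not a tested solution: it assumes the points and contacts tables from the question with created_at stored as a timestamp, and the aliases g and contact_count are made up for illustration.

```sql
-- One row per calendar month between the first and the last row in points,
-- including months for which points has no rows at all.
SELECT g.month
      ,COALESCE(sum(p.score), 0) AS points_sum      -- 0 instead of NULL for empty months
      ,(SELECT count(*)
        FROM   contacts c
        WHERE  c.created_at < g.month + interval '1 month') AS contact_count
FROM   generate_series(
           date_trunc('month', (SELECT min(created_at) FROM points)),
           date_trunc('month', (SELECT max(created_at) FROM points)),
           interval '1 month') AS g(month)
LEFT   JOIN points p
       ON  p.created_at >= g.month                       -- range join: start of month ...
       AND p.created_at <  g.month + interval '1 month'  -- ... up to, not including, the next
GROUP  BY g.month
ORDER  BY g.month;
```

The range condition in the join is a design choice: unlike comparing date_trunc('month', p.created_at) to g.month, it leaves p.created_at unwrapped, so a plain index on that column may be usable.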

Your original form of the query could work like this:

SELECT month
      ,EXTRACT(month FROM month) AS month_nr
      ,points_sum
      ,(SELECT count(*)
        FROM   contacts c
        WHERE  c.created_at < (p.month + interval '1 month')) AS contact_count
FROM   (
    SELECT date_trunc('MONTH', created_at) AS month
          ,sum(score) AS points_sum
    FROM   points p
    GROUP  BY 1
    ) p
ORDER  BY 1

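The fix for the immediate cause of your error is to move the aggregation into a subquery. You were mixing query levels in a way that is not possible: next_month is an output-column alias, and an alias cannot be referenced from another expression in the same SELECT list.
I expect my variant to be slightly faster with big tables. I am not sure about smaller tables. It would be great if you reported back with test results.
Plus a minor fix: < instead of <=, so that a contact created exactly at the start of the next month is not counted.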
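
Neither query above performs the final division the question asks for. One possible way to add it is sketched below, again assuming the tables from the question; the alias sub and the column name avg_points_per_contact are illustrative, and rounding to two decimals is an arbitrary choice.

```sql
-- Average points per contact and month, built on the second query form.
SELECT month
      ,points_sum
      ,contact_count
       -- cast to numeric to avoid integer division; NULLIF guards against
       -- months in which no contact exists yet (the result is then NULL)
      ,round(points_sum::numeric / NULLIF(contact_count, 0), 2) AS avg_points_per_contact
FROM  (
    SELECT p.month
          ,p.points_sum
          ,(SELECT count(*)
            FROM   contacts c
            WHERE  c.created_at < p.month + interval '1 month') AS contact_count
    FROM  (
        SELECT date_trunc('month', created_at) AS month
              ,sum(score) AS points_sum
        FROM   points
        GROUP  BY 1
        ) p
    ) sub
ORDER  BY month;
```

With the sample rows from the question, July has one contact and a sum of 14, and November has five contacts and a sum of 10, so the averages should come out as 14.00 and 2.00 respectively.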
