比较一个小组中的AVG行，所有行不在组中（BigQuery）

发布于 2025-01-22 16:51:24 字数 1903 浏览 4 评论 0原文

我有一个看起来像这样的数据集：

date	grp_name	uid	value_a	value_b	value_c
2022-01-01	A	1	1	10	5
2022-01-01	B	2	7	1	20
2022-01-01	C	10	7	3	20
2022-01- 01	A	3	3	12	4
2022-01-02	B	2	6	1 21	21
2022-01-02	B	5	3	4	19
2022-01-03	A	6	1	15	6
2022-01-01-03	C	7	8	2	22
2022-01- 03	d	9	10	2	18

对于每个日期，每个grp_name，我想计算value_a，value_b and value_c and value_c copross as low as low lows and and（这是我遇到问题的地方）：value_a，value_b and value_c的avg avg不在组中的行。

GRP_NAME的预期= a日期= 2022-01-01。我想生成一个IN_GRP列以将手头组与非组成员的平均值分开。

date	grp_name	in_grp	value_a	value_b	value_c
2022-01-01	a	true	2	11	4.5
2022-01-01	a	false	7	2	20

这是我到目前为止写的简单查询，缺乏拾取非集团成员的能力对于平均值，并创建IN_GRP列以将组成员与非组成员分开：

SELECT
  date,
  grp_name,
  AVG(value_a) value_a,
  AVG(value_b) value_b,
  AVG(value_c) value_c
FROM table
GROUP BY date, grp_name

有关如何解决此问题的任何建议？

原文

I have a data set that looks like this:

date	grp_name	uid	value_a	value_b	value_c
2022-01-01	A	1	1	10	5
2022-01-01	B	2	7	1	20
2022-01-01	C	10	7	3	20
2022-01-01	A	3	3	12	4
2022-01-02	B	2	6	1	21
2022-01-02	B	5	3	4	19
2022-01-03	A	6	1	15	6
2022-01-03	C	7	8	2	22
2022-01-03	D	9	10	2	18

For each date, and each grp_name, I want to calculate the AVG of value_a, value_b and value_c accross all rows, and (here's where I run into problems): the AVG of value_a, value_b and value_c for all rows that are NOT in the group.

Expected for grp_name = A on date = 2022-01-01. I imagine generating an in_grp column to separate the average values that are from the group at hand from those that are from non-group members.

date	grp_name	in_grp	value_a	value_b	value_c
2022-01-01	A	TRUE	2	11	4.5
2022-01-01	A	FALSE	7	2	20

Here is the simple query that I've written so far, that lacks the ability to pick up non-group members for the averages, and create the in_grp column to separate the group members from the non-group members:

SELECT
  date,
  grp_name,
  AVG(value_a) value_a,
  AVG(value_b) value_b,
  AVG(value_c) value_c
FROM table
GROUP BY date, grp_name

Any advice on how to solve this?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

執念 2025-01-29 16:51:24

请考虑以下方法

with temp as (
  select distinct date, grp_name,
    count(*) over() count_all,
    count(*) over(partition by date, grp_name) count_in_grp,
    sum(value_a) over() sum_a,
    sum(value_a) over(partition by date, grp_name) sum_a_in_grp,
    sum(value_b) over() sum_b,
    sum(value_b) over(partition by date, grp_name) sum_b_in_grp,
    sum(value_c) over() sum_c,
    sum(value_c) over(partition by date, grp_name) sum_c_in_grp,
  from your_table
)
select date, grp_name, true as in_grp, 
  sum_a_in_grp / count_in_grp as value_a,
  sum_b_in_grp / count_in_grp as value_b,
  sum_c_in_grp / count_in_grp as value_c
from temp 
union all
select date, grp_name, false as in_grp, 
  (sum_a - sum_a_in_grp) / (count_all - count_in_grp) as value_a,
  (sum_b - sum_b_in_grp) / (count_all - count_in_grp) as value_b,
  (sum_c - sum_c_in_grp) / (count_all - count_in_grp) as value_c
from temp
-- order by date, grp_name, in_grp desc

如果应用于您的问题输出中的样本数据，则为

Consider below approach

with temp as (
  select distinct date, grp_name,
    count(*) over() count_all,
    count(*) over(partition by date, grp_name) count_in_grp,
    sum(value_a) over() sum_a,
    sum(value_a) over(partition by date, grp_name) sum_a_in_grp,
    sum(value_b) over() sum_b,
    sum(value_b) over(partition by date, grp_name) sum_b_in_grp,
    sum(value_c) over() sum_c,
    sum(value_c) over(partition by date, grp_name) sum_c_in_grp,
  from your_table
)
select date, grp_name, true as in_grp, 
  sum_a_in_grp / count_in_grp as value_a,
  sum_b_in_grp / count_in_grp as value_b,
  sum_c_in_grp / count_in_grp as value_c
from temp 
union all
select date, grp_name, false as in_grp, 
  (sum_a - sum_a_in_grp) / (count_all - count_in_grp) as value_a,
  (sum_b - sum_b_in_grp) / (count_all - count_in_grp) as value_b,
  (sum_c - sum_c_in_grp) / (count_all - count_in_grp) as value_c
from temp
-- order by date, grp_name, in_grp desc

if applied to sample data in your question output is