How can I pivot a table (BigQuery) without manually listing all the categories in this case?
I have a table where the schema is roughly the following:
user_id | segment_id | day
The range of values for segment_id is not small: from 1 to 70. For day, it's 0-2.
Ideally, I would like to transform this table to
user_id | segment_1_day_1_count | segment_2_day_1_count | ... | segment_70_day_1_count | ... | segment_70_day_3_count | segment_1_count | segment_2_count | ... | segment_70_count | day_1_count | day_2_count | day_3_count | total_count
Roughly speaking, for each user_id I want to get counts for all breakdown combinations:
- by segment
- by day
- by segment and day
- total
That looks very much like pivoting, but I am not sure there is a way to pivot based on several columns.
My current attempt is the following:
SELECT
  user_id,
  segment_id,
  day,
  -- count per (user, segment) across all days
  COUNT(*) OVER (PARTITION BY user_id, segment_id) AS segment_count,
  -- count per (user, day) across all segments
  COUNT(*) OVER (PARTITION BY user_id, day) AS day_count,
  -- count per (user, segment, day)
  COUNT(*) OVER (PARTITION BY user_id, segment_id, day) AS segment_day_count,
  -- total count per user
  COUNT(*) OVER (PARTITION BY user_id) AS total_count
FROM some_table
This is the data I need, but not in the desired format.
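For what it's worth, BigQuery's PIVOT operator only pivots on a single column, and the usual workaround is to concatenate the two dimensions into one pivot key first. A minimal sketch of that workaround (the _count aliases are assumed names; the IN list still has to be spelled out as literals):

SELECT *
FROM (
  SELECT
    user_id,
    -- collapse the two pivot dimensions into a single string key
    FORMAT('segment_%d_day_%d', segment_id, day) AS seg_day
  FROM some_table
)
PIVOT (
  COUNT(*)
  FOR seg_day IN ('segment_1_day_0' AS segment_1_day_0_count,
                  'segment_1_day_1' AS segment_1_day_1_count
                  /* ... all 70 x 3 combinations ... */)
)

That IN list is exactly the manual enumeration I'd like to avoid, which is what the answers below address.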
2 Answers
Consider the approach below; if applied to dummy data similar to what you described, the output is in the desired format.
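What follows is a sketch of one dynamic approach in that spirit rather than the answerer's verbatim code: BigQuery scripting builds the COUNTIF column list from the data itself, so none of the 70 x 3 categories are listed by hand (some_table is the name from the question; everything else is assumed):

DECLARE seg_day_cols STRING;

-- Build one COUNTIF expression per (segment_id, day) pair present in the data.
SET seg_day_cols = (
  SELECT STRING_AGG(
    FORMAT("COUNTIF(segment_id = %d AND day = %d) AS segment_%d_day_%d_count",
           segment_id, day, segment_id, day),
    ', ' ORDER BY segment_id, day)
  FROM (SELECT DISTINCT segment_id, day FROM some_table)
);

-- Run the generated pivot query: one wide row per user.
EXECUTE IMMEDIATE FORMAT("""
  SELECT user_id, %s, COUNT(*) AS total_count
  FROM some_table
  GROUP BY user_id
""", seg_day_cols);

COUNTIF(condition) here is the BigQuery shorthand for SUM(IF(condition, 1, 0)), so each generated column counts one (segment, day) combination per user.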
Sure, it's a [relatively] simple extension of my originally proposed solution; if applied to dummy data similar to what you described, the output is in the desired format.
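Likewise a hedged reconstruction rather than verbatim code: the same generated-SQL trick extends to the per-segment, per-day, and total breakdowns by building three column lists instead of one:

DECLARE seg_day_cols, seg_cols, day_cols STRING;

-- One COUNTIF per (segment_id, day) pair present in the data.
SET seg_day_cols = (
  SELECT STRING_AGG(
    FORMAT("COUNTIF(segment_id = %d AND day = %d) AS segment_%d_day_%d_count",
           segment_id, day, segment_id, day),
    ', ' ORDER BY segment_id, day)
  FROM (SELECT DISTINCT segment_id, day FROM some_table)
);

-- One COUNTIF per segment, across all days.
SET seg_cols = (
  SELECT STRING_AGG(
    FORMAT("COUNTIF(segment_id = %d) AS segment_%d_count", segment_id, segment_id),
    ', ' ORDER BY segment_id)
  FROM (SELECT DISTINCT segment_id FROM some_table)
);

-- One COUNTIF per day, across all segments.
SET day_cols = (
  SELECT STRING_AGG(
    FORMAT("COUNTIF(day = %d) AS day_%d_count", day, day),
    ', ' ORDER BY day)
  FROM (SELECT DISTINCT day FROM some_table)
);

-- All four breakdowns (segment x day, segment, day, total) in one wide row per user.
EXECUTE IMMEDIATE FORMAT("""
  SELECT user_id, %s, %s, %s, COUNT(*) AS total_count
  FROM some_table
  GROUP BY user_id
""", seg_day_cols, seg_cols, day_cols);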