获取表中每个k记录的array_agg（id）数组

发布于 2025-02-09 05:13:58 字数 206 浏览 2 评论 0原文

我想查询一个雪花表，以便将其所有记录都批准成包含ID阵列的行，最多最多是K元素。例如，如果k = 2，并且该表包含带有IDS 1,2,3,4,5的记录，我想发布一个可以返回的查询[1,2]，[3,4]，[5]（实际批处理标准将不重要）。有可能吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

风月客 2025-02-16 05:13:58

需要一个基于伪柱，我们可以进行分组以创建这些集合。
使用row_number（）和做div根据k的值创建组。

以下几个示例

with cte(col1) as (
    select * from values
    (1),(2),(3),(4),(5),(6),(7),(8),(9)
)
,cte_1 as (
    select ceil(row_number() over (order by null)/3) rn, 
       col1 
    from cte
)
select array_agg(col1) agg_output 
from cte_1 
group by rn order by rn;

-Agg_output
[1，2，3]
[4，5，6]
[7，8，9]

Need a pseudo-column based on which we can do grouping to create these sets.
Using row_number() and do DIV to create groups as per value of k.

Few examples given below -

with cte(col1) as (
    select * from values
    (1),(2),(3),(4),(5),(6),(7),(8),(9)
)
,cte_1 as (
    select ceil(row_number() over (order by null)/3) rn, 
       col1 
    from cte
)
select array_agg(col1) agg_output 
from cte_1 
group by rn order by rn;

AGG_OUTPUT
[ 1, 2, 3 ]
[ 4, 5, 6 ]
[ 7, 8, 9 ]

回复收藏 0 原文

濫情▎り 2025-02-16 05:13:58

我喜欢Pankaj的答案，我和以为我可以摆脱第二个CTE（而不会使它成为子选择），但最终出于范围的原因而需要。

我确实翻转了使用变量来控制桶。我首先将两个顺序添加到了桶中，我使用了所讨论的值，因此可以看到如何不随机将值分配到存储桶中。其次是数组上的组内部的一个，因此始终“按顺序”始终不需要“按顺序”。但也不需要说明。但是，如果总是需要答案，则需要按顺序进行。

set width = 4;

with table_of_numbers(val) as (
    -- just a CTE to make 14 numbers
    select 
        row_number() over(order by null)
    from table(generator(ROWCOUNT => 14))
), pre_cond as(
    select *,
        ceil(row_number() over (order by val)/$width) as rn
    from table_of_numbers
)
select 
    array_agg(val)within group(order by val) agg_output 
from pre_cond 
group by rn order by rn;

给出：

agg_output
[1，2，3，4] [5，6，7，8
]
[9，10，11，12 ]
[13，14]

在存储桶上注意的点是Row_number在1开始值将是自然间隔，因此需要CEIL将PRIO N-1部分推入同一桶。由于不只有整数划分（自然地板）减去1 form_number，因此无法利用更简单的逻辑。

I like Pankaj's answer, I and thought I could get rid of the second CTE (without making it a sub-select) but in the end it is needed for scoping reasons.

I did flip to using a variable to control the bucketing. And I added two ORDER BY's firstly to the bucket, I used the values in question, so it can be seen how to not randomly allocate the values into buckets. And secondly a WITHIN GROUP on the array so the values are always "in order" both perhaps not required. But also not stated as not required. But if the answers are always want this way, the ORDER BY are required.

set width = 4;

with table_of_numbers(val) as (
    -- just a CTE to make 14 numbers
    select 
        row_number() over(order by null)
    from table(generator(ROWCOUNT => 14))
), pre_cond as(
    select *,
        ceil(row_number() over (order by val)/$width) as rn
    from table_of_numbers
)
select 
    array_agg(val)within group(order by val) agg_output 
from pre_cond 
group by rn order by rn;

gives:

AGG_OUTPUT
[ 1, 2, 3, 4 ]
[ 5, 6, 7, 8 ]
[ 9, 10, 11, 12 ]
[ 13, 14 ]

The points to note on the bucketing is ROW_NUMBER starts at 1 thus the Nth values will be a natural interval, thus CEIL is needed to push the prio N-1 partials into the same bucket. As there is not integer only division (with natural floor) subtracting 1 form ROW_NUMBER cannot be leveraged for a simpler logic.

回复收藏 0 原文

街角迷惘 2025-02-16 05:13:58

这可能不会胜过pankaj的答案，但避免了子查询。也许array_slice读取更多的自我记录。在组中添加（订购。

set K=3;

select array_slice(array_agg(col) over (),
                   sum($K) over (order by col) - $K,  
                   sum($K) over (order by col)) as array_part
from cte
qualify array_size(array_part) > 0

This likely won't outperform Pankaj' answer, but it avoids a subquery. Maybe array_slice reads more self-documenting. Add within group (order by..) to array_agg if ordering matters

set K=3;

select array_slice(array_agg(col) over (),
                   sum($K) over (order by col) - $K,  
                   sum($K) over (order by col)) as array_part
from cte
qualify array_size(array_part) > 0

回复收藏 0 原文

~没有更多了~