获取表中每个k记录的array_agg(id)数组
我想查询一个雪花表,以便将其所有记录都批准成包含ID阵列的行,最多最多是K元素。 例如,如果k = 2
,并且该表包含带有IDS 1,2,3,4,5
的记录,我想发布一个可以返回的查询[1,2]
,[3,4]
,[5]
(实际批处理标准将不重要)。有可能吗?
I'd like to query a snowflake table such that all of its records would be batched into rows containing id arrays of, at most, k elements.
For instance, if k=2
, and the table contained records with ids 1,2,3,4,5
, I'd like to issue a query that could return [1,2]
, [3,4]
, [5]
(the actual batching criteria would be unimportant). Is that possible?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
需要一个基于伪柱,我们可以进行分组以创建这些集合。
使用row_number()和做
div
根据k
的值创建组。以下几个示例
Need a pseudo-column based on which we can do grouping to create these sets.
Using row_number() and do
DIV
to create groups as per value ofk
.Few examples given below -
我喜欢Pankaj的答案,我和以为我可以摆脱第二个CTE(而不会使它成为子选择),但最终出于范围的原因而需要。
我确实翻转了使用变量来控制桶。我首先将两个顺序添加到了桶中,我使用了所讨论的值,因此可以看到如何不随机将值分配到存储桶中。其次是数组上的组内部的一个,因此始终“按顺序”始终不需要“按顺序”。但也不需要说明。但是,如果总是需要答案,则需要按顺序进行。
给出:
在存储桶上注意的点是Row_number在1开始值将是自然间隔,因此需要CEIL将PRIO N-1部分推入同一桶。由于不只有整数划分(自然地板)减去1 form_number,因此无法利用更简单的逻辑。
I like Pankaj's answer, I and thought I could get rid of the second CTE (without making it a sub-select) but in the end it is needed for scoping reasons.
I did flip to using a variable to control the bucketing. And I added two ORDER BY's firstly to the bucket, I used the values in question, so it can be seen how to not randomly allocate the values into buckets. And secondly a WITHIN GROUP on the array so the values are always "in order" both perhaps not required. But also not stated as not required. But if the answers are always want this way, the ORDER BY are required.
gives:
The points to note on the bucketing is ROW_NUMBER starts at 1 thus the Nth values will be a natural interval, thus CEIL is needed to push the prio N-1 partials into the same bucket. As there is not integer only division (with natural floor) subtracting 1 form ROW_NUMBER cannot be leveraged for a simpler logic.
这可能不会胜过pankaj的答案,但避免了子查询。也许
array_slice
读取更多的自我记录。在组中添加(订购。
This likely won't outperform Pankaj' answer, but it avoids a subquery. Maybe
array_slice
reads more self-documenting. Addwithin group (order by..)
toarray_agg
if ordering matters