Get the minimum value for each NTILE bucket in Hive
I am trying to partition the data into percentiles (100 equal buckets) for each merchant_id, ordered by the score column, using the NTILE window function. The output of the query contains merchant_id, score, and percentile for every record in the source table (sample code below).
CREATE TABLE merchant_score_ntiles
AS
SELECT merchant_id, score, NTILE(100) OVER (PARTITION BY merchant_id ORDER BY score DESC) as percentile
FROM merch_table
This will return sample output as follows:
merchant_id,score,percentile
1001,900,1
1001,800,1
1001,760,1
1002,900,2
1002,800,2
1002,750,2
Is there a way to return only the minimum score for each merchant_id within each value of the percentile column, as shown below?
merchant_id,score,percentile
1001,760,1
1002,750,2
Comments (1)
You can try using the ROW_NUMBER window function in a subquery before using the NTILE window function.
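The comment does not include a query, so the following is only a sketch of one way to read the suggestion, not the commenter's own code. It assumes the merchant_score_ntiles table from the question already exists, and it applies ROW_NUMBER to that table rather than before NTILE. The alias rn and the subquery alias t are names introduced here for illustration.

```sql
-- Sketch (assumption): rank rows inside each (merchant_id, percentile) bucket
-- by ascending score, then keep only the first row, i.e. the lowest score.
SELECT merchant_id, score, percentile
FROM (
  SELECT merchant_id,
         score,
         percentile,
         ROW_NUMBER() OVER (PARTITION BY merchant_id, percentile
                            ORDER BY score ASC) AS rn
  FROM merchant_score_ntiles
) t
WHERE rn = 1;

-- Alternative without a window function: a plain aggregate per bucket.
SELECT merchant_id, MIN(score) AS score, percentile
FROM merchant_score_ntiles
GROUP BY merchant_id, percentile;
```

On the sample rows in the question, both queries should keep 1001,760,1 and 1002,750,2. The ROW_NUMBER form is the better fit if other columns from the lowest-scoring row are also needed; the GROUP BY form is shorter when only the minimum score matters.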