如何按降序连接 HIVE 中具有相同主键的所有值?

发布于 2025-01-17 09:43:53 字数 2230 浏览 5 评论 0 原文

我正在使用 HIVE,我有一个这样的表:

S.no ID application_num f_name l_name Primary Key
1 123 202201A1 akhil yadav 123~&~akhil~&~yadav
2 123 202202A2 akhil yadav 123~&~akhil~& ;~亚达夫
3 123 202203A2 阿基尔 ·亚达夫 123~&~akhil~&~yadav
4 987 202201B8 安基特·亚达夫 987 ~&~ankit~&~yadav
5 987 202203B19 安基特· 亚达夫 987~&~ankit~&~yadav
6 987 202208B34 ankit yadav 987~&~ankit~&~yadav

我想将同一primary_key 的所有applicant_num 连接成一个字符串,以便它们保持降序(最后一个申请应该排在前面)。

这是我尝试过的:

select 
    concat_ws('~', collect_set(applicant_num)), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by 
    primary_key  

使用此查询我得到如下结果:

applicant_num Primary Key
202203A2~202201A1~202202A2 123~&~akhil~&~yadav
202203B19~202208B34~202201B8 987~&~ankit~& 〜亚达夫

而我想要的是

applicant_num Primary Key
202203A2~202202A2~202201A1 123~&~akhil~&~yadav
202208B34~202203B19~202201B8 987~&~ankit~&~yadav

连接不保持降序。

我尝试过

select 
    concat_ws('~', sort_array(collect_set(applicant_num))), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by 
    primary_key  

,但是 sort_array 按升序返回数组,并且无法按降序排序。

有没有办法将所有applicant_num连接在一个字符串中作为主键,同时保持applicant_num的降序排列?

I am using HIVE and I have a table like this:

S.no ID applicant_num f_name l_name Primary Key
1 123 202201A1 akhil yadav 123~&~akhil~&~yadav
2 123 202202A2 akhil yadav 123~&~akhil~&~yadav
3 123 202203A2 akhil yadav 123~&~akhil~&~yadav
4 987 202201B8 ankit yadav 987~&~ankit~&~yadav
5 987 202203B19 ankit yadav 987~&~ankit~&~yadav
6 987 202208B34 ankit yadav 987~&~ankit~&~yadav

I want to concatenate all the applicant_num for same primary_key into a string such that they maintain their descending order (Last application should come first).

Here is what I tried:

select 
    concat_ws('~', collect_set(applicant_num)), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by 
    primary_key  

Using this query I am getting a result like this:

applicant_num Primary Key
202203A2~202201A1~202202A2 123~&~akhil~&~yadav
202203B19~202208B34~202201B8 987~&~ankit~&~yadav

whereas what I want is

applicant_num Primary Key
202203A2~202202A2~202201A1 123~&~akhil~&~yadav
202208B34~202203B19~202201B8 987~&~ankit~&~yadav

The concatenation is not maintaining the descending order.

I tried

select 
    concat_ws('~', sort_array(collect_set(applicant_num))), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by 
    primary_key  

But sort_array returns array in ascending order and it cannot be sorted in descending.

Is there any to concatenate all applicant_num in one string for a primary key while maintaining the descending order of applicant_num?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

鸵鸟症 2025-01-24 09:43:53

array_sort 始终按上升顺序排序(羞耻)。

幸运的是, 确实支持它(有点)。因此,尽管表达不必要,但这就是您要寻找的

from pyspark.sql import functions as F

df = spark.createDataFrame([
    (1, '1x'),
    (1, '1a'),
    (1, '1b'),
    (2, '2a'),
    (2, '2b'),
    (2, '2c'),
], ['id', 'col'])

(df
    .groupBy('id')
    .agg(
        F.concat_ws('~', F.expr('array_sort(collect_list(col), (left, right) -> case when left < right then 1 when left > right then -1 else 0 end)')).alias('col')
    )
    .show()
)

+---+--------+
| id|     col|
+---+--------+
|  1|1x~1b~1a|
|  2|2c~2b~2a|
+---+--------+

PySpark API of array_sort always sorting in ascending order (shame).

Luckily, Spark SQL API of array_sort does support it (sort of). So despite the unnecessary lengthy expression, this is what you're looking for

from pyspark.sql import functions as F

df = spark.createDataFrame([
    (1, '1x'),
    (1, '1a'),
    (1, '1b'),
    (2, '2a'),
    (2, '2b'),
    (2, '2c'),
], ['id', 'col'])

(df
    .groupBy('id')
    .agg(
        F.concat_ws('~', F.expr('array_sort(collect_list(col), (left, right) -> case when left < right then 1 when left > right then -1 else 0 end)')).alias('col')
    )
    .show()
)

+---+--------+
| id|     col|
+---+--------+
|  1|1x~1b~1a|
|  2|2c~2b~2a|
+---+--------+
温柔戏命师 2025-01-24 09:43:53

sort_array也可以按降序顺序排序,请参考此处

select 
    concat_ws('~', sort_array(collect_set(applicant_num), false)), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by primary_key  

sort_array can also be sorted in descending order, Please refer to here

select 
    concat_ws('~', sort_array(collect_set(applicant_num), false)), primary_key 
from 
    (select * 
     from table 
     order by applicant_num desc) 
group by primary_key  
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文