Group by ID and create a column based on priority in PySpark
Can someone help me with the following?
I have an input DataFrame:
ID | process_type | STP_stagewise |
---|---|---|
1 | loan_creation | Manual |
1 | loan_creation | NSTP |
1 | reimbursement | STP |
2 | loan_creation | STP |
2 | reimbursement | NSTP |
3 | loan_creation | Manual |
3 | loan_creation | STP |
4 | loan_creation | NSTP |
4 | loan_creation | NSTP |
Output dataframe required:
ID | process_type | STP_stagewise | STP_type |
---|---|---|---|
1 | loan_creation | Manual | Manual |
1 | loan_creation | NSTP | Manual |
1 | reimbursement | STP | STP |
2 | loan_creation | STP | STP |
2 | reimbursement | NSTP | NSTP |
3 | loan_creation | Manual | Manual |
3 | loan_creation | STP | Manual |
4 | loan_creation | NSTP | NSTP |
4 | loan_creation | NSTP | NSTP |
I need to group by the ID and process_type columns, apply the priority Manual >> NSTP >> STP within each group, and write the highest-priority value to a new column (STP_type).
Can someone suggest an approach to solve this? Thanks in advance.
Slight change: along with ID, the group by should also be done on process_type.
Comments (1)
One way to solve this is to aggregate at `ID`, collect all the distinct `STP_stagewise` values into a list, sort that list with a `custom_sort_map` so that the first element is the highest-priority value, and finally join the result back to your main DataFrame.
Data Preparation
Aggregation - Collect Set & Sort
Join