How do I compare names with the previous row in a DataFrame column while grouping by a unique ID in another column?
I have the following table:
Unique ID | Name |
---|---|
111 | Mayank |
111 | Mayanak |
222 | Leddie . |
222 | Leddie Chan |
333 | May BOU KARAM |
333 | May Bou Karam |
I'm trying to compare each row in the 'Name' column with the adjacent row in that column, within each Unique ID group. I'm expecting output similar to:
Unique ID | Name | Name2 | Score |
---|---|---|---|
111 | Mayank | Mayanak | 67 |
111 | Mayanak | nan | 67 |
222 | Leddie . | Leddie Chan | 90 |
222 | Leddie Chan | nan | 90 |
333 | May BOU KARAM | May Bou Karam | 90 |
333 | May Bou Karam | nan | 33 |
I've used the following code to score each row against the next row, but I can't figure out how to group it by Unique ID:
df['Name2'] = df['Name'].shift(-1)
df['Score'] = df.apply(lambda x: fuzz.partial_ratio(x['Name'], x['Name2']), axis=1)
I get the following output:
Unique ID | Name | Name2 | Score |
---|---|---|---|
111 | Mayank | Mayanak | 67 |
111 | Mayanak | Leddie . | 0 |
222 | Leddie . | Leddie Chan | 100 |
222 | Leddie Chan | May BOU KARAM | 18 |
333 | May BOU KARAM | May Bou Karam | 90 |
333 | May Bou Karam | nan | 33 |
I'm not married to using fuzz.partial_ratio to match the strings. If there's a better way to match the strings, I'd be game for that.
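On the choice of matcher, one option that needs no third-party package is `difflib.SequenceMatcher` from the Python standard library. The sketch below is only an illustration, not a claim that it beats `fuzz.partial_ratio`; the helper name `similarity` and the normalisation steps (lower-casing, collapsing whitespace) are assumptions of mine. Its scores will generally differ from the ones in the tables above.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score between two names (hypothetical helper)."""
    # Normalise case and whitespace so "May BOU KARAM" and "May Bou Karam" match.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    # ratio() is 2*M/T, where M is the number of matching characters
    # and T is the combined length of both strings.
    return round(100 * SequenceMatcher(None, norm_a, norm_b).ratio())


print(similarity("May BOU KARAM", "May Bou Karam"))  # 100: identical after normalisation
print(similarity("Mayank", "Mayanak"))
```

Whether to normalise case before scoring depends on whether "May BOU KARAM" and "May Bou Karam" should count as the same name.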
Also, I realise that the Name2 column is not necessary, but I created it to make sure I'm getting each step right. Apologies if it's confusing. Any help and feedback would be greatly appreciated. Thank you.
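A minimal sketch of the grouped comparison, assuming the column names `Unique ID` and `Name` from the table above. `groupby(...).shift(-1)` shifts within each ID, so the last row of a group gets NaN instead of the first name of the next group. The scorer is `difflib.SequenceMatcher`, used here as a stand-in so the example runs without extra packages; `fuzz.partial_ratio` could be called at the same spot. Leaving the score missing for rows without a partner is also an assumption about what is wanted.

```python
from difflib import SequenceMatcher

import pandas as pd

df = pd.DataFrame({
    "Unique ID": [111, 111, 222, 222, 333, 333],
    "Name": ["Mayank", "Mayanak", "Leddie .", "Leddie Chan",
             "May BOU KARAM", "May Bou Karam"],
})

# Shift within each Unique ID, so a name is never paired with another group's row.
df["Name2"] = df.groupby("Unique ID")["Name"].shift(-1)


def score(a: str, b: str) -> int:
    """Stand-in scorer (assumption); swap in fuzz.partial_ratio(a, b) if preferred."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


# Score only rows that have a partner; the last row of each group stays NaN.
has_pair = df["Name2"].notna()
df["Score"] = float("nan")
df.loc[has_pair, "Score"] = [
    score(a, b) for a, b in zip(df.loc[has_pair, "Name"], df.loc[has_pair, "Name2"])
]

print(df)
```

To compare against the previous row instead, as the title asks, use `shift(1)`; the first row of each group is then the one without a partner.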