在2个PANDAS数据框架中执行模糊匹配
我有两个数据范围,不同的行编号包含有关玩家的信息。第一个具有我需要的所有名称。
df1 = pd.DataFrame({'Player': ["John Sepi", 'Zan Fred', 'Mark Daniel', 'Adam Pop', 'Paul Sepi', 'John Hernandez', 'Price Josiah', 'John Hernandez', 'Adam Pop'],
'Team': ['A', 'C', 'E', 'C', 'B', 'D', 'B', 'A', 'D']})
另一个数据框架缺少某些玩家,但具有年龄的专栏。在某些情况下,玩家的名字差异较小。
df2 = pd.DataFrame({'Player': ["John Sepi", 'Mark A. Daniel', 'John Hernandez', 'Price Josiah', 'John Hernandez', 'Adam Pop'],
'Team': ['A', 'E', 'D', 'B', 'A', 'D'],
'Age': [22, 21, 26, 18, 19, 25]})
等于名字是不同的人,因此我需要在同一时间和团队中匹配。我想创建一个新的DataFrame,其中包括第一个数据帧的所有名称,并从第二个数据框架上各自的年龄。如果第二次缺少播放器,则具有恒定值的完整新数据帧(例如XX年,可以是任何年龄。最终数据帧:
print(final_df)
Player Team Age
0 John Sepi A 22
1 Zan Fred C XX
2 Mark Daniel E 21
3 Adam Pop C XX
4 Paul Sepi B XX
5 John Hernandez D 26
6 Price Josiah B 18
7 John Hernandez A 19
8 Adam Pop D 25
I have two dataframes with different rows numbers contain information about players. The first has all names that I need.
df1 = pd.DataFrame({'Player': ["John Sepi", 'Zan Fred', 'Mark Daniel', 'Adam Pop', 'Paul Sepi', 'John Hernandez', 'Price Josiah', 'John Hernandez', 'Adam Pop'],
'Team': ['A', 'C', 'E', 'C', 'B', 'D', 'B', 'A', 'D']})
The another dataframe is missing some players, but has a column with age. The player's names have smaller differences in some cases.
df2 = pd.DataFrame({'Player': ["John Sepi", 'Mark A. Daniel', 'John Hernandez', 'Price Josiah', 'John Hernandez', 'Adam Pop'],
'Team': ['A', 'E', 'D', 'B', 'A', 'D'],
'Age': [22, 21, 26, 18, 19, 25]})
The equals names are different persons, because of that i need match at the same time Player and Team. I want to create a new dataframe with all names from first dataframe with respective age from second dataframe. In case of missing players in second, complete new dataframe with constant value(like XX years, can be any age..just to illustrate). The final dataframe:
print(final_df)
Player Team Age
0 John Sepi A 22
1 Zan Fred C XX
2 Mark Daniel E 21
3 Adam Pop C XX
4 Paul Sepi B XX
5 John Hernandez D 26
6 Price Josiah B 18
7 John Hernandez A 19
8 Adam Pop D 25
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您可以使用
fuzzywuzzy
库的文本匹配功能与Python中的Pandas功能混合。首先,导入以下库:
您可以使用
fuzzywuzzy
python库的文本匹配功能:使用此代码和定义为70的阈值,您将获得以下结果:
可以移动阈值为了提高两个数据范围之间的文本匹配功能的准确性。
请注意,不建议使用
.iterrows()
作为数据框架上的迭代时要小心。您可以检查
fuzzywuzzy
doc: /You can use the text matching capabilities of the
fuzzywuzzy
library mixed with pandas functions in python.First, import the following libraries :
You can use the text matching capabilities of the
fuzzywuzzy
python library :with this code and a threshold defined as 70, you get the following result:
It is possible to move the threshold to increase the accuracy of the text matching capabilities between the two dataframes.
Please note that you should be careful when using
.iterrows()
as iteration on a dataframe is not advised.You can check the
fuzzywuzzy
doc here https://pypi.org/project/fuzzywuzzy/这是一种方法:
here is one way: