Compare two columns and return the most similar column in Python
I have two data frames.
df1 looks like below.
List1
[apple, banana]
[carrots]
[for, spinach, mushrooms, the]
df2 looks like below.
List2
[apple, garden]
[spinach, smoothie]
[garlic, carrots]
[carrots]
[mushroom, the]
I want to match the lists in df1 to the lists in df2 and produce a similarity score.
So the desired output looks something like below.
List1                             List2              Sim_Score
[apple, banana]                   [apple, garden]    0.52
[carrots]                         [carrots]          1.0
[for, spinach, mushrooms, the]    [mushrooms, the]   0.49
I can handle the similarity score part. My question is how can I find the best match for every row in List1 using List2?
Comments (1)
Your question is how to find the best match for every row in List1 using List2. To answer it, I will mock up a similarity score algorithm just to get numbers that are different for different pairs being analyzed for similarity.
Here is some code that does what you're asking in terms of identifying, for each row in List1, the match in List2 with the "best" (namely highest) similarity score:

Output is:
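A minimal sketch of this kind of best-match lookup, under stated assumptions: the names mock_similarity and best_matches are hypothetical, the mock score is a simple Jaccard overlap (not the asker's real similarity measure, so the numbers differ from the 0.52 and 0.49 in the question), and the rows are plain Python lists. With pandas, the same function could be fed df1["List1"] and df2["List2"], since it only iterates over the rows.

```python
def mock_similarity(a, b):
    """Hypothetical stand-in score: Jaccard overlap of two item lists."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def best_matches(list1_rows, list2_rows, score=mock_similarity):
    """For each row of list1_rows, return (row, best list2 row, best score)."""
    results = []
    for row in list1_rows:
        # Key line: pick the candidate with the highest similarity score.
        best = max(list2_rows, key=lambda cand: score(row, cand))
        results.append((row, best, score(row, best)))
    return results


# Sample data copied from the question (df2 contains "mushroom", singular).
list1 = [["apple", "banana"], ["carrots"], ["for", "spinach", "mushrooms", "the"]]
list2 = [["apple", "garden"], ["spinach", "smoothie"], ["garlic", "carrots"],
         ["carrots"], ["mushroom", "the"]]

for row, best, sim in best_matches(list1, list2):
    print(row, best, round(sim, 2))
# ['apple', 'banana'] ['apple', 'garden'] 0.33
# ['carrots'] ['carrots'] 1.0
# ['for', 'spinach', 'mushrooms', 'the'] ['spinach', 'smoothie'] 0.2
```

Note that max returns the first candidate it meets when several share the top score, which is why the third row pairs with [spinach, smoothie] here. Swapping in a real similarity function only requires passing it as the score argument.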
The key line of code can be expanded (for easier understanding) as follows:
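One way such an expansion could look, as a sketch rather than the answerer's exact code: a single max(..., key=...) call over the List2 rows is unrolled into an explicit loop that tracks the best candidate seen so far. The names jaccard and best_match_expanded are hypothetical, and the Jaccard score is only a placeholder for the real similarity function.

```python
def jaccard(a, b):
    """Hypothetical placeholder score: Jaccard overlap of two item lists."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def best_match_expanded(row, candidates, score):
    """Explicit-loop equivalent of max(candidates, key=lambda c: score(row, c)).

    Returns (best_candidate, best_score); (None, -inf) if candidates is empty.
    """
    best_candidate = None
    best_score = float("-inf")
    for candidate in candidates:
        current = score(row, candidate)
        # Strict '>' keeps the earliest candidate when scores tie.
        if current > best_score:
            best_candidate, best_score = candidate, current
    return best_candidate, best_score


candidates = [["garlic", "carrots"], ["carrots"]]
print(best_match_expanded(["carrots"], candidates, jaccard))
# (['carrots'], 1.0)
```

The loop form also makes it easy to add a minimum-score threshold or to collect the top few matches instead of only one.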