Fuzzy matching values against a list of lists in Python

Posted on 2025-02-06 02:01:19

I'm struggling with how to do this in a Pythonic way. I have a list of (first name, last name) tuples, which we can call names:

names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]

I also have two variables:

First_name = 'Jimm'

Last_name = 'Smitn'

I want to iterate through this list of first and last names, fuzzy-match each entry against these two values, and return the entry that is closest to the specified First_name and Last_name.

从此见与不见 2025-02-13 02:01:19

You can implement fuzzy matching by taking the candidate with the best match ratio (using max()), where the ratio is computed by difflib.SequenceMatcher.

To do this, pass max() a key function (for example a lambda) that returns the match ratio for each candidate. In my example I use SequenceMatcher.ratio(), but if performance matters you could also try SequenceMatcher.quick_ratio() and SequenceMatcher.real_quick_ratio(), which are faster but return only upper bounds on ratio().

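from difflib import SequenceMatcher

lst = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = 'Jimm'
last_name = 'Smitn'

# SequenceMatcher caches information about its second sequence (b),
# so set the fixed query as b once and swap each candidate in as a.
matcher = SequenceMatcher(b=first_name + ' ' + last_name)

def similarity(candidate):
    matcher.set_seq1(' '.join(candidate))  # compare "First Last" strings
    return matcher.ratio()                 # 0.0 (no match) to 1.0 (identical)

# Pick the (first, last) tuple with the highest similarity ratio.
match_first_name, match_last_name = max(lst, key=similarity)

print(first_name, last_name, '-', match_first_name, match_last_name)
# Jimm Smitn - Jimmy Smith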
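On the quick_ratio() / real_quick_ratio() point: both return upper bounds on ratio(), so one way to use them is to check the cheap bounds first and call the more expensive ratio() only when a candidate could still beat the best score found so far. This is a minimal sketch, assuming you only need the single best match; best_match is a hypothetical helper name, not part of difflib.

```python
from difflib import SequenceMatcher


def best_match(query, candidates):
    """Return (best_candidate, best_ratio) for the joined candidate closest to query.

    Hypothetical helper; returns (None, -1.0) if candidates is empty.
    """
    matcher = SequenceMatcher(b=query)  # b is cached, so set the fixed query once
    best, best_score = None, -1.0
    for cand in candidates:
        matcher.set_seq1(' '.join(cand))  # e.g. ('Jimmy', 'Smith') -> 'Jimmy Smith'
        # real_quick_ratio() and quick_ratio() are cheap upper bounds on ratio():
        # if a bound cannot beat the current best, ratio() cannot either.
        if matcher.real_quick_ratio() <= best_score:
            continue
        if matcher.quick_ratio() <= best_score:
            continue
        score = matcher.ratio()
        if score > best_score:  # strict '>' keeps the first of equal scores, like max()
            best, best_score = cand, score
    return best, best_score


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
found, score = best_match('Jimm Smitn', names)
print(found, round(score, 3))  # ('Jimmy', 'Smith') 0.857
```

If you only need matches above a fixed threshold rather than the single best one, difflib.get_close_matches() applies the same real_quick_ratio(), quick_ratio(), ratio() sequence against its cutoff internally, so it may be enough on its own.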

说不完的你爱 2025-02-13 02:01:19

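Another possible approach is to use set intersections: score each candidate by how many distinct characters its first and last names share with first_name and last_name, then take the highest-scoring entry.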

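names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = "Jimm"
last_name = "Smitn"

# Distinct characters of each query string (set display order may vary)
setf = set(first_name)
# {'m', 'i', 'J'}
setl = set(last_name)
# {'t', 'n', 'm', 'i', 'S'}

# Score = shared characters in the first names + shared characters in the last names
ranked = [(len(setf & set(f)) + len(setl & set(l)), f, l) for f, l in names]
# [(7, 'Jimmy', 'Smith'), (4, 'James', 'Wilson'), (1, 'Hugh', 'Laurie')]

# Keep the (first, last) part of the highest-scoring entry
best_match = max(ranked, key=lambda x: x[0])[1:]
# ('Jimmy', 'Smith')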
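The raw intersection count above is case-sensitive ('S' and 's' count as different letters), ignores letter order and repeats, and tends to favour longer names simply because they contain more distinct letters. A minimal variant, offered as a sketch rather than as the answerer's method, lowercases both sides and normalizes each score by the size of the union (Jaccard similarity); jaccard and best_match_jaccard are hypothetical helper names.

```python
def jaccard(a, b):
    """Jaccard similarity of the distinct characters in a and b, ignoring case."""
    sa, sb = set(a.lower()), set(b.lower())
    if not sa and not sb:
        return 1.0  # treat two empty strings as identical
    return len(sa & sb) / len(sa | sb)


def best_match_jaccard(first_name, last_name, names):
    # Sum the first-name and last-name similarities, as in the raw-count version,
    # and return the (first, last) tuple with the highest total.
    return max(names, key=lambda n: jaccard(first_name, n[0]) + jaccard(last_name, n[1]))


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
print(best_match_jaccard('Jimm', 'Smitn', names))  # ('Jimmy', 'Smith')
```

Normalizing keeps a long candidate from winning just because it has more letters, but it still ignores letter order, so for typo-style mismatches the SequenceMatcher approach may be a better fit.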
