Fuzzy matching values against a list of lists in Python

Posted on 2025-02-06 02:01:19

I'm struggling to do this in a Pythonic way. I have a list of (first name, last name) tuples, which we can call names:

[('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]

I also have two variables:

First_name = 'Jimm'

Last_name = 'Smitn'

I want to iterate through this list of first and last names, fuzzy match each entry against these values, and return the entry that is closest to the specified First_name and Last_name.

Comments (2)

从此见与不见 2025-02-13 02:01:19

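You can implement fuzzy matching by picking the entry with the best match ratio (using max()) as computed by difflib.SequenceMatcher.

To do this, pass a lambda as the key argument of max() that returns the match ratio for each entry. Because set_seq2() returns None, the `or` in the lambda falls through to matcher.ratio(), so the key is the ratio itself. My example uses SequenceMatcher.ratio(), but if performance is important you should also try SequenceMatcher.quick_ratio() and SequenceMatcher.real_quick_ratio(), which are cheaper to compute but only return an upper bound on ratio().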

from difflib import SequenceMatcher

lst = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = 'Jimm'
last_name = 'Smitn'

matcher = SequenceMatcher(a=first_name + ' ' + last_name)
match_first_name, match_last_name = max(lst,
    key=lambda x: matcher.set_seq2(' '.join(x)) or matcher.ratio())

print(first_name, last_name, '-', match_first_name, match_last_name)
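
As a rough sketch of how quick_ratio() and real_quick_ratio() might be used for speed, the version below treats them as cheap upper bounds and only calls ratio() when a candidate could still win. The function name closest_name and the cutoff parameter are assumptions added for illustration, not part of the answer above. It also follows the difflib documentation's advice to set the fixed query once with set_seq2() and vary set_seq1(), since SequenceMatcher caches information about the second sequence.

```python
from difflib import SequenceMatcher


def closest_name(names, first_name, last_name, cutoff=0.6):
    """Return (best_pair, score), or (None, 0.0) if nothing reaches cutoff.

    closest_name and cutoff are hypothetical names used for this sketch.
    """
    matcher = SequenceMatcher()
    # The query is compared against every candidate, so set it once as seq2.
    matcher.set_seq2(f"{first_name} {last_name}")

    best, best_score = None, 0.0
    for candidate in names:
        matcher.set_seq1(" ".join(candidate))
        threshold = max(cutoff, best_score)
        # Both quick ratios are upper bounds on ratio(); if either is already
        # below the threshold, the exact ratio cannot reach it either.
        if matcher.real_quick_ratio() < threshold or matcher.quick_ratio() < threshold:
            continue
        score = matcher.ratio()
        if score >= cutoff and (best is None or score > best_score):
            best, best_score = candidate, score
    return best, best_score


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
match, score = closest_name(names, 'Jimm', 'Smitn')
print(match, round(score, 3))
```

With a cutoff, a query that resembles nothing in the list returns None instead of an arbitrary "least bad" entry, which max() alone cannot express.
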
说不完的你爱 2025-02-13 02:01:19

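Another option is to use set intersections: count how many distinct characters each candidate shares with the query, and pick the candidate with the highest count.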

names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
first_name = "Jimm"
last_name = "Smitn"

setf = set(first_name)
# {'m', 'i', 'J'}
setl = set(last_name)
# {'t', 'n', 'm', 'i', 'S'}

ranked = [(len(setf & set(f)) + len(setl & set(l)), f, l) for f, l in names]
# [(7, 'Jimmy', 'Smith'), (4, 'James', 'Wilson'), (1, 'Hugh', 'Laurie')]

best_match = max(ranked, key=lambda x: x[0])[1:]
# ('Jimmy', 'Smith')
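
Note that a raw intersection count ignores character order and repetition, is case-sensitive, and tends to favour longer names simply because they contain more distinct letters. One possible refinement, sketched below under those assumptions, is to normalise the overlap as a Jaccard similarity (shared characters divided by all distinct characters) and compare case-insensitively. The helper names jaccard and best_match_by_sets are hypothetical and not part of the answer above.

```python
def jaccard(a, b):
    """Jaccard similarity of the distinct, lowercased characters of a and b."""
    set_a, set_b = set(a.lower()), set(b.lower())
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def best_match_by_sets(names, first_name, last_name):
    """Return the (first, last) pair with the highest summed Jaccard score."""
    return max(
        names,
        key=lambda name: jaccard(first_name, name[0]) + jaccard(last_name, name[1]),
    )


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
print(best_match_by_sets(names, "Jimm", "Smitn"))

# Character order is still ignored: a scrambled name scores as a perfect match.
print(jaccard("Jimm", "mmiJ"))
```

If order matters for your data, a sequence-based measure such as difflib.SequenceMatcher is likely a better fit than any set-based score.
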