Compare a string against several others and return the most similar one

Posted on 2024-12-21 09:07:13

I have to write a function that takes a string as an argument, compares it to the strings in a list, and returns the most similar string together with the number of differences.

For example, given
    lst = ["JIBM", "NUNE", "NUMB"]
the call func("LUMB") should return:
("NUMB", 1)

I have tried:

def f(word):
    lst=["JIBM", "NUNE", "NUMB"]
    for i in lst:
        d=k(word, lst)
        return differences
        for n in d:
            print min(sum(n))

where:

def k(word1, word2):
    L=[]
    for w in range(len(word1)):
        if word1[w] != word2[w]:
            L.append(1)
        else:
            L.append(0)
    return L

so that I get a list such as [1,0,0,0] when word1="NUMB" and word2="LUMB".
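
For reference, here is a minimal sketch of what the attempt above seems to be aiming for: count position-by-position mismatches against each candidate and keep the candidate with the fewest. The names count_mismatches and most_similar are hypothetical, and the sketch assumes all words have the same length, as in the example.

```python
def count_mismatches(word1, word2):
    # Number of positions at which the two words differ.
    # Assumes equal length; zip() silently ignores any extra characters.
    return sum(1 for a, b in zip(word1, word2) if a != b)


def most_similar(word, candidates):
    # Pair each candidate with its mismatch count, then take the pair
    # with the smallest count.
    scored = [(c, count_mismatches(word, c)) for c in candidates]
    return min(scored, key=lambda pair: pair[1])


print(most_similar("LUMB", ["JIBM", "NUNE", "NUMB"]))  # ('NUMB', 1)
```

The main differences from the attempt are that each candidate is compared individually (the loop variable rather than the whole list is passed on) and that nothing is returned until every candidate has been scored.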

醉殇 2024-12-28 09:07:13

It looks like Shawn Chin has provided the best solution, but if you can't use third-party modules, get_close_matches from the standard library's difflib might help:

import difflib
difflib.get_close_matches("LUMB", ["JIBM", "NUNE", "NUMB"], 1)

The number of differences can be obtained by calling the get_opcodes method of SequenceMatcher and working with its return value.
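
One way the get_opcodes idea could look is sketched below. The helper names count_differences and closest_with_difflib are hypothetical, and counting each non-equal opcode by the longer of its two spans is an assumption about what "number of differences" should mean. It is not guaranteed to equal the Levenshtein distance in every case.

```python
import difflib


def count_differences(a, b):
    # Walk the edit operations that turn a into b and count the
    # characters touched by every operation that is not "equal".
    matcher = difflib.SequenceMatcher(None, a, b)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += max(i2 - i1, j2 - j1)
    return total


def closest_with_difflib(word, candidates):
    # get_close_matches returns an empty list when no candidate reaches
    # its default similarity cutoff, so that case is handled explicitly.
    matches = difflib.get_close_matches(word, candidates, n=1)
    if not matches:
        return None
    best = matches[0]
    return best, count_differences(word, best)


print(closest_with_difflib("LUMB", ["JIBM", "NUNE", "NUMB"]))  # ('NUMB', 1)
```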

三岁铭 2024-12-28 09:07:13

Using pylevenshtein to calculate Levenshtein distance:

>>> from Levenshtein import distance
>>> from operator import itemgetter
>>> lst = ["JIBM", "NUNE", "NUMB"]
>>> min([(x, distance("LUMB", x)) for x in lst], key=itemgetter(1))
('NUMB', 1)

Or, as a function:

from Levenshtein import distance
from operator import itemgetter

def closest(word, lst):
    # Pair each candidate with its distance to word, then pick the smallest distance.
    return min([(x, distance(word, x)) for x in lst], key=itemgetter(1))

print(closest("LUMB", ["JIBM", "NUNE", "NUMB"]))  # ('NUMB', 1)

p.s. If you want to avoid additional dependencies, you could always implement your own function for calculating the distance. For example, several versions are proposed on Wikibooks, each with its own pros and cons.

However, if performance is a concern, consider sticking with a dedicated module. Apart from pylevenshtein, there are also python-levenshtein and nltk.metrics.distance (if you already happen to use NLTK).
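
As an illustration of the dependency-free route mentioned in the p.s., here is a minimal sketch of the classic dynamic-programming Levenshtein distance, keeping only two rows of the table. The names levenshtein and closest_pure are hypothetical. This is one common formulation rather than any particular Wikibooks version, and it will be slower than a C-backed module.

```python
def levenshtein(a, b):
    # previous[j] holds the distance between the first i-1 characters
    # of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def closest_pure(word, candidates):
    # Same shape as closest() above, but without the external module.
    return min(((c, levenshtein(word, c)) for c in candidates),
               key=lambda pair: pair[1])


print(closest_pure("LUMB", ["JIBM", "NUNE", "NUMB"]))  # ('NUMB', 1)
```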