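Compare a string against several others and return the most similar one
I have to write a function that takes a string as an argument, compares it with the other strings in a list, and returns the most similar string together with the number of differences.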
For example, calling func("LUMB") with
    lst = ["JIBM", "NUNE", "NUMB"]
should return:
    ("NUMB", 1)
I tried:
    def f(word):
        lst=["JIBM", "NUNE", "NUMB"]
        for i in lst:
            d=k(word, lst)
            return differences
        for n in d:
            print min(sum(n))
where:
    def k(word1, word2):
        L=[]
        for w in range(len(word1)):
            if word1[w] != word2[w]:
                L.append(1)
            else:
                L.append(0)
        return L
This gives me a list such as [1,0,0,0] when word1="NUMB" and word2="LUMB".
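
The position-by-position idea above can be completed roughly as follows. This is only a sketch, not the asker's code: the names count_mismatches and most_similar are made up for illustration, and it assumes all words have the same length (zip silently ignores any extra characters).

```python
def count_mismatches(word1, word2):
    """Count positions where the two words differ (assumes equal length)."""
    return sum(c1 != c2 for c1, c2 in zip(word1, word2))


def most_similar(word, candidates):
    """Return (closest_candidate, number_of_differences)."""
    best = min(candidates, key=lambda c: count_mismatches(word, c))
    return best, count_mismatches(word, best)


print(most_similar("LUMB", ["JIBM", "NUNE", "NUMB"]))  # ('NUMB', 1)
```

If several candidates tie, min returns the first one in list order.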
Comments (2)
Looks like Shawn Chin has provided the best solution, but if you're prevented from using non-builtin modules, get_close_matches from difflib might help. The number of differences can be obtained by calling the get_opcodes method of SequenceMatcher and working with its return value.
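
A minimal sketch of the difflib approach, using only the standard library. The helper names count_differences and closest_match are hypothetical, and counting the longer side of each non-equal opcode is one possible way of "working with the return value"; it is not guaranteed to equal the Levenshtein distance in every case. cutoff=0.0 is chosen here so that some candidate is always returned.

```python
from difflib import SequenceMatcher, get_close_matches


def count_differences(a, b):
    """Sum the sizes of all non-equal opcode spans between a and b."""
    ops = SequenceMatcher(None, a, b).get_opcodes()
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in ops
               if tag != "equal")


def closest_match(word, candidates):
    """Return (best_candidate, differences), or None if there are no candidates."""
    matches = get_close_matches(word, candidates, n=1, cutoff=0.0)
    if not matches:
        return None
    best = matches[0]
    return best, count_differences(word, best)


print(closest_match("LUMB", ["JIBM", "NUNE", "NUMB"]))  # ('NUMB', 1)
```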

Using pylevenshtein to calculate Levenshtein distance:
Or, as a function:
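
One way the function form might look, as a sketch. The name closest and its parameters are hypothetical, and the distance function is passed in as an argument so the same code works with distance from the Levenshtein package (assuming it is installed) or with any other metric.

```python
from operator import itemgetter


def closest(word, candidates, distance):
    """Return (best_candidate, distance) using the supplied distance function."""
    scored = ((c, distance(word, c)) for c in candidates)
    return min(scored, key=itemgetter(1))


# Assuming the third-party Levenshtein package is available, usage would be:
#     from Levenshtein import distance
#     closest("LUMB", ["JIBM", "NUNE", "NUMB"], distance)
```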
p.s. If you want to avoid additional dependencies, you could always implement your own function for calculating the distance. For example, several versions are proposed on Wikibooks, each with its own pros and cons.
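
As an illustration of a dependency-free version, here is a sketch of the common two-row dynamic-programming formulation. It is not taken from the Wikibooks page itself, and the function name levenshtein is just a placeholder.

```python
def levenshtein(a, b):
    """Return the edit distance (insert, delete, substitute) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


lst = ["JIBM", "NUNE", "NUMB"]
best = min(lst, key=lambda c: levenshtein("LUMB", c))
print((best, levenshtein("LUMB", best)))  # ('NUMB', 1)
```

Unlike a position-by-position comparison, this also handles words of different lengths.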
However, if performance is a concern, do consider sticking to the purpose-built modules. Apart from pylevenshtein, there's also python-levenshtein and nltk.metrics.distance (if you happen to already use NLTK).