How do I use SequenceMatcher to find the similarity between two strings?

Posted on 2024-10-14 21:49:14

import difflib

a='abcd'
b='ab123'
seq=difflib.SequenceMatcher(a=a.lower(),b=b.lower())
seq=difflib.SequenceMatcher(a,b)
d=seq.ratio()*100
print(d)

I used the code above, but the output I get is 0.0. How can I get a valid answer?

留一抹残留的笑 2024-10-21 21:49:14

You forgot the first parameter of SequenceMatcher, isjunk, which comes before the two sequences.

>>> import difflib
>>> 
>>> a='abcd'
>>> b='ab123'
>>> seq=difflib.SequenceMatcher(None, a,b)
>>> d=seq.ratio()*100
>>> print(d)
44.44444444444444

http://docs.python.org/library/difflib.html
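
For reference, the difflib documentation defines ratio() as 2.0*M/T, where M is the number of matched elements and T is the total number of elements in both sequences. A minimal sketch that recomputes the figure above from get_matching_blocks(), written for Python 3 (the names matched, total and manual_ratio are only illustrative):

```python
import difflib

a = 'abcd'
b = 'ab123'
seq = difflib.SequenceMatcher(None, a, b)

# Each matching block is an (a, b, size) triple; the last one is a
# zero-length sentinel, so summing the sizes gives the match count M.
matched = sum(block.size for block in seq.get_matching_blocks())
total = len(a) + len(b)  # T: combined length of both sequences

manual_ratio = 2.0 * matched / total
print(matched, total)                # 2 9
print(manual_ratio == seq.ratio())   # True
```

Only the leading 'ab' matches, so M is 2 and T is 9, which is where the 44.4 percent comes from.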

谁人与我共长歌 2024-10-21 21:49:14

From the docs:

The SequenceMatcher class has this constructor:
class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

The problem in your code is that by writing

seq=difflib.SequenceMatcher(a,b)

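you pass a as the value for isjunk and b as the value for a, leaving b at its default of ''. The matcher then compares 'ab123' against an empty string, which gives a ratio of 0.0.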
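
The shifted arguments can be made visible by inspecting the matcher object. This is only a sketch for Python 3: the attributes isjunk, a and b are how CPython's difflib stores the constructor arguments, which is assumed here rather than promised by the documentation, and the names wrong and right are illustrative.

```python
import difflib

a = 'abcd'
b = 'ab123'

# Positional call from the question: every argument shifts one slot left.
wrong = difflib.SequenceMatcher(a, b)
print(repr(wrong.isjunk))  # 'abcd'
print(repr(wrong.a))       # 'ab123'
print(repr(wrong.b))       # ''
print(wrong.ratio())       # 0.0

# Keyword call: each string lands in the parameter it was meant for.
right = difflib.SequenceMatcher(a=a, b=b)
print(round(right.ratio() * 100, 2))  # 44.44
```

No error is raised for the string in isjunk because the second sequence is empty, so the junk test is never called.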

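One way to fix this (already mentioned by Lennart) is to pass None explicitly as the first argument, so that each remaining argument is bound to the parameter it was meant for.

However, I just found another solution that I wanted to mention: it leaves the isjunk argument alone and uses the set_seqs() method to supply the two sequences.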

>>> import difflib
>>> a = 'abcd'
>>> b = 'ab123'
>>> seq = difflib.SequenceMatcher()
>>> seq.set_seqs(a.lower(), b.lower())
>>> d = seq.ratio()*100
>>> print(d)
44.44444444444444
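
Either fix can be wrapped in a small helper. The function below is a hypothetical convenience (similarity_percent is not part of difflib), sketched for Python 3; it passes the sequences by keyword so the isjunk slot cannot be filled by accident.

```python
import difflib


def similarity_percent(first, second):
    """Return the case-insensitive similarity of two strings as a percentage."""
    # Keyword arguments leave isjunk at its default of None.
    matcher = difflib.SequenceMatcher(a=first.lower(), b=second.lower())
    return matcher.ratio() * 100


print(similarity_percent('abcd', 'ab123'))  # 44.44444444444444
print(similarity_percent('ABCD', 'abcd'))   # 100.0
```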
