Finding similarity scores between sentences

Posted on 2025-01-14 09:16:03

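I am trying to find similar sentences in my data. My code currently outputs a ranking of similar sentences (Rank 1, 2, and 3), where Rank 1 is the most similar sentence. I did this with BM25. For example, given sentence 1: "The man is wearing a red shirt"

Rank 1: "The boy is wearing a red shirt"

Rank 2: "The boy is wearing a shirt"

Rank 3: "The girl is wearing a skirt"

I also want the similarity score so I can tell how similar the sentences are. Any help is appreciated!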


抱着落日 2025-01-21 09:16:03

You can use SequenceMatcher from difflib

from difflib import SequenceMatcher
s = SequenceMatcher(None, "the boy is wearing a red shirt", "the boy is wearing a shirt")
print(s.ratio())

Output

0.9285714285714286 # 1 being max
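
To get a score for every ranked sentence rather than a single pair, the same call can be applied to each candidate. This is only a minimal sketch: the helper name `rank_by_similarity` is hypothetical, and it scores with `SequenceMatcher` (character-level matching), not with BM25, so the ordering may differ from a BM25 ranking.

```python
from difflib import SequenceMatcher


def rank_by_similarity(query, candidates):
    """Return (score, sentence) pairs sorted from most to least similar.

    Hypothetical helper: scores come from SequenceMatcher.ratio(),
    which ranges from 0.0 (nothing in common) to 1.0 (identical).
    """
    scored = [
        (SequenceMatcher(None, query, candidate).ratio(), candidate)
        for candidate in candidates
    ]
    # Tuples sort by score first, so the highest ratio ends up in front.
    return sorted(scored, reverse=True)


query = "the man is wearing a red shirt"
candidates = [
    "the girl is wearing a skirt",
    "the boy is wearing a shirt",
    "the boy is wearing a red shirt",
]

ranked = rank_by_similarity(query, candidates)
for rank, (score, sentence) in enumerate(ranked, start=1):
    print(f"Rank {rank}: {score:.3f}  {sentence}")
```

Each printed line pairs the rank with its score, so you can see how far apart the candidates are instead of only their order.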

You can use the thefuzz library

from thefuzz import fuzz
fuzz.ratio("the boy is wearing a red shirt", "the boy is wearing a shirt") # 100 being max

You can use the jellyfish library

import jellyfish
jellyfish.levenshtein_distance('jellyfish', 'smellyfish') # 2 (an edit count: lower means more similar)

jellyfish.jaro_similarity('jellyfish', 'smellyfish') # 0.89629629629629 (called jaro_distance in older jellyfish releases)

jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs') # 1
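
Because an edit distance is a count rather than a similarity score, it usually has to be normalized before it can be compared across sentence pairs. The sketch below uses a plain-Python Levenshtein implementation so it runs without any third-party package. Dividing by the length of the longer string is one common convention and an assumption here, not something jellyfish does for you, and both function names are hypothetical.

```python
def levenshtein_distance(a, b):
    """Count the single-character insertions, deletions, or substitutions
    needed to turn string a into string b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """Map the edit distance to a score in [0, 1]; 1.0 means identical.

    Assumption: normalize by the length of the longer string.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("jellyfish", "smellyfish"))
print(normalized_similarity("jellyfish", "smellyfish"))
```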

You can find most of the text similarity methods and how they are calculated under this link: https://github.com/luozhouyang/python-string-similarity#python-string-similarity
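
All of the measures above compare characters. For whole sentences, a word-level measure is sometimes closer to what you want. As a rough sketch of one common set-based measure, here is a Jaccard index over words using only the standard library. The function name `jaccard_similarity` and the lowercase, whitespace-split tokenization are assumptions for illustration.

```python
def jaccard_similarity(sentence_a, sentence_b):
    """Jaccard index of the two sentences' word sets: |A & B| / |A | B|.

    Assumption: words are obtained by lowercasing and splitting on
    whitespace; punctuation handling is left out of this sketch.
    """
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    union = words_a | words_b
    if not union:
        return 1.0  # two empty sentences are treated as identical
    return len(words_a & words_b) / len(union)


query = "the man is wearing a red shirt"
for candidate in (
    "the boy is wearing a red shirt",
    "the boy is wearing a shirt",
    "the girl is wearing a skirt",
):
    print(f"{jaccard_similarity(query, candidate):.3f}  {candidate}")
```

The score is 1.0 when both sentences use exactly the same set of words and 0.0 when they share none. Word order and repeated words are ignored.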
