Finding the similarity score between sentences

Posted 2025-01-14 09:16:03

I am trying to find similar sentences in my data. My code ranks the similar sentences as Rank 1, 2 and 3, where Rank 1 is the most similar sentence. I used BM25 to do this.
For example, for Sentence 1: "The person is wearing a red-shirt"

Rank 1 : "the boy is wearing a red shirt"

Rank 2 : "the boy is wearing a shirt"

Rank 3 : "the girl is wearing a dress"

I would also like to know the similarity score, to find out how similar the sentences actually are. I need help with that part!
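BM25 already computes a score for every candidate; the ranks are just those scores sorted in descending order. The original code is not shown, so the following is only a minimal from-scratch Okapi BM25 sketch, assuming simple word tokenization and the common defaults k1=1.5 and b=0.75 (neither comes from the post). It prints each score next to its rank. If you used the rank_bm25 package, BM25Okapi(tokenized_corpus).get_scores(tokenized_query) should return the equivalent per-document scores, although the exact numbers will differ because libraries use different IDF variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-word characters, so "red-shirt" -> ["red", "shirt"]
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query.

    k1 and b are common default values (an assumption, not from the post).
    """
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Smoothed IDF variant that never goes negative
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b)
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


query = "The person is wearing a red-shirt"
candidates = [
    "the boy is wearing a red shirt",
    "the boy is wearing a shirt",
    "the girl is wearing a dress",
]
scores = bm25_scores(query, candidates)
best = max(scores) or 1.0  # avoid dividing by zero when nothing matches
for rank, (score, sentence) in enumerate(
    sorted(zip(scores, candidates), reverse=True), start=1
):
    # score / best is relative to the top hit for this query only
    print(f"Rank {rank}: {score:.3f} (relative {score / best:.2f}) {sentence!r}")
```

Keep in mind that raw BM25 scores are not bounded to [0, 1] and are only comparable between candidates for the same query, so the "relative" figure says how close a candidate is to the best hit, not how similar it is in absolute terms.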


Answer by 抱着落日, 2025-01-21 09:16:03

You can use SequenceMatcher from the standard library's difflib module:

from difflib import SequenceMatcher
s = SequenceMatcher(None, "the boy is wearing a red shirt", "the boy is wearing a shirt")
print(s.ratio())

Output

0.9285714285714286 # 1 being max
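Applied to the example in the question, the same ratio can score and rank every candidate against the query sentence. This is a minimal sketch; lowercasing both strings first is an assumption, since SequenceMatcher is case-sensitive.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # ratio() = 2*M / T, where M is the number of matched characters
    # and T is the total length of both strings; 1.0 means identical
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


query = "The person is wearing a red-shirt"
candidates = [
    "the boy is wearing a red shirt",
    "the boy is wearing a shirt",
    "the girl is wearing a dress",
]

# Sort candidates from most to least similar, keeping the score
ranked = sorted(((similarity(query, c), c) for c in candidates), reverse=True)
for rank, (score, sentence) in enumerate(ranked, start=1):
    print(f"Rank {rank}: {score:.3f} {sentence!r}")
```

This measures character overlap, not meaning, so it cannot tell that "person" and "boy" are related.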

Or

You can use the thefuzz library:

from thefuzz import fuzz
print(fuzz.ratio("the boy is wearing a red shirt", "the boy is wearing a shirt"))  # 0 to 100, 100 being max

Or

You can use the jellyfish library:

import jellyfish
jellyfish.levenshtein_distance('jellyfish', 'smellyfish')  # 2

jellyfish.jaro_similarity('jellyfish', 'smellyfish')  # ~0.8963 (named jaro_distance in older jellyfish versions)

jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')  # 1
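The Levenshtein and Damerau-Levenshtein results are edit counts, so lower means more similar. To turn a distance into a 0-to-1 similarity score comparable with the options above, one common normalization (an assumption here, not something jellyfish returns) is 1 - distance / max(len(a), len(b)). The sketch below uses its own pure-Python Levenshtein so it runs without jellyfish installed.

```python
def levenshtein(a, b):
    # Dynamic programming over the edit-distance table, keeping only
    # the previous row (prev) and the row being built (curr)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    # Normalize to [0, 1]: 1.0 for identical strings, 0.0 when every position differs
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("jellyfish", "smellyfish"))             # 2
print(levenshtein_similarity("jellyfish", "smellyfish"))  # 0.8
```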

You can find most of the text similarity methods and how they are calculated under this link: https://github.com/luozhouyang/python-string-similarity#python-string-similarity
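One of the measures covered there is the Jaccard index, |A ∩ B| / |A ∪ B|. As an illustration only, here is a word-level variant; comparing whole words is a choice made for this sketch, while the linked implementations typically work on character k-shingles. Because it compares sets of words, it ignores word order and repeated words.

```python
import re


def jaccard_words(a, b):
    # Compare the sets of lowercase words: |intersection| / |union|
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


# 6 shared words out of 7 distinct words in total
print(jaccard_words("the boy is wearing a red shirt", "the boy is wearing a shirt"))
```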
