I am trying to find similar sentences in my data. My code ranks the candidate sentences as Rank 1, 2 and 3, where Rank 1 is the most similar sentence. I used BM25 to produce this ranking.
For example, for Sentence 1: "The person is wearing a red-shirt"
Rank 1: "the boy is wearing a red shirt"
Rank 2: "the boy is wearing a shirt"
Rank 3: "the girl is wearing a dress"
I would also like to get a similarity score that tells me how similar the sentences are. I need help with that!
You can use SequenceMatcher from difflib.
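The snippet that originally went with this suggestion is not shown here, so the following is only a minimal sketch of how it could look. It scores the three example sentences from the question against the query with SequenceMatcher.ratio(), which returns a float between 0 and 1. The helper name similarity and the lowercasing step are my own assumptions, not something difflib requires, and the resulting order is not guaranteed to match the BM25 ranking.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the (lowercased) strings are identical."""
    # Lowercase both sides so that capitalisation alone does not lower the score
    # (an assumption for this example, not a difflib requirement).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


query = "The person is wearing a red-shirt"
candidates = [
    "the boy is wearing a red shirt",
    "the boy is wearing a shirt",
    "the girl is wearing a dress",
]

# Pair each candidate with its score and sort from most to least similar.
scored = sorted(((similarity(query, c), c) for c in candidates), reverse=True)

for rank, (score, sentence) in enumerate(scored, start=1):
    print(f"Rank {rank}: {score:.3f}  {sentence}")
```

Note that this compares character sequences, so it rewards shared wording rather than shared meaning.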
Or you can use the thefuzz library.
Or you can use the jellyfish library.
You can find most of the text similarity methods and how they are calculated under this link: https://github.com/luozhouyang/python-string-similarity#python-string-similarity