如何查看两个字符串列表的匹配百分比？

发布于 2025-02-09 18:57:31 字数 300 浏览 3 评论 0 原文

我是Python的初学者。在这里，我在比较两个列表时遇到了问题。我的第一个问题是不应准确比较列表。但是应该将其与其他列表匹配 70％，如果存在，则返回true。包含（）方法在这种情况下无济于事。这是我的清单：

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]

原文

I am a beginner at python. Here I had a problem with comparing two lists. My first problem is the list should not be compared exactly. But It should be compared about 70% matching with other list and return true if exist. contains() method doesn't help in this case. Here is my list:

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

苍白女子 2025-02-16 18:57:31

Sahil Desai的答案中的FuzzyWuzzy图书馆看起来真的很简单。

这是具有基本功能的想法。

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]

print(len(set(TotalTags).intersection(set(LikedTags))) / len(TotalTags))  # 0.8333333
print(sum([True for x in TotalTags if x in LikedTags]) / len(TotalTags))  # 0.8333333

fuzzywuzzy library in Sahil Desai's answer looks really simple.

Here is an idea with basic functions.

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]

print(len(set(TotalTags).intersection(set(LikedTags))) / len(TotalTags))  # 0.8333333
print(sum([True for x in TotalTags if x in LikedTags]) / len(TotalTags))  # 0.8333333

回复收藏 0 原文

花开浅夏 2025-02-16 18:57:31

您可以利用FuzzyWuzzy Python库，

from fuzzywuzzy import fuzz

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]


per = fuzz.ratio(TotalTags,LikedTags)
per

 65

此方法如果您只想匹配项目，则可以直接匹配两个列表的字符，然后可以使用Jaccard相似方法。

you can utilizes fuzzywuzzy python library

from fuzzywuzzy import fuzz

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]


per = fuzz.ratio(TotalTags,LikedTags)
per

 65

This method directly match the characters of the two list if you want to just match the items then you can used Jaccard similarity method.

回复收藏 0 原文

何以畏孤独 2025-02-16 18:57:31

您可以使用 difflib.Sequecemecematcher 并从下面的两个列表中找到每个两个单词之间的相似性：（输出仅显示两个具有相似性＆gt; 70％的单词）

from difflib import SequenceMatcher
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]
for a in LikedTags:
    for b in TotalTags:
        sim = SequenceMatcher(None, a, b).ratio()
        if sim > 0.7:
            print(f'similarity of {a} & {b} : {sim}')

输出：

similarity of citrus & citrus : 1.0
similarity of orange & orange : 1.0
similarity of vitamin-D & vitamin-C : 0.8888888888888888
similarity of vitamin-D & vitamin-A : 0.8888888888888888

You can use difflib.SequenceMatcher and find similarity between each two word from two list like below: (Output only shows two words that have similarity > 70%)

from difflib import SequenceMatcher
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]
for a in LikedTags:
    for b in TotalTags:
        sim = SequenceMatcher(None, a, b).ratio()
        if sim > 0.7:
            print(f'similarity of {a} & {b} : {sim}')

Output:

similarity of citrus & citrus : 1.0
similarity of orange & orange : 1.0
similarity of vitamin-D & vitamin-C : 0.8888888888888888
similarity of vitamin-D & vitamin-A : 0.8888888888888888

回复收藏 0 原文

岁月蹉跎了容颜 2025-02-16 18:57:31

您还可以使用hundin 代码> 模块

import collections
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]
c = collections.Counter(TotalTags)
c.subtract(LinkedTags)
print(1-c.total()/len(TotalTags))

输出：

0.8333333333333334

you can also do something like this with the builtin collections module

import collections
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]
c = collections.Counter(TotalTags)
c.subtract(LinkedTags)
print(1-c.total()/len(TotalTags))

output: