How can I check the match percentage between two lists of strings?

Posted on 2025-02-09 18:57:31

I am a beginner at Python and I have a problem comparing two lists. The lists should not be compared for an exact match. Instead, the check should return True if roughly 70% of one list matches the other. A contains()-style check does not help in this case. Here are my lists:

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]

Comments (4)

苍白女子 2025-02-16 18:57:31

The fuzzywuzzy library in Sahil Desai's answer looks really simple.

Here is an idea that uses only built-in functions.

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]

print(len(set(TotalTags).intersection(set(LikedTags))) / len(TotalTags))  # 0.8333333
print(sum([True for x in TotalTags if x in LikedTags]) / len(TotalTags))  # 0.8333333
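
To turn the ratio above into the True/False answer the question asks for, it can be compared against a threshold. The following is only a minimal sketch: the helper names match_fraction and is_match are hypothetical, and it assumes the percentage is measured against TotalTags, as in the two print statements above.

```python
def match_fraction(total_tags, liked_tags):
    """Fraction of total_tags that also appear in liked_tags (hypothetical helper)."""
    if not total_tags:
        return 0.0  # avoid dividing by zero for an empty list
    liked = set(liked_tags)  # set membership tests are fast
    return sum(tag in liked for tag in total_tags) / len(total_tags)


def is_match(total_tags, liked_tags, threshold=0.7):
    """True when at least `threshold` of total_tags are present in liked_tags."""
    return match_fraction(total_tags, liked_tags) >= threshold


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]

print(is_match(TotalTags, ["citrus", "orange", "vitamin-C", "sweet", "yellow"]))  # True
print(is_match(TotalTags, ["citrus", "orange", "vitamin-D"]))  # False
```

If the 70% should instead be measured against LikedTags, swapping the two arguments gives that variant.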
花开浅夏 2025-02-16 18:57:31

You can use the fuzzywuzzy Python library:

from fuzzywuzzy import fuzz

TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-D"]


per = fuzz.ratio(TotalTags,LikedTags)
per

 65

This method compares the two lists character by character. If you only want to match whole items, you can use the Jaccard similarity instead.
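
The Jaccard similarity mentioned here is the size of the intersection of two sets divided by the size of their union. A minimal sketch with plain Python sets follows; the function name jaccard is chosen for illustration and is not part of fuzzywuzzy.

```python
def jaccard(a, b):
    """Jaccard similarity of two iterables: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # both inputs empty; treated as no similarity in this sketch
    return len(set_a & set_b) / len(union)


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]

# 2 shared tags out of 7 distinct tags overall
print(round(jaccard(TotalTags, LikedTags), 3))  # 0.286
```

Because the union grows with every tag that appears in only one list, this score is stricter than dividing by the length of a single list.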

何以畏孤独 2025-02-16 18:57:31

You can use difflib.SequenceMatcher to compute the similarity between every pair of words from the two lists, as shown below. (The output only shows pairs whose similarity is above 70%.)

from difflib import SequenceMatcher
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]
for a in LikedTags:
    for b in TotalTags:
        sim = SequenceMatcher(None, a, b).ratio()
        if sim > 0.7:
            print(f'similarity of {a} & {b} : {sim}')

Output:

similarity of citrus & citrus : 1.0
similarity of orange & orange : 1.0
similarity of vitamin-D & vitamin-C : 0.8888888888888888
similarity of vitamin-D & vitamin-A : 0.8888888888888888
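
The pairwise scores above can be rolled up into a single list-level percentage. This is one possible sketch, not the answer's own method: the function name fuzzy_match_fraction is hypothetical, and it assumes a liked tag counts as matched when its best score against any tag in the other list exceeds the cutoff.

```python
from difflib import SequenceMatcher


def fuzzy_match_fraction(liked_tags, total_tags, cutoff=0.7):
    """Fraction of liked_tags whose best similarity to any total tag exceeds cutoff."""
    if not liked_tags:
        return 0.0
    matched = 0
    for a in liked_tags:
        # Best similarity of this liked tag against every tag in total_tags
        best = max(
            (SequenceMatcher(None, a, b).ratio() for b in total_tags),
            default=0.0,
        )
        if best > cutoff:
            matched += 1
    return matched / len(liked_tags)


TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"]
LikedTags = ["citrus", "orange", "vitamin-D"]

# citrus and orange match exactly; vitamin-D is close to vitamin-C and vitamin-A
print(fuzzy_match_fraction(LikedTags, TotalTags))  # 1.0
```

Comparing the returned fraction against 0.7 then gives the True/False result the question asks for.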
岁月蹉跎了容颜 2025-02-16 18:57:31

You can also do something like this with the built-in collections module:

import collections
TotalTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow", "vitamin-A"] #etc
LikedTags = ["citrus", "orange", "vitamin-C", "sweet", "yellow"]
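c = collections.Counter(TotalTags)  # count each tag in the full list
c.subtract(LikedTags)  # remove one count for every liked tag
print(1 - c.total() / len(TotalTags))  # Counter.total() requires Python 3.10+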

Output:

0.8333333333333334