Redact bigrams in a list based on matching unigrams

Posted on 2025-02-03 21:22:01

From a list of bigrams, I need to redact bigrams that do not have at least one term that exactly matches at least one term in a list of unigrams.

The Two Lists

bigram_list = ['computer vision', 'data excellence', 'data visualization']

unigram_list = ['excel', 'tableau', 'visio', 'visualization']

The Objective

cleaned_bigrams = ['data visualization']

What I've Tried

I tried adapting this approach here, but failed: Removing separate list of items from another list in Python 3.x

I also tried this, but couldn't get it to work: Get rid of unigrams in a list if contained within bigrams or trigrams python

I tried to adapt from a previous question I asked, but couldn't get that going: Create new boolean fields based on specific bigrams appearing in a tokenized pandas dataframe

Thanks in advance for any help you can provide, and would appreciate an upvote if you think this is a good question!

Comments (1)

浊酒尽余欢 2025-02-10 21:22:01

Here is one way to do it:

bigram_list = ["computer vision", "data excellence", "data visualization"]
unigram_list = ["excel", "tableau", "visio", "visualization"]

# Init a dict for counting number of match
counts = {key: 0 for key in bigram_list}

# Count number of match for each bigram
for big in bigram_list:
    for uni in unigram_list:
        if uni in big.split(" "):
            counts[big] += 1

# Filter
cleaned_bigrams = [item for item in bigram_list if counts[item] > 0]
print(cleaned_bigrams)
# Output:
# ['data visualization']
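
The counting loop above only needs to know whether a bigram shares any whole term with the unigram list, so the same filter can be sketched more compactly with a set intersection. This is an alternative sketch, not the answerer's code; the helper name `keep_matching_bigrams` is hypothetical and introduced here only for illustration.

```python
bigram_list = ["computer vision", "data excellence", "data visualization"]
unigram_list = ["excel", "tableau", "visio", "visualization"]

# A set makes membership checks and intersections cheap.
unigrams = set(unigram_list)


def keep_matching_bigrams(bigrams, unigrams):
    """Keep bigrams that share at least one whole term with `unigrams`.

    Hypothetical helper for illustration. Splitting on whitespace means
    only exact term matches count, not substrings.
    """
    return [b for b in bigrams if unigrams.intersection(b.split())]


cleaned_bigrams = keep_matching_bigrams(bigram_list, unigrams)
print(cleaned_bigrams)
```

Splitting each bigram into terms is what enforces the exact-match requirement. A plain substring test such as `uni in big` would also keep 'computer vision' (it contains 'visio') and 'data excellence' (it contains 'excel').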