Stemming and Lemmatization

Published 2025-01-17 08:18:44

Comments (2)

何以笙箫默 2025-01-24 08:18:45

I really don't understand what you are trying to do in the list comprehensions, so I'll just write how I would do it:

from nltk import WordNetLemmatizer, SnowballStemmer

lemmatizer = WordNetLemmatizer()
stemmer = SnowballStemmer("english")


def find_roots(token_list, n):
    # Look up the n-th token and return its surface form, stem, and lemma.
    token = token_list[n]
    stem = stemmer.stem(token)            # rule-based suffix stripping
    lemma = lemmatizer.lemmatize(token)   # WordNet lookup; defaults to noun POS
    return {"original": token, "stem": stem, "lemma": lemma}


roots_dict = find_roots(["said", "talked", "walked"], n=2)
print(roots_dict)
> {'original': 'walked', 'stem': 'walk', 'lemma': 'walked'}

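A note on the lemma above: WordNetLemmatizer.lemmatize defaults to pos='n', so verb inflections such as 'walked' come back unchanged. A minimal sketch of passing an explicit part-of-speech tag (pos is a documented parameter of NLTK's WordNetLemmatizer; the WordNet data must be downloaded first):

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')  # WordNet corpus needed by the lemmatizer
lemmatizer = WordNetLemmatizer()

# Default POS is 'n' (noun), so verb forms are left alone.
print(lemmatizer.lemmatize("walked"))           # -> 'walked'
# With pos='v', WordNet resolves the word as a verb.
print(lemmatizer.lemmatize("walked", pos="v"))  # -> 'walk'
print(lemmatizer.lemmatize("said", pos="v"))    # -> 'say'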
琉璃梦幻 2025-01-24 08:18:45

You can do what you want with spacy like below (in many cases spacy performs better than nltk):

# $ pip install -U spacy
# $ python -m spacy download en_core_web_sm

import spacy
from nltk import WordNetLemmatizer, SnowballStemmer

sp = spacy.load('en_core_web_sm')   # small English pipeline (tagger + lemmatizer)
lemmatizer = WordNetLemmatizer()
stemmer = SnowballStemmer("english")


words = ['compute', 'computer', 'computed', 'computing', 'said', 'talked', 'walked']
for word in words:
    print(f'Original Word : {word}')
    print(f'Stemmer with nltk : {stemmer.stem(word)}')
    print(f'Lemmatization with nltk : {lemmatizer.lemmatize(word)}')

    sp_word = sp(word)  # run the spacy pipeline on the single word
    print(f'Lemmatization with spacy : {sp_word[0].lemma_}')

Output:

Original Word : compute
Stemmer with nltk : comput
Lemmatization with nltk : compute
Lemmatization with spacy : compute
Original Word : computer
Stemmer with nltk : comput
Lemmatization with nltk : computer
Lemmatization with spacy : computer
Original Word : computed
Stemmer with nltk : comput
Lemmatization with nltk : computed
Lemmatization with spacy : compute
Original Word : computing
Stemmer with nltk : comput
Lemmatization with nltk : computing
Lemmatization with spacy : compute
Original Word : said
Stemmer with nltk : said
Lemmatization with nltk : said
Lemmatization with spacy : say
Original Word : talked
Stemmer with nltk : talk
Lemmatization with nltk : talked
Lemmatization with spacy : talk
Original Word : walked
Stemmer with nltk : walk
Lemmatization with nltk : walked
Lemmatization with spacy : walk

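spacy maps 'said' to 'say' without an explicit POS hint because the en_core_web_sm pipeline tags each token's part of speech before lemmatizing. A small sketch of running the same pipeline on a whole sentence, where the tagger has real context to work with (the sentence itself is made up for illustration):

import spacy

sp = spacy.load('en_core_web_sm')

doc = sp("She said they walked and talked while computing the results")
for token in doc:
    # token.pos_ is the predicted part of speech, token.lemma_ the lemma.
    print(f'{token.text:12} {token.pos_:6} {token.lemma_}')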