Stemming and Lemmatization

Published 2025-01-17 08:18:44

Comments (2)

何以笙箫默 2025-01-24 08:18:45

I really don't understand what you are trying to do in the list comprehensions, so I'll just write how I would do it:

from nltk import WordNetLemmatizer, SnowballStemmer

lemmatizer = WordNetLemmatizer()
stemmer = SnowballStemmer("english")


def find_roots(token_list, n):
    # Look up the n-th token and return its surface form, stem, and lemma.
    token = token_list[n]
    stem = stemmer.stem(token)            # rule-based suffix stripping
    lemma = lemmatizer.lemmatize(token)   # WordNet lookup; defaults to noun POS
    return {"original": token, "stem": stem, "lemma": lemma}


roots_dict = find_roots(["said", "talked", "walked"], n=2)
print(roots_dict)
> {'original': 'walked', 'stem': 'walk', 'lemma': 'walked'}

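A note on the lemma above: WordNetLemmatizer.lemmatize defaults to pos='n', so verb inflections such as 'walked' come back unchanged. A minimal sketch of passing an explicit part-of-speech tag (pos is a documented parameter of NLTK's WordNetLemmatizer; the WordNet data must be downloaded first):

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')  # WordNet corpus needed by the lemmatizer
lemmatizer = WordNetLemmatizer()

# Default POS is 'n' (noun), so verb forms are left alone.
print(lemmatizer.lemmatize("walked"))           # -> 'walked'
# With pos='v', WordNet resolves the word as a verb.
print(lemmatizer.lemmatize("walked", pos="v"))  # -> 'walk'
print(lemmatizer.lemmatize("said", pos="v"))    # -> 'say'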
琉璃梦幻 2025-01-24 08:18:45

You can do what you want with spacy like below (in many cases spacy performs better than nltk):

# $ pip install -U spacy
# $ python -m spacy download en_core_web_sm

import spacy
from nltk import WordNetLemmatizer, SnowballStemmer

sp = spacy.load('en_core_web_sm')   # small English pipeline (tagger + lemmatizer)
lemmatizer = WordNetLemmatizer()
stemmer = SnowballStemmer("english")


words = ['compute', 'computer', 'computed', 'computing', 'said', 'talked', 'walked']
for word in words:
    print(f'Original Word : {word}')
    print(f'Stemmer with nltk : {stemmer.stem(word)}')
    print(f'Lemmatization with nltk : {lemmatizer.lemmatize(word)}')

    sp_word = sp(word)  # run the spacy pipeline on the single word
    print(f'Lemmatization with spacy : {sp_word[0].lemma_}')

Output:

Original Word : compute
Stemmer with nltk : comput
Lemmatization with nltk : compute
Lemmatization with spacy : compute
Original Word : computer
Stemmer with nltk : comput
Lemmatization with nltk : computer
Lemmatization with spacy : computer
Original Word : computed
Stemmer with nltk : comput
Lemmatization with nltk : computed
Lemmatization with spacy : compute
Original Word : computing
Stemmer with nltk : comput
Lemmatization with nltk : computing
Lemmatization with spacy : compute
Original Word : said
Stemmer with nltk : said
Lemmatization with nltk : said
Lemmatization with spacy : say
Original Word : talked
Stemmer with nltk : talk
Lemmatization with nltk : talked
Lemmatization with spacy : talk
Original Word : walked
Stemmer with nltk : walk
Lemmatization with nltk : walked
Lemmatization with spacy : walk

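spacy maps 'said' to 'say' without an explicit POS hint because the en_core_web_sm pipeline tags each token's part of speech before lemmatizing. A small sketch of running the same pipeline on a whole sentence, where the tagger has real context to work with (the sentence itself is made up for illustration):

import spacy

sp = spacy.load('en_core_web_sm')

doc = sp("She said they walked and talked while computing the results")
for token in doc:
    # token.pos_ is the predicted part of speech, token.lemma_ the lemma.
    print(f'{token.text:12} {token.pos_:6} {token.lemma_}')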