Word sense disambiguation in NLTK Python

Posted on 2024-09-19 13:47:28

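I'm new to NLTK Python and am looking for a sample application that can do word sense disambiguation. My searches turned up plenty of algorithms but no sample application. I just want to pass in a sentence and find out the sense of each word by referring to the WordNet library. Thanks.

I found a similar module in Perl: http://marimba.d.umn.edu/allwords/allwords.html Is there such a module in NLTK Python?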

美羊羊 2024-09-26 13:50:24

Yes, this is possible with the wordnet module in NLTK.
The similarity measures used in the tool mentioned in your post also exist in the NLTK wordnet module.
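
To make "similarity measure" concrete, here is a minimal sketch of one such measure, Wu-Palmer similarity, computed over a tiny hand-made taxonomy. The taxonomy and the function names are invented for illustration and are not WordNet data. With NLTK itself, synsets expose their own similarity methods (for example `wup_similarity`) that work on the real WordNet graph.

```python
# Toy taxonomy mapping each node to its parent.
# Hypothetical data for illustration only, not taken from WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "feline": "animal",
    "canine": "animal",
    "cat": "feline",
    "dog": "canine",
    "artifact": "entity",
    "car": "artifact",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node))


def wup_similarity(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # Walking up from `a`, the first node that is also an ancestor of `b`
    # is the deepest common ancestor (the "least common subsumer").
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wup_similarity("cat", "dog"))  # share "animal"
print(wup_similarity("cat", "car"))  # share only the root "entity"
```

Concepts that meet lower in the hierarchy score higher, which is the property similarity-based disambiguation relies on.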

最近可好 2024-09-26 13:50:06

NLTK has an API for accessing WordNet. WordNet groups words into synsets. This gives you some information about a word: its hypernyms, hyponyms, root word, and so on.
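
As a rough sketch of those lookups, the helper below collects the name, definition, lemmas, hypernyms and hyponyms of every synset for a word. The function name `describe_senses` and the optional `wn` parameter are made up for this example. It assumes NLTK 3.x, where `name()`, `definition()` and `lemma_names()` are methods, and it needs the WordNet corpus to be downloaded before it is called without a `wn` argument.

```python
def describe_senses(word, wn=None):
    """Return one dict per WordNet synset of `word`.

    `wn` is a hypothetical hook so a different WordNet-like object can be
    passed in; by default NLTK's WordNet reader is imported lazily.
    """
    if wn is None:
        # Requires NLTK and its WordNet data, e.g. nltk.download('wordnet').
        from nltk.corpus import wordnet as wn
    return [
        {
            "sense": synset.name(),
            "definition": synset.definition(),
            "lemmas": synset.lemma_names(),
            "hypernyms": [h.name() for h in synset.hypernyms()],
            "hyponyms": [h.name() for h in synset.hyponyms()],
        }
        for synset in wn.synsets(word)
    ]
```

With NLTK installed, looping over `describe_senses("bank")` and printing each `"sense"` and `"definition"` lists the candidate senses a disambiguation step would have to choose between.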

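"Python Text Processing with NLTK 2.0 Cookbook" is a good book for getting started with the various features of NLTK. It is easy to read, understand, and apply.

You can also look at other papers (outside the NLTK domain) that discuss using Wikipedia for word sense disambiguation.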

守不住的情 2024-09-26 13:49:46

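As a practical answer to the OP's request, here's a Python implementation of several WSD methods that returns senses in the form of NLTK synsets: https://github.com/alvations/pywsd

It includes:

Lesk algorithms (original Lesk, adapted Lesk, and simple Lesk)
Baseline algorithms (random sense, first sense, most frequent sense)

It can be used like this: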

#!/usr/bin/env python
# -*- coding: utf-8 -*-

bank_sents = ['I went to the bank to deposit my money',
'The river bank was full of dead fishes']

plant_sents = ['The workers at the industrial plant were overworked',
'The plant was no longer bearing flowers']

print "======== TESTING simple_lesk ===========\n"
from lesk import simple_lesk
print "#TESTING simple_lesk() ..."
print "Context:", bank_sents[0]
answer = simple_lesk(bank_sents[0],'bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING simple_lesk() with POS ..."
print "Context:", bank_sents[1]
answer = simple_lesk(bank_sents[1],'bank','n')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING simple_lesk() with POS and stems ..."
print "Context:", plant_sents[0]
answer = simple_lesk(plant_sents[0],'plant','n', True)
print "Sense:", answer
print "Definition:",answer.definition
print

print "======== TESTING baseline ===========\n"
from baseline import random_sense, first_sense
from baseline import max_lemma_count as most_frequent_sense

print "#TESTING random_sense() ..."
print "Context:", bank_sents[0]
answer = random_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING first_sense() ..."
print "Context:", bank_sents[0]
answer = first_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING most_frequent_sense() ..."
print "Context:", bank_sents[0]
answer = most_frequent_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

[out]:

======== TESTING simple_lesk ===========

#TESTING simple_lesk() ...
Context: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition: a financial institution that accepts deposits and channels the money into lending activities

#TESTING simple_lesk() with POS ...
Context: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)

#TESTING simple_lesk() with POS and stems ...
Context: The workers at the industrial plant were overworked
Sense: Synset('plant.n.01')
Definition: buildings for carrying on industrial labor

======== TESTING baseline ===========
#TESTING random_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('deposit.v.02')
Definition: put into a bank account

#TESTING first_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)

#TESTING most_frequent_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)
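
For readers who have not met the Lesk family before, the core idea of simplified Lesk can be sketched without any library: pick the sense whose dictionary gloss shares the most words with the sentence. This is only an illustration of the technique, not pywsd's code. The sense labels, glosses, and stopword list below are hypothetical stand-ins for what WordNet would provide.

```python
# Small hypothetical stopword list, just enough for the examples below.
STOPWORDS = {"a", "an", "the", "to", "of", "my", "i", "was", "and", "that"}

# Hypothetical mini sense inventory with paraphrased glosses.
SENSES = {
    "bank": {
        "bank.financial": "a financial institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a river or other body of water",
    }
}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(sentence, word):
    """Return the sense of `word` whose gloss overlaps most with `sentence`."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        # Strict '>' means ties keep the earlier sense, so the first listed
        # sense acts as the fallback when nothing overlaps.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("I went to the bank to deposit my money", "bank"))
print(simplified_lesk("The river bank was full of dead fishes", "bank"))
```

Note that the first sentence is resolved only through the word "money": "deposit" does not match "deposits" in the gloss, which hints at why stemming or lemmatizing the context can change the result.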

清晨说晚安 2024-09-26 13:49:26

See http://jaganadhg.freeflux.net/blog/archive/2010/10/16/wordnet-sense-similarity-with-nltk-some-basics.html

怎樣才叫好 2024-09-26 13:49:03

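Yes. In fact, the NLTK team wrote a book that has several chapters on classification, and it explicitly covers how to use WordNet. You can also buy a physical copy of the book from Safari.

FYI: NLTK was written by natural language programming academics for use in their introductory programming courses.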

习ぎ惯性依靠 2024-09-26 13:48:36

Recently, part of the pywsd code has been ported into the bleeding-edge version of NLTK, in the wsd.py module. Try:

>>> from nltk.wsd import lesk
>>> sent = 'I went to the bank to deposit my money'
>>> ambiguous = 'bank'
>>> lesk(sent, ambiguous)
Synset('bank.v.04')
>>> lesk(sent, ambiguous).definition()
u'act as the banker in a game or in gambling'

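For better WSD performance, use the pywsd library instead of the NLTK module. In general, simple_lesk() from pywsd does better than lesk from NLTK. I'll try to update the NLTK module as much as possible when I'm free.

In response to Chris Spencer's comment, please note the limitations of Lesk algorithms. I'm simply giving an accurate implementation of the algorithms. It's not a silver bullet: http://en.wikipedia.org/wiki/Lesk_algorithm

Also please note that, although: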

lesk("My cat likes to eat mice.", "cat", "n")

doesn't give you the right answer, you can use the pywsd implementation of max_similarity():

>>> from pywsd.similarity import max_similarity
>>> max_similarity('my cat likes to eat mice', 'cat', 'wup', pos='n').definition 
'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'
>>> max_similarity('my cat likes to eat mice', 'cat', 'lin', pos='n').definition 
'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'

@Chris, if you want a python setup.py, just make a polite request and I'll write it...
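
Coming back to the original question (pass in a sentence, get a sense for each word), a minimal sketch on top of `nltk.wsd.lesk` could look like the following. The function name `disambiguate_all` and the `wsd` parameter are invented for this example. It assumes NLTK 3.x with the WordNet corpus installed, and it uses a naive whitespace split, so punctuation stays attached to tokens. `lesk` takes the context as an iterable of tokens and returns `None` for words that have no synsets.

```python
def disambiguate_all(sentence, wsd=None):
    """Return a list of (token, sense) pairs for every token in `sentence`.

    `wsd` is a hypothetical hook: any callable taking (context_tokens, word)
    and returning a sense. By default NLTK's Lesk implementation is used.
    """
    if wsd is None:
        # Requires NLTK and its WordNet data, e.g. nltk.download('wordnet').
        from nltk.wsd import lesk as wsd
    tokens = sentence.split()  # naive tokenization, an assumption of this sketch
    return [(token, wsd(tokens, token)) for token in tokens]
```

With NLTK available, iterating over `disambiguate_all("I went to the bank to deposit my money")` yields each token paired with a `Synset` or `None`. Given the limitations of Lesk noted above, the chosen senses should be treated as a baseline rather than as ground truth.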
