Word Sense Disambiguation in NLTK Python
I am new to NLTK in Python and I am looking for a sample application that can do word sense disambiguation. I have found a lot of algorithms in search results, but no sample application. I just want to pass in a sentence and find the sense of each word by referring to the WordNet library.
Thanks
I have found a similar module in Perl: http://marimba.d.umn.edu/allwords/allwords.html
Is there such a module in NLTK for Python?
Yes, it is possible with the wordnet module in NLTK.
The similarity measures used in the tool mentioned in your post also exist in the NLTK wordnet module.
NLTK has APIs to access WordNet. WordNet organizes words into synsets. This gives you information on a word, its hypernyms, hyponyms, root word, etc.
"Python Text Processing with NLTK 2.0 Cookbook" is a good book to get you started on the various features of NLTK. It is easy to read, understand, and implement.
Also, you can look at other papers (outside the realm of NLTK) that discuss using Wikipedia for word sense disambiguation.
As a practical answer to the OP's request, here is a Python implementation of several WSD methods that returns senses in the form of NLTK synsets: https://github.com/alvations/pywsd
It includes implementations of several WSD algorithms.
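The overlap idea behind Lesk-style WSD (which pywsd implements against real WordNet glosses) can be sketched in plain Python. The sense inventory below is hypothetical, invented only for illustration:

```python
# Toy sketch of the Lesk overlap heuristic: pick the sense whose gloss
# shares the most content words with the sentence context.
# The SENSES dictionary is a made-up stand-in for WordNet glosses.

SENSES = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water such as a river",
        "bank.n.02": "a financial institution that accepts deposits and lends money",
    }
}

STOPWORDS = {"a", "the", "of", "to", "my", "i", "such", "as", "that", "and"}

def toy_lesk(context_sentence, ambiguous_word):
    """Return the sense key whose gloss overlaps most with the context."""
    context = {w for w in context_sentence.lower().split() if w not in STOPWORDS}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[ambiguous_word].items():
        gloss_words = {w for w in gloss.lower().split() if w not in STOPWORDS}
        overlap = len(context & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(toy_lesk("I went to the bank to deposit money", "bank"))  # → bank.n.02
```

Here "deposit money" overlaps with the financial gloss, so the financial sense wins; pywsd's real implementations do the same kind of matching against WordNet definitions, examples, and related synsets.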
Refer to http://jaganadhg.freeflux.net/blog/archive/2010/10/16/wordnet-sense-similarity-with-nltk-some-basics.html
Yes, in fact, the NLTK team wrote a book that has multiple chapters on classification, and it explicitly covers how to use WordNet. You can also buy a physical copy of the book from Safari.
FYI: NLTK is written by natural language processing academics for use in their introductory programming courses.
Recently, part of the pywsd code has been ported into the bleeding-edge version of NLTK in the wsd.py module; try it out. For better WSD performance, use the pywsd library instead of the NLTK module. In general, simple_lesk() from pywsd does better than lesk from NLTK. I'll try to update the NLTK module as much as possible when I'm free.
In response to Chris Spencer's comment, please note the limitations of Lesk algorithms. I'm simply giving an accurate implementation of the algorithms; it's not a silver bullet: http://en.wikipedia.org/wiki/Lesk_algorithm
Also please note that, although NLTK's lesk doesn't give you the right answer, you can use the pywsd implementation of max_similarity().
@Chris, if you want a python setup.py, just make a polite request and I'll write it...