Using WordNet to find synonyms, definitions and example sentences
I need to take an input text file containing a single word, then use WordNet to find the lemma names, definition and examples for each synset of that word. I have gone through the books "Python Text Processing with NLTK 2.0 Cookbook" and "Natural Language Processing using NLTK" for help. Although I understand how to do this in the terminal, I have not managed to do the same using a text editor.
For example, if the input text contains the word "flabbergasted", the output needs to look like this:
flabbergasted
(verb) flabbergast, boggle, bowl over - overcome with amazement; "This boggles the mind!"
(adjective) dumbfounded, dumfounded, flabbergasted, stupefied, thunderstruck, dumbstruck, dumbstricken - as if struck dumb with astonishment and surprise; "a circle of policement stood dumbfounded by her denial of having seen the accident"; "the flabbergasted aldermen were speechless"; "was thunderstruck by the news of his promotion"
The synsets, definitions and example sentences are obtained directly from WordNet.
I have the following piece of code:
from __future__ import division
import nltk
from nltk.corpus import wordnet as wn

tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("inpsyn.txt")
data = fp.read()

# tokenize the input text into sentences
print '\n-----\n'.join(tokenizer.tokenize(data))

# tokenize the text into words
tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]
print words  # print the tokens

for a in words:
    print a
    syns = wn.synsets(a)
    print "synsets:", syns
    for s in syns:
        for l in s.lemmas:
            print l.name
        print s.definition
        print s.examples
I get the following output:
flabbergasted
['flabbergasted']
flabbergasted
synsets: [Synset('flabbergast.v.01'), Synset('dumbfounded.s.01')]
flabbergast
boggle
bowl_over
overcome with amazement
['This boggles the mind!']
dumbfounded
dumfounded
flabbergasted
stupefied
thunderstruck
dumbstruck
dumbstricken
as if struck dumb with astonishment and surprise
['a circle of policement stood dumbfounded by her denial of having seen the accident', 'the flabbergasted aldermen were speechless', 'was thunderstruck by the news of his promotion']
Is there a way to retrieve the part of speech along with the group of lemma names?
doesn't return anything so by default you get
None
you should write
Extracting lemma names:
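One way to get the part of speech alongside the lemma names, sketched here under the assumption of NLTK 3.x, where `pos()`, `lemma_names()`, `definition()` and `examples()` are methods on a synset. The names `POS_LABELS`, `format_entry` and `describe_word` are hypothetical helpers introduced for illustration; they are not part of NLTK, and this is not necessarily the code the answer originally showed:

```python
# Sketch only: assumes NLTK 3.x, where pos(), lemma_names(), definition()
# and examples() are methods on a Synset (they were attributes in NLTK 2.x).

# WordNet's one-letter part-of-speech tags mapped to readable labels.
POS_LABELS = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective",  # "s" marks a satellite adjective
    "r": "adverb",
}


def format_entry(pos, lemma_names, definition, examples):
    """Build one '(pos) lemma, lemma - definition; "example"' line."""
    label = POS_LABELS.get(pos, pos)
    # Multi-word lemmas use underscores in WordNet, e.g. bowl_over.
    lemmas = ", ".join(name.replace("_", " ") for name in lemma_names)
    parts = ["({0}) {1} - {2}".format(label, lemmas, definition)]
    parts.extend('"{0}"'.format(example) for example in examples)
    return "; ".join(parts)


def describe_word(word):
    """Return one formatted line per WordNet synset of `word`."""
    # Imported here so format_entry stays usable without NLTK installed.
    from nltk.corpus import wordnet as wn

    return [
        format_entry(s.pos(), s.lemma_names(), s.definition(), s.examples())
        for s in wn.synsets(word)
    ]
```

With the WordNet corpus installed, `describe_word("flabbergasted")` should return one line per synset in roughly the layout the question asks for. On NLTK 2.x the same fields are attributes, so the parentheses after `pos`, `lemma_names`, `definition` and `examples` would be dropped.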
Here I have created a module that can easily be imported; pass it a string and it returns all the lemma words for that string.
Module:
Usage:
Note: the module is named lemma.py, hence "from lemma import lemmalist".
Cheers!
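For readers who want a concrete picture of such a module, here is a minimal sketch of what a `lemmalist` function in a file called lemma.py might look like. It is not the answerer's code: the word-splitting regex, the `lookup` parameter (added so the function can be exercised without WordNet) and the sorted, de-duplicated return value are all assumptions, and it targets NLTK 3.x, where `lemma_names()` is a method.

```python
import re


def lemmalist(text, lookup=None):
    """Return the sorted, de-duplicated lemma names for every word in `text`.

    `lookup` is a hypothetical hook: any callable mapping a word to a list
    of synset-like objects. It defaults to WordNet's synsets() from NLTK.
    """
    if lookup is None:
        # Imported lazily so the function can be tested with a stand-in lookup.
        from nltk.corpus import wordnet as wn
        lookup = wn.synsets

    lemmas = set()
    # Assumption: words are plain runs of ASCII letters, compared in lower case.
    for word in re.findall(r"[A-Za-z]+", text.lower()):
        for synset in lookup(word):
            lemmas.update(synset.lemma_names())
    return sorted(lemmas)
```

Saved as lemma.py, this could be used as `from lemma import lemmalist` followed by `lemmalist("flabbergasted")`, provided NLTK and its WordNet data are installed.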
In NLTK 3.0, lemma_names has been changed from an attribute to a method. So if you get an error saying:
You can fix it using:
This will output:
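To illustrate the attribute-to-method change described in this answer, the sketch below reads a synset field in a way that works with either style. `synset_field` is a hypothetical helper, not an NLTK function, and the two small classes are stand-ins that only mimic how NLTK 2.x and 3.x expose `lemma_names`:

```python
def synset_field(synset, name):
    """Read `name` from a synset whether it is an attribute or a method."""
    value = getattr(synset, name)
    # NLTK 3.x exposes fields such as lemma_names as methods,
    # NLTK 2.x exposed them as plain attributes.
    return value() if callable(value) else value


class OldStyleSynset(object):
    """Stand-in mimicking NLTK 2.x: lemma_names is an attribute."""
    lemma_names = ["flabbergast", "boggle", "bowl_over"]


class NewStyleSynset(object):
    """Stand-in mimicking NLTK 3.x: lemma_names is a method."""
    def lemma_names(self):
        return ["flabbergast", "boggle", "bowl_over"]
```

Both `synset_field(OldStyleSynset(), "lemma_names")` and `synset_field(NewStyleSynset(), "lemma_names")` return the same list. In code that only needs to run on NLTK 3.x, calling `s.lemma_names()` directly is simpler.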