Using WordNet to find synonyms, definitions, and example sentences

Posted on 2024-10-30 02:05:57

I need to take an input text file containing a single word, then use WordNet to find the lemma names, definition, and examples for each synset of that word. I have gone through the books "Python Text Processing with NLTK 2.0 Cookbook" and "Natural Language Processing using NLTK" for guidance. Although I understand how to do this in the interactive terminal, I'm not able to do the same from a script written in a text editor.

For example, if the input text has the word "flabbergasted", the output needs to be in this fashion:

flabbergasted
(verb) flabbergast, boggle, bowl over - overcome with amazement ; "This boggles the mind!"
(adjective) dumbfounded , dumfounded , flabbergasted , stupefied , thunderstruck , dumbstruck , dumbstricken - as if struck dumb with astonishment and surprise; "a circle of policement stood dumbfounded by her denial of having seen the accident"; "the flabbergasted aldermen were speechless"; "was thunderstruck by the news of his promotion"

The synsets, definitions and example sentences are obtained from WordNet directly!

I have the following piece of code:


from __future__ import division
import nltk
from nltk.corpus import wordnet as wn


tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("inpsyn.txt")
data = fp.read()

# Split the input text into sentences and print them
print '\n-----\n'.join(tokenizer.tokenize(data))

# Split the input text into lowercase word tokens
tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]
print words

# Look up every word in WordNet (NLTK 2.x attribute-style API)
for a in words:
    print a
    syns = wn.synsets(a)
    print "synsets:", syns
    for s in syns:
        for l in s.lemmas:
            print l.name
        print s.definition
        print s.examples

I get the following output:


flabbergasted

['flabbergasted']
flabbergasted
synsets: [Synset('flabbergast.v.01'), Synset('dumbfounded.s.01')]
flabbergast
boggle
bowl_over
overcome with amazement
['This boggles the mind!']
dumbfounded
dumfounded
flabbergasted
stupefied
thunderstruck
dumbstruck
dumbstricken
as if struck dumb with astonishment and surprise
['a circle of policement stood dumbfounded by her denial of having seen the accident', 'the flabbergasted aldermen were speechless', 'was thunderstruck by the news of his promotion']

Is there a way to retrieve the part of speech along with the group of lemma names?

白龙吟 2024-11-06 02:05:57

def synset(word):
    wn.synsets(word)

This function has no return statement, so calling it returns None by default.

You should write:

def synset(word):
    return wn.synsets(word)

Extracting lemma names:

>>> from nltk.corpus import wordnet
>>> syns = wordnet.synsets('car')
>>> syns[0].lemmas[0].name
'car'
>>> [s.lemmas[0].name for s in syns]
['car', 'car', 'car', 'car', 'cable_car']
>>> [l.name for s in syns for l in s.lemmas]
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']
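
The attribute-style access above (s.lemmas, l.name) matches NLTK 2.x; NLTK 3 turned these into methods. As a rough sketch for code that may run against either version, a small helper can call the member when it is callable and return it unchanged otherwise. The names value_of and lemma_names_of, and the stand-in classes, are hypothetical and exist only so the example runs without NLTK installed; real code would pass the result of wordnet.synsets(word).

```python
def value_of(member):
    """Return member() if it is callable (NLTK 3 style), else member itself (NLTK 2 style)."""
    return member() if callable(member) else member


def lemma_names_of(synsets):
    """Flatten the lemma names of an iterable of synset-like objects."""
    names = []
    for synset in synsets:
        for lemma in value_of(synset.lemmas):
            names.append(value_of(lemma.name))
    return names


# Hypothetical stand-ins that mimic the two API styles.
class OldLemma(object):
    def __init__(self, name):
        self.name = name  # plain attribute, as in NLTK 2.x


class OldSynset(object):
    def __init__(self, names):
        self.lemmas = [OldLemma(n) for n in names]


class NewLemma(object):
    def __init__(self, name):
        self._name = name

    def name(self):  # method, as in NLTK 3
        return self._name


class NewSynset(object):
    def __init__(self, names):
        self._lemmas = [NewLemma(n) for n in names]

    def lemmas(self):
        return self._lemmas


print(lemma_names_of([OldSynset(["car", "auto"])]))
print(lemma_names_of([NewSynset(["car", "auto"])]))
```

Both calls print the same list, so the calling code does not need to know which NLTK version produced the synsets.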

寒尘 2024-11-06 02:05:57

Here I have created a module that can easily be imported. Pass it a string and it returns all the lemma names for that string.

Module:

#!/usr/bin/python2.7
'''Pass a string to lemmalist (e.g. 'car') and it returns a list of the
words related to 'car', i.e. the lemma names of all its synsets.'''
from nltk.corpus import wordnet as wn

# Collect the lemma names of every synset of the given word.
# Note: lemma_names is an attribute in NLTK 2.x; in NLTK 3 it is a method.
def lemmalist(word):
    syn_set = []
    for synset in wn.synsets(word):
        for item in synset.lemma_names:
            syn_set.append(item)
    return syn_set

Usage:

Note: the module file is named lemma.py, hence "from lemma import lemmalist".

>>> from lemma import lemmalist
>>> lemmalist('car')
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']

Cheers!
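
The list returned above repeats 'car' once per synset that contains it. If repeats are not wanted, one possible approach (an addition here, not part of the module above) is to drop duplicates while keeping the first occurrence of each name. The function name unique_in_order is hypothetical, and the input is simply the list shown in the usage example.

```python
def unique_in_order(names):
    """Drop repeated names, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


# The list printed by lemmalist('car') in the usage example.
lemmas = ['car', 'auto', 'automobile', 'machine', 'motorcar', 'car',
          'railcar', 'railway_car', 'railroad_car', 'car', 'gondola',
          'car', 'elevator_car', 'cable_car', 'car']

print(unique_in_order(lemmas))
```

A plain set would also remove the repeats, but it would lose the synset order, which is why the sketch tracks seen names separately.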

掩饰不了的爱 2024-11-06 02:05:57

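from nltk.corpus import wordnet

# Collect every lemma name of every synset of "car" (NLTK 3 method-style API)
synonyms = []
for syn in wordnet.synsets("car"):
    for l in syn.lemmas():
        synonyms.append(l.name())
print(synonyms)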

空城旧梦 2024-11-06 02:05:57

In NLTK 3.0, lemma_names was changed from an attribute to a method. So if you get an error saying:

TypeError: 'method' object is not iterable

You can fix it using:

>>> from nltk.corpus import wordnet as wn
>>> [item for synset in wn.synsets('car') for item in synset.lemma_names()]

This will output:

['car', 'auto', 'automobile', 'machine', 'motorcar', 'car',
 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola',
 'car', 'elevator_car', 'cable_car', 'car']
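
On the original question of getting the part of speech next to each group of lemma names: in NLTK 3 a synset also exposes pos(), definition() and examples() as methods, and pos() returns a one-letter WordNet tag. The following is only a sketch of how the requested layout might be assembled in Python 3. The names POS_NAMES, format_synset and describe are hypothetical, and mapping the satellite-adjective tag 's' to "adjective" is an assumption made to match the output the question asks for.

```python
# One-letter WordNet part-of-speech tags mapped to readable labels.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective",  # 's' marks a satellite adjective in WordNet
    "r": "adverb",
}


def format_synset(synset):
    """Render one synset as '(pos) lemma, lemma - definition; "example"; ...'."""
    tag = synset.pos()
    pos = POS_NAMES.get(tag, tag)  # fall back to the raw tag if unknown
    lemmas = ", ".join(name.replace("_", " ") for name in synset.lemma_names())
    parts = [synset.definition()] + ['"%s"' % ex for ex in synset.examples()]
    return "(%s) %s - %s" % (pos, lemmas, "; ".join(parts))


def describe(word):
    """Return the word followed by one formatted line per WordNet synset."""
    # Imported here so format_synset can be tried without NLTK installed.
    from nltk.corpus import wordnet as wn
    return "\n".join([word] + [format_synset(s) for s in wn.synsets(word)])
```

With NLTK 3 and the WordNet corpus installed, print(describe("flabbergasted")) should give one "(verb) ..." line and one "(adjective) ..." line in the shape shown in the question. On NLTK 2.x the same members are attributes, so the calls would need their parentheses removed.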
