How do I get the WordNet synset given an offset ID?

Posted on 2024-12-14 20:14:20

I have a WordNet synset offset (for example id="n#05576222"). Given this offset, how can I get the synset using Python?

Comments (4)

后eg是否自 2024-12-21 20:14:20

As of NLTK 3.2.3, there's a public method for doing this:

wordnet.synset_from_pos_and_offset(pos, offset)

In earlier versions you can use:

wordnet._synset_from_pos_and_offset(pos, offset)

This returns a synset based on its POS and offset ID. I think this method is only available in NLTK 3.0, but I'm not sure.

Example:

from nltk.corpus import wordnet as wn
wn.synset_from_pos_and_offset('n', 4543158)
>> Synset('wagon.n.01')
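
The ID in the question has the form n#05576222, while the method above takes the POS and an integer offset separately. A minimal sketch of bridging the two might look like this; parse_offset_id and synset_from_offset_id are hypothetical helper names, not part of NLTK, and the lookup assumes NLTK 3.2.3 or newer with the WordNet corpus installed:

```python
def parse_offset_id(offset_id):
    """Split an ID such as 'n#05576222' into ('n', 5576222)."""
    pos, sep, digits = offset_id.partition("#")
    if not sep or not pos or not digits.isdigit():
        raise ValueError("expected an ID like 'n#05576222', got %r" % (offset_id,))
    return pos, int(digits)


def synset_from_offset_id(offset_id):
    """Look up the synset for an ID such as 'n#05576222'."""
    # Imported lazily so the parser above can be used without NLTK installed.
    from nltk.corpus import wordnet as wn

    pos, offset = parse_offset_id(offset_id)
    return wn.synset_from_pos_and_offset(pos, offset)
```

Calling synset_from_offset_id("n#05576222") should then return the synset for the ID in the question.
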
话少心凉 2024-12-21 20:14:20

For NLTK 3.2.3 or newer, please see donners45's answer.

For older versions of NLTK:

There is no built-in method in NLTK, but you could use this:

from nltk.corpus import wordnet

syns = list(wordnet.all_synsets())
offsets_list = [(s.offset(), s) for s in syns]
offsets_dict = dict(offsets_list)

offsets_dict[14204095]
>>> Synset('heatstroke.n.01')

You can then pickle the dictionary and load it whenever you need it.

For NLTK versions prior to 3.0, replace the line

offsets_list = [(s.offset(), s) for s in syns]

with

offsets_list = [(s.offset, s) for s in syns]

since prior to NLTK 3.0 offset was an attribute instead of a method.
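
One caveat with keying the dictionary on the offset alone: WordNet offsets are byte positions within a per-POS data file, so the same number can in principle occur for more than one part of speech, and a later synset would silently overwrite an earlier one. A sketch of a variant keyed on (pos, offset) follows; build_offset_index is a hypothetical name, and folding satellite adjectives ('s') into 'a' is an assumption made so that IDs written as a#... still match:

```python
def build_offset_index(synsets):
    """Map (pos, offset) -> synset for an iterable of NLTK 3.x-style synsets."""
    index = {}
    for s in synsets:
        # Assumption for this sketch: satellite adjectives report pos 's'
        # but share the adjective data file, so store them under 'a'.
        pos = "a" if s.pos() == "s" else s.pos()
        index[(pos, s.offset())] = s
    return index
```

With NLTK 3.x this could be used as index = build_offset_index(wordnet.all_synsets()), followed by a lookup such as index[("n", 14204095)].
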

波浪屿的海角声 2024-12-21 20:14:20

You can use of2ss(). For example:

from nltk.corpus import wordnet as wn
syn = wn.of2ss('01580050a')

will return
Synset('necessary.a.01')
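
Judging from the example above, of2ss() takes the eight-digit offset followed by the POS letter, whereas the question's ID is written as n#05576222. A small sketch of converting one form to the other (to_of2ss_id is a hypothetical helper name, not an NLTK function):

```python
def to_of2ss_id(offset_id):
    """Convert 'n#05576222' to '05576222n': zero-padded 8-digit offset, then POS."""
    pos, _, digits = offset_id.partition("#")
    return "%08d%s" % (int(digits), pos)
```

The result can then be passed along as wn.of2ss(to_of2ss_id("n#05576222")).
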

靑春怀旧 2024-12-21 20:14:20

Other than using NLTK, another option is to use the Princeton WordNet .tab file from the Open Multilingual WordNet (http://compling.hss.ntu.edu.sg/omw/). I normally use the recipe below to access WordNet as a dictionary, with the offset as the key and a ;-delimited string of words as the value:

import codecs

# Returns every key whose ";"-delimited value contains the given word.
def getKey(dic, value):
  return [k for k, v in dic.items() if value in v.split(";")]

# Read Open Multi WN's .tab file
def readWNfile(wnfile, option="ss"):
  wn = {}
  with codecs.open(wnfile, "r", "utf8") as reader:
    for l in reader:
      if l.startswith("#"): continue  # skip comment lines
      fields = l.rstrip("\r\n").split("\t")
      if option == "ss":
        k = fields[0]  # synset ID as key
        v = fields[2]  # word as value
      else:
        v = fields[0]  # synset ID as value
        k = fields[2]  # word as key
      if k in wn:
        wn[k] = wn[k] + ";" + v  # append to the existing ";"-delimited string
      else:
        wn[k] = v
  return wn

princetonWN = readWNfile('wn-data-eng.tab')
offset = "n#05576222"
# Convert "n#05576222" to the "05576222-n" form used as keys above
offset = offset.split('#')[1] + '-' + offset.split('#')[0]

print(princetonWN[offset].split(";"))
print(getKey(princetonWN, 'heatstroke'))