How to get synonyms from WordNet ordered by their probability of occurrence

Posted on 2024-09-09 04:32:49

I am looking up synonyms in WordNet for a large list of words. With my current setup, when a word has more than one synonym, the results come back in alphabetical order. What I need is to have them ordered by probability of occurrence, so that I can take just the top synonym.

I used the WordNet Prolog database and Syns2Index to convert it into a Lucene index for querying synonyms. Is there a way to get the synonyms ordered by probability with this setup, or should I use another approach?

Speed is not important; the synonym lookup will not be done online.

Comments (2)

〃安静 2024-09-16 04:32:49

In case someone stumbles upon this thread, this was the way to go (at least for what I needed):

http://lyle.smu.edu/~tspell/jaws/doc/edu/smu/tspell/wordnet/impl/file/ReferenceSynset.html#getTagCount%28java.lang.String%29

The getTagCount method lets you find the most likely synset for each word. The remaining problem is that the synset with the highest probability can itself contain several words, but I guess there is no way to avoid this.
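A minimal sketch of that selection step, written in plain Java with no JAWS dependency. SenseCandidate and TopSynsetPicker are hypothetical names introduced only for illustration, and the tag counts in the usage note are invented; with JAWS the counts would presumably come from calling getTagCount on each synset of the word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one synset as seen from a query word:
// its word forms plus the tag count of the query word in that synset.
class SenseCandidate {
    final List<String> wordForms;
    final int tagCount;

    SenseCandidate(int tagCount, String... wordForms) {
        this.tagCount = tagCount;
        this.wordForms = Arrays.asList(wordForms);
    }
}

class TopSynsetPicker {
    // Returns the word forms (other than the query word) of the candidate
    // with the highest tag count. Ties keep the earliest candidate; an
    // empty input yields an empty list.
    static List<String> synonymsFromTopSense(String word, List<SenseCandidate> senses) {
        SenseCandidate best = null;
        for (SenseCandidate sense : senses) {
            if (best == null || sense.tagCount > best.tagCount) {
                best = sense;
            }
        }
        List<String> result = new ArrayList<>();
        if (best != null) {
            for (String form : best.wordForms) {
                if (!form.equalsIgnoreCase(word)) {
                    result.add(form);
                }
            }
        }
        return result;
    }
}
```

For example, given one candidate with the forms car, railcar (count 3) and another with car, auto, automobile (count 10), synonymsFromTopSense("car", ...) returns auto and automobile. That is exactly the leftover ambiguity described above: choosing a single word from that list needs some further rule.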

清醇 2024-09-16 04:32:49

I think you should add one more step (provided that speed is not important).

From the Lucene index, build another dictionary in which each word is mapped to a small object holding only the synonym whose meaning has the highest probability of occurrence, together with that meaning and its probability. For example, given this code:

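import java.util.HashMap;
import java.util.Map;

// The single best synonym kept for one word.
class Synonym {
    public String name;         // the synonym itself
    public double probability;  // probability of occurrence of its meaning
    public String meaning;      // the meaning it shares with the word
}

// Maps each word to its most probable synonym.
Map<String, Synonym> m = new HashMap<String, Synonym>();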

... then you just have to fill it from the Lucene index.
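A rough sketch of that filling step, assuming the index contents have already been read into flat (word, synonym, probability, meaning) rows. IndexRow and BestSynonymTable are hypothetical names, no Lucene calls are shown, and the Synonym class is repeated (with a constructor added) only so that the sketch compiles without anything else. Where the probability values come from is left open; they would have to be derived from something like WordNet tag counts.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Same shape as the Synonym class above, repeated so this compiles alone.
class Synonym {
    String name;
    double probability;
    String meaning;

    Synonym(String name, double probability, String meaning) {
        this.name = name;
        this.probability = probability;
        this.meaning = meaning;
    }
}

// Hypothetical flat record: one (word, synonym) pair read from the index.
class IndexRow {
    final String word;
    final Synonym synonym;

    IndexRow(String word, String name, double probability, String meaning) {
        this.word = word;
        this.synonym = new Synonym(name, probability, meaning);
    }
}

class BestSynonymTable {
    // Keeps, for every word, only the synonym with the highest probability.
    // On equal probabilities the first row seen wins.
    static Map<String, Synonym> build(List<IndexRow> rows) {
        Map<String, Synonym> best = new HashMap<>();
        for (IndexRow row : rows) {
            Synonym current = best.get(row.word);
            if (current == null || row.synonym.probability > current.probability) {
                best.put(row.word, row.synonym);
            }
        }
        return best;
    }
}
```

Since the lookup is done offline, a single pass over all rows like this should be enough; afterwards m.get(word) gives the one synonym to use.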
