Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words?

Posted on 2024-12-04 10:54:00

Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words?

    path_similarity()?
    lch_similarity()?
    wup_similarity()?
    res_similarity()?
    jcn_similarity()?
    lin_similarity()?

I want to use such a function for word clustering, together with the Yarowsky algorithm, to find similar collocations in a large text.

Comments (2)

仅此而已 2024-12-11 10:54:00

These measures are actually for word senses (or concepts), not words. That distinction might matter. For example, the word "train" can mean "locomotive" or "being taught to do something". To use these measures you'd need to know which sense was intended.

If you want to do word clustering, these measures might not be exactly what you want...
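If you still need a single word-to-word score, one common workaround (not something this answer prescribes) is to compare every sense of one word with every sense of the other and keep the best score. A minimal sketch, assuming NLTK with the WordNet corpus downloaded; `max_sense_similarity` and `word_similarity` are hypothetical helper names, and the choice of `wup_similarity` and of nouns only is arbitrary:

```python
from itertools import product


def max_sense_similarity(senses_a, senses_b, sense_sim):
    """Return the best score over all sense pairs, or None if no pair is comparable."""
    scores = [sense_sim(a, b) for a, b in product(senses_a, senses_b)]
    scores = [s for s in scores if s is not None]
    return max(scores) if scores else None


def word_similarity(word_a, word_b, pos="n"):
    """Word-level score built on WordNet synsets (assumes nltk.download('wordnet'))."""
    # Imported here so the helper above can be tried without WordNet installed.
    from nltk.corpus import wordnet as wn

    return max_sense_similarity(
        wn.synsets(word_a, pos=pos),
        wn.synsets(word_b, pos=pos),
        lambda a, b: a.wup_similarity(b),
    )
```

Keep in mind that the maximum picks the most favourable pair of senses, which is not necessarily the sense intended in your text.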

浅黛梨妆こ 2024-12-11 10:54:00

I've been playing with NLTK/wordnet myself for the purposes of trying to match up some texts in some automatic way. As Ted Pedersen's answer notes, it pretty quickly becomes clear that the similarity functions in nltk.corpus.wordnet only produce non-zero similarities for quite closely related terms with a solid IS-A pedigree.
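For context, here are the six measures from the question as I understand their standard definitions (NLTK's implementation may differ in details such as how path length and depth are counted). Here d is the shortest path length between senses c1 and c2 in the hypernym (IS-A) hierarchy, D is the maximum depth of the taxonomy, lcs is the least common subsumer of the two senses, and IC(c) = -log P(c) is the information content estimated from a corpus:

```latex
\begin{aligned}
\mathrm{sim}_{\mathrm{path}}(c_1, c_2) &= \frac{1}{1 + d(c_1, c_2)} \\
\mathrm{sim}_{\mathrm{lch}}(c_1, c_2)  &= -\log \frac{d(c_1, c_2) + 1}{2D} \\
\mathrm{sim}_{\mathrm{wup}}(c_1, c_2)  &= \frac{2\,\mathrm{depth}(\mathrm{lcs})}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)} \\
\mathrm{sim}_{\mathrm{res}}(c_1, c_2)  &= IC(\mathrm{lcs}) \\
\mathrm{sim}_{\mathrm{jcn}}(c_1, c_2)  &= \frac{1}{IC(c_1) + IC(c_2) - 2\,IC(\mathrm{lcs})} \\
\mathrm{sim}_{\mathrm{lin}}(c_1, c_2)  &= \frac{2\,IC(\mathrm{lcs})}{IC(c_1) + IC(c_2)}
\end{aligned}
```

Every one of them depends on a path or a common ancestor in the IS-A hierarchy, which is consistent with the observation that only terms with a solid IS-A pedigree get useful scores.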

What I ended up doing was taking the vocabulary in my texts and using lemma->synset->lemmas and lemma->similar_tos to grow my own word linkage graph (graph_tool is fantastic for this). I then counted the minimum number of hops needed to link two words to get some sort of (dis-)similarity measure between them (quite entertaining to print these out; like watching a very bizarre word-association game). This actually worked well enough for my purposes, even without any attempt to take POS/sense into account.
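The hop-counting idea can be sketched roughly as follows. This is a guess at the approach, not my actual code: it uses a plain breadth-first search instead of graph_tool, expands the graph lazily rather than building it from a fixed vocabulary, and reads `similar_tos()` from the synset. `hop_distance`, `wordnet_neighbors`, and `max_hops` are hypothetical names.

```python
from collections import deque


def hop_distance(start, goal, neighbors, max_hops=6):
    """Breadth-first search: fewest links from start to goal, or None if out of reach."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(word):
            if nxt == goal:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def wordnet_neighbors(word):
    """Words linked to `word` via shared synsets and similar-to synsets (any POS)."""
    # Imported here so hop_distance can be tried on a toy graph without WordNet.
    from nltk.corpus import wordnet as wn

    linked = set()
    for lemma in wn.lemmas(word):
        synset = lemma.synset()
        linked.update(synset.lemma_names())
        for similar in synset.similar_tos():
            linked.update(similar.lemma_names())
    linked.discard(word)
    return linked
```

With the WordNet corpus downloaded, something like `hop_distance("car", "truck", wordnet_neighbors)` gives the hop count; a smaller count means the words are more closely linked.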
