
Find sentences with a similar relative meaning to an example sentence in a list of sentences

Posted 2024-11-04 04:50:27

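I want to be able to find sentences with the same meaning. I have a query sentence and a long list of millions of other sentences. A sentence is made up of words, some of which are a special type of word called a symbol, which simply stands for some object being talked about.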

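For example, my query sentence is:

Example: Adding (x) to (y) gives (z)

My database might contain a list of sentences such as:

1. The sum of (x) and (y) is (z)
2. (x) plus (y) equals (z)
3. (x) multiplied by (y) does not equal (z)
4. (z) is the sum of (x) and (y)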

The example should match sentences 1, 2, and 4 in my database, but not 3. Each match should also carry some kind of weight.

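It isn't just math sentences; it's any sentence that can be compared with any other sentence based on the meaning of its words. I need some way to compare one sentence against many others and find the ones whose meaning is most closely related, i.e. a mapping between sentences based on their meaning.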

Thanks! (The tag is language-design because I couldn't create any new tags.)


稳稳的幸福 2024-11-11 04:50:28


Not that easy ^^
You should apply a stopword filter first to strip out the words that carry no information. Good stopword lists are readily available.
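As a rough illustration of that first step, here is a minimal sketch. The tiny `STOPWORDS` set and the `tokenize` helper are made up for this example (a real list would be far longer), and it assumes symbols are written like `(x)` as in the question. Note that "not" is deliberately left out of the stopword set, since dropping it would erase the difference between "equals" and "does not equal".

```python
import re

# Illustrative stopword set only (an assumption for this sketch).
# "not" is intentionally absent so that negation survives filtering.
STOPWORDS = {"the", "of", "a", "an", "to", "is", "and", "by", "does"}

def tokenize(sentence):
    # Keep symbols such as "(x)" as single tokens; lowercase ordinary words.
    return re.findall(r"\([a-z]\)|[a-z]+", sentence.lower())

def remove_stopwords(tokens):
    # Drop tokens that carry no information for matching.
    return [t for t in tokens if t not in STOPWORDS]

tokens = remove_stopwords(tokenize("The sum of (x) and (y) is (z)"))
negated = remove_stopwords(tokenize("(x) multiplied by (y) does not equal (z)"))
```

After filtering, `tokens` keeps only `sum` and the three symbols, while `negated` still contains `not`.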

Then you'll want to handle synonyms. That's actually a really complex topic, because you need some kind of word sense disambiguation (WSD) to do it, and most state-of-the-art methods are only a little better than the simplest solution, which is to take the most frequently used sense of a word. You can do that with WordNet: look up the synsets for a word, each of which groups its synonyms. Then you can generalize the word (the more general term is called a hypernym), take the most frequently used sense, and replace the search term with it.

To be clear, handling synonyms is pretty hard in NLP. If you just want to handle different word forms, such as add and adding, you could use a stemmer, but no stemmer will get you from add to sum (WSD is the only way there).
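To make the limitation concrete, here is a deliberately naive suffix stripper. `crude_stem` is a made-up stand-in for this sketch, not a real stemming algorithm such as Porter's, and it will mangle plenty of words. It is only meant to show that stemming merges word forms but cannot relate different words.

```python
def crude_stem(word):
    # Strip a few common English suffixes, keeping at least 3 characters.
    # A toy stand-in (assumption) for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

add_forms = {crude_stem(w) for w in ["add", "adds", "added", "adding"]}
sum_forms = {crude_stem(w) for w in ["sum", "sums"]}
```

All forms of "add" collapse to one stem and all forms of "sum" to another, but the two stems stay distinct, so the synonym problem is still unsolved.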

Then there are the different word orders in your sentences, which shouldn't be ignored either if you want exact answers (x+y=z is different from x+z=y). So you also need word dependencies, so that you can see which words depend on each other. The Stanford Parser is actually the best for that task if you want to work with English.

Perhaps you should just extract the nouns and verbs from a sentence, run all the preprocessing on them, and query your search index for the dependencies.
A dependency would look like

x (sum, y)
y (sum, x)
sum (x, y)

which you could use for your search.
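A small sketch of why the dependencies matter. The triples here are written by hand in a made-up `(predicate, argument, argument)` form; in practice a dependency parser would produce them, and its output format will differ. The `jaccard` helper is just one simple overlap measure chosen for illustration.

```python
def jaccard(a, b):
    # Overlap between two collections, from 0.0 (disjoint) to 1.0 (identical).
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# "x plus y equals z" versus "x plus z equals y"
s1_words = ["x", "plus", "y", "equals", "z"]
s2_words = ["x", "plus", "z", "equals", "y"]

# Hand-written triples (assumption: a parser would supply these).
s1_deps = {("plus", "x", "y"), ("equals", "plus", "z")}
s2_deps = {("plus", "x", "z"), ("equals", "plus", "y")}

word_score = jaccard(s1_words, s2_words)  # same bag of words
dep_score = jaccard(s1_deps, s2_deps)     # no shared triples
```

As a bag of words the two sentences look identical, while the dependency triples keep them apart. For a commutative relation such as a sum, you would probably want to store the arguments in an order-independent way so that sum(x, y) and sum(y, x) still match.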

So you need to tokenize, generalize, get the dependencies, and filter out unimportant words to get your result. And if you want to do it in German, you also need a word decompounder.


左秋 2024-11-11 04:50:27

First of all: what you are trying to solve is a very hard problem. Depending on what is in your dataset, it may be AI-complete.

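You'll need your program to know or learn that add, plus, and sum refer to the same concept, while multiplies is a different concept. You may be able to do this by measuring the distance between the words' synsets in WordNet/FrameNet, though your distance calculation will have to be quite refined if you don't want to find multiplies. Otherwise, you may want to manually establish some word-concept mappings (such as {'add' : 'addition', 'plus' : 'addition', 'sum' : 'addition', 'times' : 'multiplication'}).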
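The manual mapping could be used roughly like this. The `CONCEPTS` dictionary is the one from the paragraph above; the `concepts` helper, the pre-tokenized candidate sentences, and the "share at least one concept" rule are assumptions made for the sake of a short sketch.

```python
# Word-to-concept mapping taken from the example above.
CONCEPTS = {
    "add": "addition",
    "plus": "addition",
    "sum": "addition",
    "times": "multiplication",
}

def concepts(tokens):
    # Replace known words by their concept; ignore everything else.
    return {CONCEPTS[t] for t in tokens if t in CONCEPTS}

query = ["add", "(x)", "to", "(y)"]

# Hypothetical, already tokenized candidates.
candidates = {
    "sum of (x) and (y)": ["sum", "of", "(x)", "and", "(y)"],
    "(x) plus (y)": ["(x)", "plus", "(y)"],
    "(x) times (y)": ["(x)", "times", "(y)"],
}

# Keep candidates that share at least one concept with the query.
matches = [text for text, toks in candidates.items()
           if concepts(toks) & concepts(query)]
```

Here the "sum" and "plus" candidates match the "add" query and the "times" candidate does not. It ignores word order and negation entirely, which is where the parsing step comes in.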

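If you want full sentence semantics, you will additionally have to parse the sentences and derive the meaning from the parse trees/dependency graphs. The Stanford parser is a popular choice for parsing.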

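You can also find inspiration for this problem in Question Answering research. There, a common approach is to parse the sentences, then store fragments of the parse tree in an index and search for them with common search engine techniques (e.g. tf-idf, as implemented in Lucene). That will also give you a score for each sentence.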
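To give a feel for the scoring part, here is a small pure-Python tf-idf ranking. This is a sketch under assumptions, not Lucene's actual scoring formula: the smoothed idf and cosine similarity are one common textbook variant, and the "fragments" are made-up, already normalized tokens standing in for whatever parse-tree fragments you would really index.

```python
import math
from collections import Counter

def build_index(docs):
    # Compute a smoothed idf per term and a tf-idf vector per document.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return idf, vectors

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * \
           math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical normalized fragments for three indexed sentences.
docs = [
    ["addition", "(x)", "(y)", "(z)"],
    ["addition", "(x)", "(y)", "equal", "(z)"],
    ["multiplication", "(x)", "(y)", "not", "equal", "(z)"],
]
query = ["addition", "(x)", "(y)", "(z)"]

idf, vectors = build_index(docs)
# Terms unknown to the index are ignored in the query vector.
query_vec = {t: tf * idf[t] for t, tf in Counter(query).items() if t in idf}
scores = [cosine(query_vec, v) for v in vectors]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The `scores` list gives the per-sentence weight the question asks for, and `ranking` orders the indexed sentences from best to worst match. With millions of sentences you would use an inverted index (which is what Lucene provides) rather than comparing against every vector.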