Is there an algorithm to help detect the "primary topic" of an English sentence?
I'm trying to find out if there is a known algorithm that can detect the "key concept" of a sentence.
The use case is as follows:
- User enters a sentence as a query (Does chicken taste like turkey?)
- Our system identifies the concepts of the sentence (chicken, turkey)
- And it runs a search of our corpus content
What we're missing is a way to identify the core "topic" of the sentence. The sentence "Does chicken taste like turkey" has a primary topic of "chicken", because the user is asking about the taste of chicken, while "turkey" is a helper topic of less importance.
So... I'm trying to find out if there is an algorithm that will help me identify the primary topic of a sentence... Let me know if you are aware of any!!!
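Most basic NLP parsing techniques will be able to extract the basic aspects of the sentence - i.e., that chicken and turkey are NPs, that they are linked by the word "like", and so on. Turning these into a "topic" or "concept" is more difficult.
Techniques such as Latent Semantic Analysis and its many derivatives transform this information into vectors (some of which preserve hierarchy/relations between parts of speech) and then compare them to existing vectors, which are usually pre-classified by concept. See http://en.wikipedia.org/wiki/Latent_semantic_analysis to get started.
Edit: Here's an example LSA app you can play around with to see if you'd like to pursue it further. http://lsi.research.telcordia.com/lsi/demos.html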
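To make the "compare vectors against pre-classified concept vectors" step concrete, here is a minimal sketch. It is not LSA itself: it skips the SVD-based dimensionality reduction and compares raw bag-of-words counts with cosine similarity. The concept texts, `CONCEPTS`, and the function names are all made up for illustration; a real system would build the concept vectors from a corpus.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-written concept descriptions (an assumption, not real data).
CONCEPTS = {
    "chicken": "chicken hen poultry taste flavor meat roast",
    "turkey": "turkey thanksgiving poultry roast stuffing",
    "beef": "beef cow steak meat grill",
}

def to_vector(text):
    """Turn a text into a bag-of-words vector of lowercase term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_concepts(query):
    """Score the query against every concept vector, best match first."""
    q = to_vector(query)
    scores = {name: cosine(q, to_vector(text)) for name, text in CONCEPTS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for name, score in rank_concepts("Does chicken taste like turkey?"):
        print(f"{name}: {score:.3f}")
```

With these toy concept texts, "chicken" scores above "turkey" only because the chicken description happens to contain "taste". The ranking is entirely a product of how the concept vectors were built, which is where LSA or a similar technique would do the real work.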
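If you're willing to shell out money, http://www.connexor.com/ is supposed to be able to do this type of semantic analysis for a wide variety of languages, including English. I have never directly used their product, so I can't comment on how well it works.
There's an article about parsing noun phrases in this month's issue of the MIT Computational Linguistics journal: http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00076
You should look at Google's Cloud Natural Language API. It's their NLP service.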
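One option is to look into something like this as a first step:
http://www.abisource.com/projects/link-grammar/
But how you derive the topic from these links is another problem in itself. Still, since Abiword is trying to detect grammatical problems, you might be able to use it to determine the topic.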
I actually did a research project on this and won two competitions and am competing in nationals.
There are two steps to the method:
For example, "I ate pie" has 2 nouns: "I" and "pie". Looking at the parse tree, "pie" is inside of a Verb Phrase, so it cannot be a subject. "I", however, is only inside of NP-like constituents. Being the only subject candidate, it is the subject. Find an early copy of this program on http://www.candlemind.com. Note that the vocabulary is limited to basic singular words, and there are no verb conjugations, so it has "man" but not "men", and "eat" but not "ate". Also, the CFG I used was hand-made and limited. I will be updating this program shortly.
Anyway, there are limitations to this program. My mentor pointed out that in its current state, it cannot recognize sentences with subjects that are "real" NPs (what grammar actually calls NPs). For example, "That the moon is flat is not a debate any longer." The subject is actually "that the moon is flat." However, the program would recognize "moon" as the subject. I will be fixing this shortly.
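Going by the two examples above, one reading of the idea is: take a parse tree, then keep only the nouns that never sit under a verb phrase. The sketch below is an illustration of that reading, not the program from the link. It assumes the parse tree is already available as nested tuples, treats the pronoun "I" as a noun (as the example does), and reduces "only inside NP-like constituents" to "no VP ancestor". The tag sets, tree shapes, and names are all hypothetical.

```python
# Trees are nested tuples: (label, child, child, ...); a leaf is (tag, word).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS", "PRP"}  # assumption: pronouns count as nouns
BLOCKING = {"VP"}  # assumption: a noun under any of these cannot be the subject

def subject_candidates(tree, ancestors=()):
    """Return nouns that have no blocking constituent among their ancestors."""
    label, *children = tree
    # A leaf holds exactly one child, and that child is the word itself.
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]
        if label in NOUN_TAGS and not any(a in BLOCKING for a in ancestors):
            return [word]
        return []
    found = []
    for child in children:
        found.extend(subject_candidates(child, ancestors + (label,)))
    return found

# Hand-written parse of "I ate pie".
I_ATE_PIE = ("S",
             ("NP", ("PRP", "I")),
             ("VP", ("VBD", "ate"), ("NP", ("NN", "pie"))))

# Hand-written parse of "That the moon is flat is not a debate any longer".
MOON = ("S",
        ("SBAR", ("IN", "that"),
         ("S", ("NP", ("DT", "the"), ("NN", "moon")),
          ("VP", ("VBZ", "is"), ("ADJP", ("JJ", "flat"))))),
        ("VP", ("VBZ", "is"), ("RB", "not"),
         ("NP", ("DT", "a"), ("NN", "debate")),
         ("ADVP", ("RB", "any"), ("RBR", "longer"))))

if __name__ == "__main__":
    print(subject_candidates(I_ATE_PIE))  # ['I']
    print(subject_candidates(MOON))       # ['moon']
```

The second call shows the limitation described above: a noun-level rule picks out "moon" even though the grammatical subject is the whole clause "that the moon is flat". Returning the outermost constituent to the left of the main VP, instead of a single noun, would be one way to address that.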
Anyway, this is good enough for most sentences...
My research paper can be found there too. Go to page 11 of it to read the methods.
Hope this helps.