Is there an algorithm to help detect the "primary topic" of an English sentence?

Posted on 10-30 08:15

I'm trying to find out if there is a known algorithm that can detect the "key concept" of a sentence.

The use case is as follows:

  1. User enters a sentence as a query (Does chicken taste like turkey?)
  2. Our system identifies the concepts of the sentence (chicken, turkey)
  3. And it runs a search of our corpus content

What we lack is a way to identify the core "topic" of the sentence. The sentence "Does chicken taste like turkey?" has a primary topic of "chicken", because the user is asking about the taste of chicken, while "turkey" is a secondary topic of less importance.

So... I'm trying to find out if there is an algorithm that will help me identify the primary topic of a sentence... Let me know if you are aware of any!!!

Comments (12)

下壹個目標2024-11-06 08:15:09

I actually did a research project on this and won two competitions and am competing in nationals.

There are two steps to the method:

  1. Parse the sentence with a Context-Free Grammar
  2. In the resulting parse trees, find all nouns which are only subordinate to Noun-Phrase-like constituents

For example, "I ate pie" has 2 nouns: "I" and "pie". Looking at the parse tree, "pie" is inside a Verb Phrase, so it cannot be a subject. "I", however, is only inside NP-like constituents. Being the only subject candidate, it is the subject. You can find an early copy of this program at http://www.candlemind.com. Note that the vocabulary is limited to basic singular words, and there are no verb conjugations, so it has "man" but not "men", and "eat" but not "ate." Also, the CFG I used was hand-made and limited. I will be updating this program shortly.
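A minimal sketch of the two steps above, not the program at candlemind.com. It assumes a toy hand-written grammar (S -> NP VP, NP -> Det N | N, VP -> V NP | V) and a tiny lexicon, both invented for illustration, and it treats the root S as NP-like so that a top-level subject can qualify.

```python
# Toy lexicon: every entry here is an illustrative assumption.
LEXICON = {
    "i": "N", "pie": "N", "man": "N", "dog": "N",
    "ate": "V", "saw": "V",
    "the": "Det", "a": "Det",
}


def parse_np(tokens, i):
    """NP -> Det N | N. Returns (tree, next_index) or None."""
    if i < len(tokens) and LEXICON.get(tokens[i]) == "Det":
        if i + 1 < len(tokens) and LEXICON.get(tokens[i + 1]) == "N":
            return ("NP", ("Det", tokens[i]), ("N", tokens[i + 1])), i + 2
        return None
    if i < len(tokens) and LEXICON.get(tokens[i]) == "N":
        return ("NP", ("N", tokens[i])), i + 1
    return None


def parse_vp(tokens, i):
    """VP -> V NP | V. Returns (tree, next_index) or None."""
    if i < len(tokens) and LEXICON.get(tokens[i]) == "V":
        verb = ("V", tokens[i])
        obj = parse_np(tokens, i + 1)
        if obj:
            return ("VP", verb, obj[0]), obj[1]
        return ("VP", verb), i + 1
    return None


def parse_sentence(text):
    """S -> NP VP. Returns a tree, or None if the toy grammar cannot parse it."""
    tokens = text.lower().split()
    np = parse_np(tokens, 0)
    if not np:
        return None
    vp = parse_vp(tokens, np[1])
    if not vp or vp[1] != len(tokens):
        return None
    return ("S", np[0], vp[0])


def subject_candidates(tree, ancestors=()):
    """Step 2: collect nouns whose ancestors are all S or NP."""
    label, *children = tree
    if label == "N":
        only_np_like = all(a in ("S", "NP") for a in ancestors)
        return [children[0]] if only_np_like else []
    if isinstance(children[0], str):  # any other leaf (V, Det)
        return []
    found = []
    for child in children:
        found += subject_candidates(child, ancestors + (label,))
    return found


print(subject_candidates(parse_sentence("I ate pie")))
```

Here "pie" is rejected because VP is among its ancestors, which leaves "i" (lowercased by the tokenizer) as the only candidate.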

Anyway, there are limitations to this program. My mentor pointed out that in its current state, it cannot recognize sentences with subjects that are "real" NPs (what grammar actually calls NPs). For example, "That the moon is flat is not a debate any longer." The subject is actually "that the moon is flat." However, the program would recognize "moon" as the subject. I will be fixing this shortly.

Anyway, this is good enough for most sentences...

My research paper can be found there too. Go to page 11 of it to read the methods.

Hope this helps.

神回复2024-11-06 08:15:09

Most basic NLP parsing techniques will be able to extract the basic aspects of the sentence, i.e., that chicken and turkey are NPs and that they are linked by the preposition 'like', etc. Getting from these to a 'topic' or 'concept' is more difficult.

Techniques such as Latent Semantic Analysis and its many derivatives transform this information into a vector (some have methods of retaining, in part, the hierarchy/relations between parts of speech) and then compare it to existing vectors, which are usually pre-classified by concept. See http://en.wikipedia.org/wiki/Latent_semantic_analysis to get started.
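To make the "compare to pre-classified vectors" step concrete, here is a rough sketch. It is not full LSA: LSA would first apply a truncated SVD to a term-document matrix, and that step is left out here. This only builds raw word-count vectors and compares them by cosine similarity. The concept names and their word lists are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical pre-classified concepts, each described by a few words.
CONCEPT_DOCS = {
    "poultry": "chicken turkey duck roast meat taste",
    "astronomy": "moon planet star orbit telescope",
}


def to_vector(text):
    """Bag-of-words count vector for a piece of text."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[term] * b[term] for term in a)  # Counter gives 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_concept(query):
    """Name of the concept whose vector is most similar to the query."""
    q = to_vector(query)
    return max(CONCEPT_DOCS, key=lambda name: cosine(q, to_vector(CONCEPT_DOCS[name])))


print(closest_concept("Does chicken taste like turkey?"))
```

Note that this finds the closest concept for the whole query; it does not by itself say whether "chicken" or "turkey" is the primary topic.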

Edit: Here's an example LSA app you can play around with to see if you might want to pursue it further: http://lsi.research.telcordia.com/lsi/demos.html

猛虎独行2024-11-06 08:15:09

For many longer sentences it's difficult to say what exactly the topic is, and there may be more than one.

One way to get an approximate answer is:

1.) First tag the sentence using openNLP, the Stanford Parser, or any similar tool.
2.) Then remove all the stop words from the sentence.
3.) Pick out the nouns (proper, singular and plural).
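A small sketch of steps 2 and 3, assuming step 1 has already been done by some tagger. The input below is tagged by hand with Penn Treebank tags, and the stop-word list is a tiny made-up one, so treat both as placeholders for whatever your tagger and stop list actually provide.

```python
# Penn Treebank noun tags: singular, plural, proper singular, proper plural.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "do", "does", "is", "are", "like", "of", "to", "it"}


def pick_nouns(tagged):
    """Drop stop words, then keep the words tagged as nouns."""
    kept = [(word, tag) for word, tag in tagged if word.lower() not in STOP_WORDS]
    return [word for word, tag in kept if tag in NOUN_TAGS]


# Hand-tagged version of the question's example sentence.
tagged_query = [
    ("Does", "VBZ"), ("chicken", "NN"), ("taste", "VB"),
    ("like", "IN"), ("turkey", "NN"), ("?", "."),
]
print(pick_nouns(tagged_query))
```

This returns both "chicken" and "turkey", so it finds the concepts but does not rank them.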

Another way is:

1.) Chunk the sentence into phrases with any parser.
2.) Pick out all the noun phrases.
3.) Remove the noun phrases that don't have a noun as a child.
4.) Keep only the adjectives and nouns; remove all other words from the remaining noun phrases.
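The four steps above could be sketched roughly as follows. The chunker here is a crude stand-in for a real parser (an assumption for illustration): it treats any run of determiners, adjectives and nouns as one noun phrase, and a pronoun as a noun phrase on its own. Input is again hand-tagged with Penn Treebank tags.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}


def chunk_noun_phrases(tagged):
    """Steps 1-2: crude chunker returning noun phrases as lists of (word, tag)."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag == "DT" or tag in ADJ_TAGS or tag in NOUN_TAGS:
            current.append((word, tag))
            continue
        if current:  # a non-NP token closes the phrase in progress
            phrases.append(current)
            current = []
        if tag == "PRP":  # a pronoun is a noun phrase by itself
            phrases.append([(word, tag)])
    if current:
        phrases.append(current)
    return phrases


def key_phrases(tagged):
    """Steps 3-4: drop phrases without a noun, keep only adjectives and nouns."""
    result = []
    for phrase in chunk_noun_phrases(tagged):
        if not any(tag in NOUN_TAGS for _, tag in phrase):
            continue
        result.append([w for w, tag in phrase if tag in NOUN_TAGS or tag in ADJ_TAGS])
    return result


tagged_sentence = [
    ("The", "DT"), ("roast", "JJ"), ("chicken", "NN"),
    ("tastes", "VBZ"), ("like", "IN"), ("it", "PRP"),
]
print(key_phrases(tagged_sentence))
```

The pronoun phrase "it" is dropped in step 3 because it contains no noun, and "The" is dropped in step 4.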

This might give an approximate guess.

破晓2024-11-06 08:15:09

"Key concept" is not a well-defined term in linguistics, but this may be a starting point: parse the sentence, find the subject in the parse tree or dependency structure that you get. (This doesn't always work; for example, the subject of "Is it raining?" is "it", while the key concept is likely "rain". Also, what's the key concept in "Are spaghetti and lasagna the same thing?")

This kind of problem (NLP + search) is more properly dealt with by methods such as LSA, but that's quite an advanced topic.

岁吢2024-11-06 08:15:09

At the most basic level, a question in English usually takes one of two forms: <verb> <subject> ... ? or <pronoun> <verb> <subject> ... ? This is by no means a good algorithm, especially considering that the subject could span several words, but depending on how sophisticated a solution you need, it might be a useful starting point.
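As a rough sketch of that heuristic, assuming the "verb" in first position is an auxiliary and the "pronoun" is a question word (both word lists below are short, made-up samples), and taking only a single word as the subject:

```python
# Illustrative, incomplete word lists (assumptions for this sketch).
AUXILIARIES = {
    "do", "does", "did", "is", "are", "was", "were",
    "can", "could", "will", "would", "should", "has", "have", "had",
}
QUESTION_WORDS = {"what", "who", "whom", "which", "where", "when", "why", "how"}


def guess_subject(question):
    """Guess a one-word subject from the two question patterns, else None."""
    tokens = question.lower().rstrip("?!. ").split()
    # Pattern 1: <verb> <subject> ... ?
    if len(tokens) >= 2 and tokens[0] in AUXILIARIES:
        return tokens[1]
    # Pattern 2: <pronoun> <verb> <subject> ... ?
    if len(tokens) >= 3 and tokens[0] in QUESTION_WORDS and tokens[1] in AUXILIARIES:
        return tokens[2]
    return None


print(guess_subject("Does chicken taste like turkey?"))
print(guess_subject("What does chicken taste like?"))
```

A multi-word subject such as "Does the roast chicken ..." would come back as "the", which is the limitation mentioned above.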

If you need precision, ignore this answer.

战皆罪2024-11-06 08:15:09

If you're willing to shell out money, http://www.connexor.com/ is supposed to be able to do this type of semantic analysis for a wide variety of languages, including English. I have never directly used their product, and so can't comment on how well it works.

近箐2024-11-06 08:15:09

There's an article about parsing noun phrases in this month's issue of the Computational Linguistics journal (MIT Press): http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00076

旧夏天2024-11-06 08:15:09

Compound or complex sentences may have more than one key concept.

You can use stanfordNLP or MaltParser, which can give the dependency structure of a sentence. The output also includes part-of-speech tags and grammatical relations such as subject, verb, object, etc.

I think most of the time the object will be the key concept of the sentence.
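A sketch of how that heuristic might look once you have the parser output. The dependency edges below are written by hand in (head, relation, dependent) form with Universal Dependencies-style labels; they are an assumption about what a parser would return, not actual stanfordNLP or MaltParser output. Falling back to the subject when there is no object is also my own assumption.

```python
def key_concept(dependencies, preferred=("obj", "dobj", "nsubj")):
    """Return the dependent of the first preferred relation found, else None."""
    for relation in preferred:
        for head, rel, dependent in dependencies:
            if rel == relation:
                return dependent
    return None


# Hand-written edges for "I ate pie".
i_ate_pie = [("ate", "nsubj", "I"), ("ate", "obj", "pie")]

# Hand-written edges for "Does chicken taste like turkey?".
chicken_question = [
    ("taste", "aux", "Does"),
    ("taste", "nsubj", "chicken"),
    ("taste", "obl", "turkey"),
    ("turkey", "case", "like"),
]

print(key_concept(i_ate_pie))
print(key_concept(chicken_question))
```

With these edges the question from the original post has no direct object, so the fallback picks the subject "chicken".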

眼眸印温柔2024-11-06 08:15:09

You should look at Google's Cloud Natural Language API. It's their NLP service.

https://cloud.google.com/natural-language/

多彩岁月2024-11-06 08:15:09

A simple solution is to tag your sentence with a part-of-speech tagger (e.g. from the NLTK library for Python), then look for matches against some predefined part-of-speech patterns in which it's clear where the main subject of the sentence is.
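A minimal sketch of the pattern-matching half, assuming the sentence has already been tagged. The input is hand-tagged with Penn Treebank tags rather than produced by NLTK, and the patterns themselves are invented examples. Each pattern is a sequence of tag prefixes plus the position of the subject within it.

```python
# (tag-prefix sequence, index of the subject). All patterns are illustrative.
PATTERNS = [
    (("VB", "NN"), 1),       # "Does chicken ..."
    (("W", "VB", "NN"), 2),  # "What does chicken ..."
    (("DT", "NN", "VB"), 1), # "The chicken tastes ..."
    (("NN", "VB"), 0),       # "Chicken tastes ..."
]


def find_subject(tagged):
    """Return the word at the subject slot of the first matching pattern."""
    tags = [tag for _, tag in tagged]
    for pattern, subject_index in PATTERNS:
        if len(tags) >= len(pattern) and all(
            tag.startswith(prefix) for tag, prefix in zip(tags, pattern)
        ):
            return tagged[subject_index][0]
    return None


tagged_query = [
    ("Does", "VBZ"), ("chicken", "NN"), ("taste", "VB"),
    ("like", "IN"), ("turkey", "NN"), ("?", "."),
]
print(find_subject(tagged_query))
```

Patterns are matched against the start of the sentence only, and prefix matching lets "VB" cover VBZ, VBD and so on.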

小傻瓜2024-11-06 08:15:09

One option is to look into something like this as a first step:

http://www.abisource.com/projects/link-grammar/

How you derive the topic from these links is another problem in itself. But as Abiword is trying to detect grammatical problems with it, you might be able to use it to determine the topic.

你的笑2024-11-06 08:15:09

By "primary topic" you're referring to what is termed the subject of the sentence.

The subject can be identified by understanding a sentence through natural language processing.

The answer to this question is the same as that for How to determine subject, object and other words? - this is a currently unsolved problem.
