How can I get the logical parts of a sentence with Java?
Let's say there is a sentence:
On March 1, he was born.
Changing it to
He was born on March 1.
doesn't break the sense of the sentence, and it is still valid. Shuffling the words in any other way would produce weird or invalid sentences. So basically, I'm talking about parts of a sentence that make the information more specific, but whose removal doesn't break the whole sentence. Is there an NLP library that can identify such parts?
Constituents
It sounds like you want to identify the sentence's constituents, which are groups of words that operate as a single unit according to the grammar of a language.
In fact, when linguists are trying to discover a language's grammar, they do it in part by looking at movement. As in your example, this is where a group of words can be moved to a different position in a sentence while still preserving the meaning of the sentence.
Constituents can be individual words, phrases, or even larger groups such as whole clauses. Within a sentence, they have a nested hierarchical structure. For instance, the first example sentence you gave could be analyzed as:
The whole sentence is made up of a prepositional phrase, followed by a noun phrase, and then a verb phrase. The prepositional phrase can be further decomposed into a unit consisting of the single word 'On' followed by a noun phrase.
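To make the nesting concrete, here is a minimal Java sketch that reads a bracketed tree and lists each phrase-level constituent with the words it covers. The bracketed string is hand-written in Penn Treebank style to match the description above; it is an assumption, not actual parser output, and the class and method names (ConstituentLister, parse, phrases) are hypothetical. A real phrase structure parser would supply the tree.

```java
import java.util.ArrayList;
import java.util.List;

class ConstituentLister {
    /** A tree node: a label plus either child nodes or, for a pre-terminal, one word. */
    static final class Node {
        final String label;
        final String word; // non-null only for pre-terminals such as (IN On)
        final List<Node> children = new ArrayList<>();

        Node(String label, String word) {
            this.label = label;
            this.word = word;
        }

        /** The words this constituent covers, joined by single spaces. */
        String text() {
            if (word != null) return word;
            StringBuilder sb = new StringBuilder();
            for (Node c : children) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(c.text());
            }
            return sb.toString();
        }
    }

    private final String[] tokens;
    private int pos = 0;

    private ConstituentLister(String bracketed) {
        // Pad the parentheses so that splitting on whitespace yields clean tokens.
        tokens = bracketed.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    /** Parse a well-formed bracketed tree such as "(NP (DT The) (NN ball))". */
    static Node parse(String bracketed) {
        return new ConstituentLister(bracketed).node();
    }

    private Node node() {
        pos++; // consume "("
        String label = tokens[pos++];
        if (!tokens[pos].equals("(")) { // pre-terminal: (TAG word)
            Node leaf = new Node(label, tokens[pos++]);
            pos++; // consume ")"
            return leaf;
        }
        Node n = new Node(label, null);
        while (tokens[pos].equals("(")) n.children.add(node());
        pos++; // consume ")"
        return n;
    }

    /** List every phrase-level constituent as "LABEL: words", outermost first. */
    static List<String> phrases(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<String> out) {
        if (n.word != null) return; // skip single-word pre-terminals
        out.add(n.label + ": " + n.text());
        for (Node c : n.children) collect(c, out);
    }

    public static void main(String[] args) {
        // Hand-written tree for the example sentence (an assumption, not parser output).
        String tree = "(S (PP (IN On) (NP (NNP March) (CD 1))) (, ,) "
                + "(NP (PRP he)) (VP (VBD was) (VP (VBN born))) (. .))";
        for (String p : phrases(parse(tree))) System.out.println(p);
    }
}
```

Each line printed is a candidate unit for movement or deletion; for this tree the prepositional phrase comes out as "PP: On March 1".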
Phrase Structure Parsers
To find constituents automatically, you will probably want to use a phrase structure parser. There are many such parsers to choose from that are available as open source, including the Stanford, Berkeley, Charniak, and Bikel parsers.
The Stanford and Berkeley parsers are probably the easiest to install and use. As seen in Cer et al. 2010, the most accurate parsers are Berkeley and Charniak. The Bikel parser is slower and less accurate than the others.
Online Demo
There's an online demo for the Stanford parser here. I used the demo to produce the parse given above of your example sentence.
A Note About Deletion
Within each constituent, there will be a head word. For example, take the noun phrase:
(NP (DT The) (JJ big) (JJ blue) (NN ball))
The head word here is the noun 'ball', and it is modified by the adjectives 'big' and 'blue'. If this noun phrase were embedded in a sentence, you could delete those modifiers and still have something that is consistent with, but less specific than, the meaning of the original sentence.
Within noun phrases, you can generally delete the adjectives, nouns that are not the head, and nested prepositional phrases.
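As a rough illustration of deleting modifiers, the Java sketch below drops the adjectives from a flat bracketed noun phrase and keeps everything else. It assumes the phrase is flat (no nested phrases) and that adjectives carry Penn Treebank tags starting with JJ; the class and method names are hypothetical. It does not handle non-head nouns or nested prepositional phrases, which would need a real tree and a head-finding rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ModifierStripper {
    // Matches a pre-terminal such as "(JJ big)": group 1 is the tag, group 2 the word.
    private static final Pattern PRE_TERMINAL =
            Pattern.compile("\\((\\S+)\\s+([^()\\s]+)\\)");

    /** Return the words of a flat bracketed phrase with adjectives (JJ, JJR, JJS) removed. */
    static String stripAdjectives(String flatPhrase) {
        List<String> kept = new ArrayList<>();
        Matcher m = PRE_TERMINAL.matcher(flatPhrase);
        while (m.find()) {
            // Keep every word whose tag is not an adjective tag.
            if (!m.group(1).startsWith("JJ")) kept.add(m.group(2));
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String np = "(NP (DT The) (JJ big) (JJ blue) (NN ball))";
        System.out.println(stripAdjectives(np)); // prints "The ball"
    }
}
```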
Within verb phrases and complete clauses, things get trickier, since deleting material that serves as an argument to the verb can completely change the interpretation of a sentence. For example, deleting 'the book' from 'He sold Jim the book' results in 'He sold Jim'.
OpenNLP may do some of this for you. Phrase chunking and parsing should help you with this. However, this is not a particularly simple problem, and algorithms will tend to get confused as sentence structure becomes more complex and ambiguous. You should sometimes be able to reorder phrases within a sentence and maintain meaning.