How to get the logical parts of a sentence with Java?

Posted on 2024-08-30 00:47:08

Let's say there is a sentence:

On March 1, he was born.

Changing it to

He was born on March 1.

doesn't break the sense of the sentence and it is still valid. Shuffling words in any other way would produce weird to invalid sentences. So basically, I'm talking about parts of the sentence, which make the information more specific, but removing them doesn't break the whole sentence. Is there any NLP library in which identifying such parts is available?

睫毛溺水了 2024-09-06 00:47:08

Constituents

It sounds like you want to identify the sentence's constituents, which are groups of words that operate as a single unit according to the grammar of a language.

In fact, when linguists are trying to discover a language's grammar, they do it in part by looking at movement. As in your example, this is where a group of words can be moved to a different position in a sentence while still preserving the meaning of the sentence.
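Once a parser has grouped the words, the movement in your example amounts to detaching one constituent and reattaching it elsewhere. The sketch below assumes the sentence has already been parsed into the bracket notation used later in this answer, where a label such as PP (prepositional phrase) wraps the words it covers and the comma and period carry the Penn Treebank tags "," and ".". PpMover, parse and movePpToEnd are hypothetical helpers written for illustration, not part of any parser's API, and the recasing of the first word is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PpMover {
    /** A parse-tree node: a label with children, or a leaf word (label == null). */
    static final class Node {
        final String label, word;
        final List<Node> kids = new ArrayList<>();
        Node(String label, String word) { this.label = label; this.word = word; }
    }

    /** Reads one well-formed Penn Treebank-style bracketed tree (hypothetical helper). */
    static Node parse(String tree) {
        String[] toks = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return read(toks, new int[] {0});
    }

    private static Node read(String[] toks, int[] pos) {
        pos[0]++;                                  // skip "("
        Node n = new Node(toks[pos[0]++], null);   // the label follows "("
        while (!toks[pos[0]].equals(")")) {
            n.kids.add(toks[pos[0]].equals("(") ? read(toks, pos) : new Node(null, toks[pos[0]++]));
        }
        pos[0]++;                                  // skip ")"
        return n;
    }

    /** The words covered by a node, joined with single spaces. */
    static String text(Node n) {
        if (n.word != null) return n.word;
        List<String> ws = new ArrayList<>();
        for (Node k : n.kids) ws.add(text(k));
        return String.join(" ", ws);
    }

    /** Moves a clause-initial PP to the end of the clause; returns null if there is none. */
    static String movePpToEnd(Node clause) {
        List<Node> kids = new ArrayList<>(clause.kids);
        if (kids.isEmpty() || !"PP".equals(kids.get(0).label)) return null;
        Node pp = kids.remove(0);
        // Drop the comma (POS tag ",") that set off the fronted phrase.
        if (!kids.isEmpty() && ",".equals(kids.get(0).label)) kids.remove(0);
        // Reattach the PP just before sentence-final punctuation (POS tag ".").
        int at = kids.size();
        if (at > 0 && ".".equals(kids.get(at - 1).label)) at--;
        kids.add(at, pp);

        List<String> words = new ArrayList<>();
        for (Node k : kids) {
            List<String> ws = new ArrayList<>(List.of(text(k).split(" ")));
            // Naive recasing: the moved phrase's first word is lower-cased ("On" -> "on").
            if (k == pp) ws.set(0, ws.get(0).toLowerCase(Locale.ROOT));
            words.addAll(ws);
        }
        // Naive recasing: capitalize the new first word ("he" -> "He").
        String first = words.get(0);
        words.set(0, Character.toUpperCase(first.charAt(0)) + first.substring(1));

        // Detokenize: no space before punctuation tokens.
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            if (sb.length() > 0 && !w.matches("[.,;:!?]")) sb.append(' ');
            sb.append(w);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node s = parse("(S (PP (IN On) (NP (NNP March) (CD 1))) (, ,) (NP (PRP he))"
                + " (VP (VBD was) (VP (VBN born))) (. .))");
        System.out.println(movePpToEnd(s));   // He was born on March 1.
    }
}
```

Run as is, it prints "He was born on March 1." A non-constituent such as "March 1, he" cannot be moved this way at all, because the tree has no node covering exactly those words; that is the intuition behind the movement test.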

Constituents can be individual words, phrases, or even larger groups such as whole clauses. Within a sentence, they have a nested hierarchical structure. For instance, the first example sentence you gave could be analyzed as:

(S  (PP (IN On) (NP (NNP March) (CD 1)))
    (NP (PRP he))
    (VP (VBD was) (VP (VBN born))))

The whole sentence is made up of a prepositional phrase, followed by a noun phrase, and then a verb phrase. The prepositional phrase can be further decomposed into a unit consisting of the single word 'On' followed by a noun phrase.
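If you already have a bracketed parse like the one above (for example, copied from a parser's output), listing its constituents is a short tree walk. The reader in this sketch is a minimal, hypothetical helper that assumes well-formed input; in a real program you would use the tree class that comes with your parser instead. Part-of-speech nodes such as (IN On) are skipped so that only phrases are listed.

```java
import java.util.ArrayList;
import java.util.List;

final class ConstituentList {
    /** A parse-tree node: a label with children, or a leaf word (label == null). */
    static final class Node {
        final String label, word;
        final List<Node> kids = new ArrayList<>();
        Node(String label, String word) { this.label = label; this.word = word; }
    }

    /** Reads one well-formed Penn Treebank-style bracketed tree (hypothetical helper). */
    static Node parse(String tree) {
        String[] toks = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return read(toks, new int[] {0});
    }

    private static Node read(String[] toks, int[] pos) {
        pos[0]++;                                  // skip "("
        Node n = new Node(toks[pos[0]++], null);   // the label follows "("
        while (!toks[pos[0]].equals(")")) {
            n.kids.add(toks[pos[0]].equals("(") ? read(toks, pos) : new Node(null, toks[pos[0]++]));
        }
        pos[0]++;                                  // skip ")"
        return n;
    }

    /** True for a part-of-speech node wrapping exactly one word, e.g. (IN On). */
    static boolean isPreterminal(Node n) {
        return n.kids.size() == 1 && n.kids.get(0).word != null;
    }

    /** The words covered by a node, joined with single spaces. */
    static String text(Node n) {
        if (n.word != null) return n.word;
        List<String> ws = new ArrayList<>();
        for (Node k : n.kids) ws.add(text(k));
        return String.join(" ", ws);
    }

    /** Lists every phrasal constituent, indented by depth, with the words it covers. */
    static List<String> phrases(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, "", out);
        return out;
    }

    private static void collect(Node n, String indent, List<String> out) {
        if (n.word != null || isPreterminal(n)) return;   // skip words and POS tags
        out.add(indent + n.label + ": " + text(n));
        for (Node k : n.kids) collect(k, indent + "  ", out);
    }

    public static void main(String[] args) {
        Node s = parse("(S (PP (IN On) (NP (NNP March) (CD 1))) (NP (PRP he))"
                + " (VP (VBD was) (VP (VBN born))))");
        phrases(s).forEach(System.out::println);
    }
}
```

For the tree above it prints six lines: S: On March 1 he was born, then PP: On March 1, NP: March 1, NP: he, VP: was born and VP: born, each indented by its depth. Every line is a candidate unit for movement or deletion, although not all of them can be dropped safely.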

Phrase Structure Parsers

To find constituents automatically, you will probably want to use a phrase structure parser. Many such parsers are available as open source, including:

Stanford Parser (Java)
Berkeley Parser (Java)
BLLIP (Charniak-Johnson) Parser (C++)
Bikel Parser (a reimplemented and improved version of the Collins parser, written in Java)
Collins Parser (C++)
OpenNLP Parser (Java)
SharpNLP Parser (C#)

The Stanford and Berkeley parsers are probably the easiest to install and use. According to the comparison in Cer et al. (2010), the most accurate parsers are Berkeley and Charniak. The Bikel parser is slower and less accurate than the others.

Online Demo

There's an online demo of the Stanford parser, which I used to produce the parse of your example sentence given above.

A Note About Deletion

Within each constituent, there will be a head word. For example, take the noun phrase:

(NP (DT The) (JJ big) (JJ blue) (NN ball))

The head word here is the noun 'ball', and it is modified by the adjectives 'big' and 'blue'. If this noun phrase were embedded in a sentence, you could delete those modifiers and still have something that is consistent with, but less specific than, the meaning of the original sentence.
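Head words are usually found with head rules: for each phrase label, scan the children in a fixed direction and take the first one whose label starts with one of a few listed tags, then repeat inside that child until a single word remains. The table in this sketch is a deliberately tiny, simplified stand-in (the Stanford parser distribution ships complete head finders, for example CollinsHeadFinder), and HeadFinder, headChild and headWord are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class HeadFinder {
    /** A parse-tree node: a label with children, or a leaf word (label == null). */
    static final class Node {
        final String label, word;
        final List<Node> kids = new ArrayList<>();
        Node(String label, String word) { this.label = label; this.word = word; }
    }

    /** Reads one well-formed Penn Treebank-style bracketed tree (hypothetical helper). */
    static Node parse(String tree) {
        String[] toks = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return read(toks, new int[] {0});
    }

    private static Node read(String[] toks, int[] pos) {
        pos[0]++;                                  // skip "("
        Node n = new Node(toks[pos[0]++], null);   // the label follows "("
        while (!toks[pos[0]].equals(")")) {
            n.kids.add(toks[pos[0]].equals("(") ? read(toks, pos) : new Node(null, toks[pos[0]++]));
        }
        pos[0]++;                                  // skip ")"
        return n;
    }

    /** True for a part-of-speech node wrapping exactly one word, e.g. (NN ball). */
    static boolean isPreterminal(Node n) {
        return n.kids.size() == 1 && n.kids.get(0).word != null;
    }

    /** Hypothetical, heavily simplified head table: scan direction and tag prefixes per label. */
    static Node headChild(Node n) {
        switch (n.label) {
            case "NP": return scan(n, true, "NN", "NP", "PRP");   // rightmost noun-like child
            case "VP": return scan(n, false, "VB", "MD", "VP");   // leftmost verb
            case "PP": return scan(n, false, "IN", "TO");         // the preposition
            case "S":  return scan(n, false, "VP");               // the predicate
            default:   return n.kids.get(0);
        }
    }

    /** First child, in the given direction, whose label starts with one of the prefixes. */
    private static Node scan(Node n, boolean rightToLeft, String... prefixes) {
        int size = n.kids.size();
        for (int i = 0; i < size; i++) {
            Node k = n.kids.get(rightToLeft ? size - 1 - i : i);
            if (k.label == null) continue;
            for (String p : prefixes) {
                if (k.label.startsWith(p)) return k;
            }
        }
        return n.kids.get(rightToLeft ? size - 1 : 0);   // fallback: the edge child
    }

    /** Follows head children down until a single word is reached. */
    static String headWord(Node n) {
        if (n.word != null) return n.word;
        if (isPreterminal(n)) return n.kids.get(0).word;
        return headWord(headChild(n));
    }

    public static void main(String[] args) {
        System.out.println(headWord(parse("(NP (DT The) (JJ big) (JJ blue) (NN ball))")));  // ball
        System.out.println(headWord(parse("(S (PP (IN On) (NP (NNP March) (CD 1)))"
                + " (NP (PRP he)) (VP (VBD was) (VP (VBN born))))")));                    // was
    }
}
```

For the noun phrase above, headWord returns 'ball'; for the whole example sentence it returns 'was', because S selects its VP and that VP selects its first verb.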

Within noun phrases, you can generally delete the adjectives, nouns that are not the head, and nested prepositional phrases.
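The noun-phrase rule can be sketched as a tree transformation: inside every NP, keep the head and drop adjectives, non-head nouns and PPs, leaving everything else (determiners, numbers) in place. The head rule used here (rightmost child whose tag starts with NN, or a nested NP) is a simplifying assumption, and NpPruner, npHead and prune are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class NpPruner {
    /** A parse-tree node: a label with children, or a leaf word (label == null). */
    static final class Node {
        final String label, word;
        final List<Node> kids = new ArrayList<>();
        Node(String label, String word) { this.label = label; this.word = word; }
    }

    /** Reads one well-formed Penn Treebank-style bracketed tree (hypothetical helper). */
    static Node parse(String tree) {
        String[] toks = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return read(toks, new int[] {0});
    }

    private static Node read(String[] toks, int[] pos) {
        pos[0]++;                                  // skip "("
        Node n = new Node(toks[pos[0]++], null);   // the label follows "("
        while (!toks[pos[0]].equals(")")) {
            n.kids.add(toks[pos[0]].equals("(") ? read(toks, pos) : new Node(null, toks[pos[0]++]));
        }
        pos[0]++;                                  // skip ")"
        return n;
    }

    static boolean isPreterminal(Node n) {
        return n.kids.size() == 1 && n.kids.get(0).word != null;
    }

    static String text(Node n) {
        if (n.word != null) return n.word;
        List<String> ws = new ArrayList<>();
        for (Node k : n.kids) ws.add(text(k));
        return String.join(" ", ws);
    }

    /** Simplified NP head: the rightmost child tagged NN, NNS, NNP... or NP (else the last child). */
    static Node npHead(Node np) {
        for (int i = np.kids.size() - 1; i >= 0; i--) {
            String l = np.kids.get(i).label;
            if (l != null && (l.startsWith("NN") || l.equals("NP"))) return np.kids.get(i);
        }
        return np.kids.get(np.kids.size() - 1);
    }

    /** Modifier types the rule treats as optional when they are not the head. */
    static boolean optionalInNp(Node k) {
        String l = k.label;
        return l != null && (l.startsWith("JJ") || l.equals("ADJP") || l.startsWith("NN") || l.equals("PP"));
    }

    /** Returns a copy of the tree with optional NP modifiers removed at every level. */
    static Node prune(Node n) {
        if (n.word != null || isPreterminal(n)) return n;
        Node copy = new Node(n.label, null);
        Node head = "NP".equals(n.label) ? npHead(n) : null;
        for (Node k : n.kids) {
            if (head != null && k != head && optionalInNp(k)) continue;  // drop the modifier
            copy.kids.add(prune(k));
        }
        return copy;
    }

    public static void main(String[] args) {
        Node np = parse("(NP (NP (DT the) (JJ big) (NN beach) (NN ball))"
                + " (PP (IN on) (NP (DT the) (NN sand))))");
        System.out.println(text(prune(np)));   // the ball
        Node s = parse("(S (NP (DT The) (JJ big) (JJ blue) (NN ball)) (VP (VBD rolled)))");
        System.out.println(text(prune(s)));    // The ball rolled
    }
}
```

Because it drops every non-head noun, this rule also cuts multiword names ("New York" becomes "York"), which is one reason the deletion is only generally safe.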

Within verb phrases and complete clauses, things get trickier, since deleting material that serves as an argument to the verb can completely change the interpretation of a sentence. For example, deleting 'the book' from 'He sold Jim the book' results in 'He sold Jim'.
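One rough way to respect this distinction is to report only PP and ADVP children of a verb phrase as candidate adjuncts, and never NP, S or SBAR children, since those usually fill the verb's argument slots. This is a heuristic assumed for illustration, not something the parsers listed above provide, and it misfires on PPs that are arguments ('put the book on the table'). AdjunctFinder and adjunctCandidates are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class AdjunctFinder {
    /** A parse-tree node: a label with children, or a leaf word (label == null). */
    static final class Node {
        final String label, word;
        final List<Node> kids = new ArrayList<>();
        Node(String label, String word) { this.label = label; this.word = word; }
    }

    /** Reads one well-formed Penn Treebank-style bracketed tree (hypothetical helper). */
    static Node parse(String tree) {
        String[] toks = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return read(toks, new int[] {0});
    }

    private static Node read(String[] toks, int[] pos) {
        pos[0]++;                                  // skip "("
        Node n = new Node(toks[pos[0]++], null);   // the label follows "("
        while (!toks[pos[0]].equals(")")) {
            n.kids.add(toks[pos[0]].equals("(") ? read(toks, pos) : new Node(null, toks[pos[0]++]));
        }
        pos[0]++;                                  // skip ")"
        return n;
    }

    static boolean isPreterminal(Node n) {
        return n.kids.size() == 1 && n.kids.get(0).word != null;
    }

    static String text(Node n) {
        if (n.word != null) return n.word;
        List<String> ws = new ArrayList<>();
        for (Node k : n.kids) ws.add(text(k));
        return String.join(" ", ws);
    }

    /** Text of every PP or ADVP directly under a VP; NP, S and SBAR children are never reported. */
    static List<String> adjunctCandidates(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    private static void walk(Node n, List<String> out) {
        if (n.word != null || isPreterminal(n)) return;
        for (Node k : n.kids) {
            boolean candidate = "PP".equals(k.label) || "ADVP".equals(k.label);
            if ("VP".equals(n.label) && candidate) out.add(text(k));
            walk(k, out);
        }
    }

    public static void main(String[] args) {
        Node sold = parse("(S (NP (PRP He)) (VP (VBD sold) (NP (NNP Jim)) (NP (DT the) (NN book))))");
        System.out.println(adjunctCandidates(sold));   // []
        Node born = parse("(S (NP (PRP He)) (VP (VBD was) (VP (VBN born)"
                + " (PP (IN on) (NP (NNP March) (CD 1))))))");
        System.out.println(adjunctCandidates(born));   // [on March 1]
    }
}
```

For 'He sold Jim the book' it offers nothing to delete, while for 'He was born on March 1' it offers 'on March 1', the same phrase your example moved.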
