How do I parse a list of sentences?

Posted 2024-10-17 05:25:29

I want to parse a list of sentences with the Stanford NLP parser.
My list is an ArrayList; how can I parse the whole list with LexicalizedParser?

From each sentence, I want to get a parse in this form:

Tree parse = (Tree) lp1.apply(sentence);

Comments (3)

岁吢 2024-10-24 05:25:29

Although one can dig into the documentation, I am going to provide code here on SO, especially since links move and/or die. This particular answer uses the whole pipeline. If you are not interested in the whole pipeline, see my alternative answer below.

The example below is the complete way of using the Stanford pipeline. If you are not interested in coreference resolution, remove dcoref from the third line of code. In the example below, the pipeline does the sentence splitting for you (the ssplit annotator) if you just feed it a body of text (the text variable). Have just one sentence? That is fine; you can feed it in as the text variable.

    // the imports this snippet needs (class locations in CoreNLP 3.x)
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import edu.stanford.nlp.dcoref.CorefChain;
    import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
    import edu.stanford.nlp.util.CoreMap;

    // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = "..."; // Add your text here!

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {
      // traversing the words in the current sentence
      // a CoreLabel is a CoreMap with additional token-specific methods
      for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);
      }

      // this is the parse tree of the current sentence
      Tree tree = sentence.get(TreeAnnotation.class);

      // this is the Stanford dependency graph of the current sentence
      SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    }

    // This is the coreference link graph
    // Each chain stores a set of mentions that link to each other,
    // along with a method for getting the most representative mention
    // Both sentence and token offsets start at 1!
    Map<Integer, CorefChain> graph =
      document.get(CorefChainAnnotation.class);
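
Since the question starts from an ArrayList of sentences rather than one body of text, here is a minimal sketch of how the same pipeline could be applied to each list element. It assumes each element is already exactly one sentence, so it sets the documented ssplit.isOneSentence property to stop the splitter from re-splitting; the variable names and sample sentences are illustrative. It reuses the imports above plus java.util.ArrayList.

    // Sketch: parse every element of an ArrayList<String> and collect the trees.
    Properties listProps = new Properties();
    listProps.put("annotators", "tokenize, ssplit, parse");
    listProps.put("ssplit.isOneSentence", "true"); // treat each input as one sentence
    StanfordCoreNLP parserPipeline = new StanfordCoreNLP(listProps);

    List<String> mySentences = new ArrayList<String>(); // the question's ArrayList
    mySentences.add("It is a fine day today.");
    mySentences.add("The parser produces one tree per sentence.");

    List<Tree> trees = new ArrayList<Tree>();
    for (String s : mySentences) {
      Annotation doc = new Annotation(s);
      parserPipeline.annotate(doc);
      for (CoreMap parsed : doc.get(SentencesAnnotation.class)) {
        trees.add(parsed.get(TreeAnnotation.class)); // the Tree the question asks for
      }
    }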
痴者 2024-10-24 05:25:29

Actually, the documentation from Stanford NLP provides a sample of how to parse sentences.

You can find the documentation here.
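
For readers who land here after the link dies: the demo that ships with the parser distribution (ParserDemo.java) follows roughly the pattern below. This is a from-memory sketch rather than the verbatim file, and the model path assumes the standard models jar is on the classpath.

    import java.util.List;

    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.ling.Sentence;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.Tree;

    public class ParserDemoSketch {
      public static void main(String[] args) {
        // load the English PCFG grammar from the models jar
        LexicalizedParser lp = LexicalizedParser.loadModel(
            "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

        // parse one pre-tokenized sentence and print the tree
        String[] sent = { "This", "is", "an", "easy", "sentence", "." };
        List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
        Tree parse = lp.apply(rawWords);
        parse.pennPrint();
      }
    }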

昨迟人 2024-10-24 05:25:29

So as promised, if you don't want to use the full Stanford pipeline (although I believe that is the recommended approach), you can work with the LexicalizedParser class directly. In this case, you would download the latest version of the Stanford Parser (whereas the pipeline approach uses the CoreNLP tools). Make sure that, in addition to the parser jar, you have the model file for the parser you want to work with. Example code:

// loadModel is the 3.x factory method; adjust the path to wherever englishPCFG.ser.gz lives
LexicalizedParser lp1 = LexicalizedParser.loadModel("englishPCFG.ser.gz");
String sentence = "It is a fine day today";
Tree parse = lp1.parse(sentence);

Note this works for version 3.3.1 of the parser.
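
To answer the original question directly with this class, here is a minimal sketch that parses every element of an ArrayList. It assumes the parse(String) convenience method this answer relies on for 3.3.1 and a local englishPCFG.ser.gz model file; the class name and sample sentences are illustrative.

    import java.util.ArrayList;
    import java.util.List;

    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.Tree;

    public class ParseSentenceList {
      public static void main(String[] args) {
        // load the model once and reuse it for every sentence in the list
        LexicalizedParser lp1 = LexicalizedParser.loadModel("englishPCFG.ser.gz");

        List<String> sentences = new ArrayList<String>(); // the question's ArrayList
        sentences.add("It is a fine day today.");
        sentences.add("The weather may turn tomorrow.");

        List<Tree> parses = new ArrayList<Tree>();
        for (String sentence : sentences) {
          Tree parse = lp1.parse(sentence); // one Tree per sentence
          parses.add(parse);
          parse.pennPrint();                // print the tree in Penn format
        }
      }
    }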
