Building a part-of-speech tagger (POS tagger)

Posted 2024-11-30 03:14:05

I need to build a POS tagger in Java and need to know how to get started. Are there code examples or other resources that help illustrate how POS taggers work?

Comments (3)

看轻我的陪伴 2024-12-07 03:14:05

Try Apache OpenNLP. It includes a POS tagger tool. You can download ready-to-use English models from here.

The documentation provides details about how to use it from a Java application. Basically you need the following:

Load the POS model

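import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

POSModel model;
// try-with-resources closes the stream whether or not loading succeeds
try (InputStream modelIn = new FileInputStream("en-pos-maxent.bin")) {
  model = new POSModel(modelIn);
}
catch (IOException e) {
  // Model loading failed; rethrow so that `model` is never used uninitialized
  throw new UncheckedIOException("Could not load POS model", e);
}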

Instantiate the POS tagger

POSTaggerME tagger = new POSTaggerME(model);

Execute it

String[] sent = {"Most", "large", "cities", "in", "the", "US", "had", "morning", "and", "afternoon", "newspapers", "."};
String[] tags = tagger.tag(sent);
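The tagger returns a `String[]` that is parallel to its input: `tags[i]` is the tag for `sent[i]`. A small helper that joins the two arrays into the common `word_TAG` notation makes the result easier to read. `TagFormatter` and `toTaggedString` are hypothetical names, and the tags in `main` are written by hand for illustration, not produced by a model.

```java
import java.util.StringJoiner;

// Hypothetical helper: pairs each token with the tag at the same index,
// producing "word_TAG" notation.
class TagFormatter {

    static String toTaggedString(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringJoiner joined = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            joined.add(tokens[i] + "_" + tags[i]);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String[] sent = {"Most", "large", "cities"};
        // Hand-written tags for illustration, not real model output
        String[] tags = {"JJS", "JJ", "NNS"};
        System.out.println(toTaggedString(sent, tags)); // Most_JJS large_JJ cities_NNS
    }
}
```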

Note that the POS tagger expects a tokenized sentence (an array of tokens, not raw text). Apache OpenNLP also provides tools and models for sentence detection and tokenization.
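To show what "tokenized" means in practice, here is a deliberately naive regex tokenizer. It is a stand-in written for illustration, not OpenNLP's tokenizer. It treats each run of letters or digits as one token and every other non-space character as its own token. That is enough to reproduce the `sent` array above, but it mishandles contractions ("don't") and abbreviations ("U.S."). OpenNLP's `SimpleTokenizer` or a model-based `TokenizerME` are the real options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive illustrative tokenizer (NOT OpenNLP's): a token is either a run of
// letters/digits or a single non-space, non-alphanumeric character.
class NaiveTokenizer {

    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] sent = tokenize("Most large cities in the US had morning and afternoon newspapers.");
        System.out.println(String.join(" | ", sent));
    }
}
```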

If you need to train your own model, refer to this documentation.
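According to the OpenNLP manual, the POS trainer's training data has one whitespace-tokenized sentence per line, with each token written as `word_tag`. The sketch below parses such a line back into words and tags, which can help when checking a training file. The class names are hypothetical. Splitting on the last underscore is an assumption, chosen so that words containing `_` survive.

```java
import java.util.ArrayList;
import java.util.List;

// Parses one "word_tag" training line into parallel word and tag lists.
// TrainingLineParser and TaggedSentence are hypothetical names.
class TrainingLineParser {

    static final class TaggedSentence {
        final List<String> words = new ArrayList<>();
        final List<String> tags = new ArrayList<>();
    }

    static TaggedSentence parse(String line) {
        TaggedSentence sentence = new TaggedSentence();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return sentence; // blank line: no tokens
        }
        for (String pair : trimmed.split("\\s+")) {
            // Split on the LAST underscore (an assumption) so words containing '_' survive
            int cut = pair.lastIndexOf('_');
            if (cut <= 0 || cut == pair.length() - 1) {
                throw new IllegalArgumentException("Not a word_tag pair: " + pair);
            }
            sentence.words.add(pair.substring(0, cut));
            sentence.tags.add(pair.substring(cut + 1));
        }
        return sentence;
    }

    public static void main(String[] args) {
        TaggedSentence s = parse("About_IN 10_CD Euro_NNP ,_, I_PRP reckon_VBP ._.");
        System.out.println(s.words); // [About, 10, Euro, ,, I, reckon, .]
        System.out.println(s.tags);  // [IN, CD, NNP, ,, PRP, VBP, .]
    }
}
```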

我要还你自由 2024-12-07 03:14:05

You can examine existing tagger implementations.

For example, see the Stanford University POS tagger in Java (by Kristina Toutanova). It is available under the GNU General Public License (v2 or later), and its source code is well written and clearly documented:

http://nlp.stanford.edu/software/tagger.shtml

A good book on tagging is:
Speech and Language Processing (2nd Edition) by Daniel Jurafsky and James H. Martin
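The Jurafsky and Martin book covers hidden Markov model (HMM) tagging. An HMM tagger chooses the tag sequence that maximizes the product of tag-transition and word-emission probabilities, and the Viterbi algorithm finds that sequence efficiently. Below is a toy bigram version. The three tags and every probability in it are invented for illustration; a real tagger estimates them from a tagged corpus. The unknown-word handling is also a crude assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Toy bigram HMM tagger decoded with the Viterbi algorithm.
// All probabilities are invented for illustration only.
class ViterbiTagger {

    static final String[] TAGS = {"DT", "NN", "VB"};
    // START[j] = P(first tag is j)
    static final double[] START = {0.6, 0.3, 0.1};
    // TRANS[i][j] = P(tag j | previous tag i)
    static final double[][] TRANS = {
        {0.05, 0.9, 0.05}, // from DT
        {0.1, 0.3, 0.6},   // from NN
        {0.5, 0.3, 0.2}    // from VB
    };
    // EMIT.get(word)[j] = P(word | tag j)
    static final Map<String, double[]> EMIT = new HashMap<>();
    static {
        EMIT.put("the",   new double[] {0.7, 0.001, 0.001});
        EMIT.put("dog",   new double[] {0.001, 0.4, 0.05});
        EMIT.put("barks", new double[] {0.001, 0.05, 0.3});
        EMIT.put("runs",  new double[] {0.001, 0.1, 0.4});
    }
    // Crude assumption: unknown words get the same small probability under every tag
    static final double UNKNOWN = 1e-6;

    static double emit(String word, int tag) {
        double[] row = EMIT.get(word.toLowerCase());
        return row == null ? UNKNOWN : row[tag];
    }

    static String[] tag(String[] words) {
        int n = words.length, k = TAGS.length;
        if (n == 0) return new String[0];
        double[][] score = new double[n][k]; // best log-probability of a path ending in tag j at position i
        int[][] back = new int[n][k];        // previous tag on that best path
        for (int j = 0; j < k; j++) {
            score[0][j] = Math.log(START[j]) + Math.log(emit(words[0], j));
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + Math.log(TRANS[p][j]);
                    if (s > best) { best = s; arg = p; }
                }
                score[i][j] = best + Math.log(emit(words[i], j));
                back[i][j] = arg;
            }
        }
        // Pick the best final tag, then follow the back-pointers
        int last = 0;
        for (int j = 1; j < k; j++) {
            if (score[n - 1][j] > score[n - 1][last]) last = j;
        }
        String[] result = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            result[i] = TAGS[last];
            last = back[i][last];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", tag(new String[] {"the", "dog", "barks"}))); // DT NN VB
    }
}
```

Scores are kept in log space so that products of many small probabilities do not underflow on long sentences. The running time is proportional to the sentence length times the square of the number of tags.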

郁金香雨 2024-12-07 03:14:05

There are a few widely used POS/NER taggers.

OpenNLP Maxent POS tagger (Apache OpenNLP):

OpenNLP is a powerful Java NLP library from Apache. It provides various NLP tools, one of which is a part-of-speech (POS) tagger. POS taggers are typically used to uncover the grammatical structure of a text. You start from a tagged dataset in which each word of a sentence is labeled with its tag. You then build a model from that dataset and use it to generate a tag for each word in new text.
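The train-then-tag workflow described above can be illustrated with the simplest possible model, a most-frequent-tag baseline. Training counts how often each word appears with each tag. Tagging then gives each word its most frequent tag. This is a teaching sketch with hypothetical names, not how OpenNLP's maximum-entropy tagger works; unlike that tagger, it ignores context entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Most-frequent-tag baseline (illustrative only, NOT OpenNLP's algorithm).
class MostFrequentTagTagger {

    // word (lowercased) -> tag -> count
    private final Map<String, Map<String, Integer>> wordTagCounts = new HashMap<>();
    // tag -> count over all words, used as a fallback for unseen words
    private final Map<String, Integer> tagTotals = new HashMap<>();

    // Each training sentence is given as parallel word and tag arrays
    void train(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        for (int i = 0; i < words.length; i++) {
            wordTagCounts.computeIfAbsent(words[i].toLowerCase(), w -> new HashMap<>())
                         .merge(tags[i], 1, Integer::sum);
            tagTotals.merge(tags[i], 1, Integer::sum);
        }
    }

    private static String argMax(Map<String, Integer> tally) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : tally.entrySet()) {
            int c = e.getValue();
            // Ties are broken alphabetically so results do not depend on HashMap order
            if (c > bestCount || (c == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = c;
            }
        }
        return best;
    }

    String[] tag(String[] words) {
        String[] out = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            Map<String, Integer> seen = wordTagCounts.get(words[i].toLowerCase());
            // Unseen words fall back to the most frequent tag overall
            out[i] = argMax(seen != null ? seen : tagTotals);
        }
        return out;
    }

    public static void main(String[] args) {
        MostFrequentTagTagger t = new MostFrequentTagTagger();
        t.train(new String[] {"The", "dog", "barks"}, new String[] {"DT", "NN", "VBZ"});
        t.train(new String[] {"Reporters", "dog", "the", "mayor"}, new String[] {"NNS", "VBP", "DT", "NN"});
        t.train(new String[] {"Dog", "food", "sells"}, new String[] {"NN", "NN", "VBZ"});
        System.out.println(String.join(" ", t.tag(new String[] {"The", "dog", "barks", "loudly"})));
        // DT NN VBZ NN
    }
}
```

In the example, the unseen word "loudly" falls back to the overall most frequent tag (NN), which is wrong. Real taggers add context and word-shape features to handle cases like this.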

Sample code:

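import java.util.Arrays;
import java.util.List;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.util.Sequence;

public void doTagging(POSModel model, String input) {
    // Split once, on runs of whitespace, so repeated spaces don't produce empty tokens
    String[] tokens = input.trim().split("\\s+");
    POSTaggerME tagger = new POSTaggerME(model);
    // topKSequences returns several high-scoring tag sequences, not just the single best one
    Sequence[] sequences = tagger.topKSequences(tokens);
    for (Sequence s : sequences) {
        List<String> tags = s.getOutcomes();
        System.out.println(Arrays.asList(tokens) + " => " + tags);
    }
}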

A detailed blog post with full code showing how to use it:

https://dataturks.com/blog/opennlp-pos-tagger-training-java-example.php?s=so

Stanford CoreNLP-based NER tagger:

Stanford CoreNLP is by far the most battle-tested NLP library out there; in a way, it is the gold standard of NLP performance today. Among its many other features, the library supports named entity recognition (NER), which tags important entities in a piece of text, such as the names of people and places.

Sample code:

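import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public void doTagging(CRFClassifier<CoreLabel> model, String input) {
  input = input.trim();
  // classifyToString returns the input text with each token annotated by its entity label
  System.out.println(input + " => " + model.classifyToString(input));
}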
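Suppose `classifyToString` returns a slash-tagged format, with each token written as `word/LABEL` and `O` marking tokens outside any entity. In that case, the labeled tokens can be grouped into entity phrases as sketched below. The class name is hypothetical. Merging consecutive tokens that share a label is a simplification: two adjacent entities of the same type would be joined into one.

```java
import java.util.ArrayList;
import java.util.List;

// Groups "word/LABEL" tokens (assumed output format) into entity phrases:
// consecutive tokens sharing the same non-"O" label form one entity.
class SlashTagEntities {

    static List<String> entities(String slashTagged) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = "O";
        for (String token : slashTagged.trim().split("\\s+")) {
            // Split on the last '/' so words that themselves contain '/' survive
            int cut = token.lastIndexOf('/');
            String word = cut > 0 ? token.substring(0, cut) : token;
            String label = cut > 0 ? token.substring(cut + 1) : "O";
            // Label changed: flush the entity collected so far, if any
            if (!label.equals(currentLabel) && current.length() > 0) {
                result.add(currentLabel + ": " + current);
                current.setLength(0);
            }
            if (!label.equals("O")) {
                if (current.length() > 0) current.append(' ');
                current.append(word);
            }
            currentLabel = label;
        }
        if (current.length() > 0) result.add(currentLabel + ": " + current);
        return result;
    }

    public static void main(String[] args) {
        String tagged = "Yesterday/O Alice/PERSON Smith/PERSON flew/O to/O New/LOCATION York/LOCATION ./O";
        System.out.println(entities(tagged)); // [PERSON: Alice Smith, LOCATION: New York]
    }
}
```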

A detailed blog post with full code showing how to use it:

https://dataturks.com/blog/stanford-core-nlp-ner-training-java-example.php?s=so
