
Building a part-of-speech (POS) tagger

Posted on 2024-11-30 03:14:05

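I need to build a POS tagger in Java and want to know how to get started. Are there code examples or other resources that illustrate how a part-of-speech tagger works?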

看轻我的陪伴 2024-12-07 03:14:05

Try Apache OpenNLP. It includes a POS tagger tool, and ready-to-use English models can be downloaded from the OpenNLP project.

The OpenNLP documentation explains in detail how to use the tagger from a Java application. Basically, you need to do the following:

Load the POS model

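import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

// Declared outside the try block so the tagger below can use it
POSModel model = null;

// try-with-resources closes the stream even if loading fails
try (InputStream modelIn = new FileInputStream("en-pos-maxent.bin")) {
  model = new POSModel(modelIn);
}
catch (IOException e) {
  // Model loading failed, handle the error
  e.printStackTrace();
}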

Instantiate the POS tagger

POSTaggerME tagger = new POSTaggerME(model);

Execute it

String[] sent = {"Most", "large", "cities", "in", "the", "US", "had", "morning", "and", "afternoon", "newspapers", "."};
String[] tags = tagger.tag(sent);

Note that the POS tagger expects a tokenized sentence. Apache OpenNLP also provides tools and models for sentence detection and tokenization.
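To show what "tokenized" means here, the following is a minimal sketch of a regex-based tokenizer in plain Java. It is only an illustration: the class name NaiveTokenizer and its pattern are assumptions made for this example, not part of OpenNLP, and it does not reproduce the behavior of OpenNLP's own tokenizers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, naive tokenizer for illustration only (not OpenNLP's tokenizer). */
final class NaiveTokenizer {
    // A token is either a run of letters/digits (optionally joined by inner
    // apostrophes or hyphens) or a single punctuation character.
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*|[^\\s\\p{L}\\p{N}]");

    /** Splits a sentence into word and punctuation tokens. */
    public static String[] tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] sent = tokenize(
            "Most large cities in the US had morning and afternoon newspapers.");
        // Each token is printed separated by " | "
        System.out.println(String.join(" | ", sent));
    }
}
```

The resulting array has the same shape as the hand-written `sent` array passed to `tagger.tag(...)`. For real use, the tokenization should follow the convention the model was trained with, so a tokenizer shipped with the library is usually the safer choice.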

If you have to train your own model, refer to the OpenNLP documentation on training.

我要还你自由 2024-12-07 03:14:05

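You can examine existing tagger implementations.

For example, take a look at the Stanford POS tagger in Java (written by Kristina Toutanova). It is available under the GNU General Public License (v2 or later), and its source code is well written and clearly documented:

http://nlp.stanford.edu/software/tagger.shtml

A good book on tagging is:
Speech and Language Processing (2nd Edition) by Daniel Jurafsky and James H. Martin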

郁金香雨 2024-12-07 03:14:05

A few POS/NER taggers are widely used.

OpenNLP Maxent POS tagger (using Apache OpenNLP):

OpenNLP is a powerful Java NLP library from Apache. It provides various NLP tools, one of which is a part-of-speech (POS) tagger. POS taggers are typically used to uncover the grammatical structure of text. You start from a tagged dataset in which each word (part of a phrase) is labeled with a tag, build an NLP model from that dataset, and then use the model to generate a tag for each word in new text.

Sample code:

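import java.util.Arrays;
import java.util.List;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.util.Sequence;

public void doTagging(POSModel model, String input) {
    // Naive whitespace tokenization; split once and reuse the tokens
    String[] tokens = input.trim().split("\\s+");
    POSTaggerME tagger = new POSTaggerME(model);
    // topKSequences returns the best-scoring tag sequences for the sentence
    Sequence[] sequences = tagger.topKSequences(tokens);
    for (Sequence s : sequences) {
        List<String> tags = s.getOutcomes();
        System.out.println(Arrays.asList(tokens) + " => " + tags);
    }
}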

A detailed blog post with the full code showing how to use it:

https://dataturks.com/blog/opennlp-pos-tagger-training-java-example.php?s=so
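To make the "build a model from a tagged dataset, then tag new text" idea concrete, here is a minimal sketch of a most-frequent-tag baseline in plain Java. This is deliberately not the maximum-entropy model that OpenNLP uses; it is a much simpler technique swapped in for illustration. The class name BaselineTagger, the tiny corpus, and the fallback tag are all assumptions made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical most-frequent-tag baseline; a teaching sketch, not OpenNLP's model. */
final class BaselineTagger {
    private final Map<String, String> bestTag = new HashMap<>();
    private final String fallbackTag;

    /** Trains on parallel arrays: sentences[i][j] is a token, tags[i][j] its gold tag. */
    public BaselineTagger(String[][] sentences, String[][] tags) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        Map<String, Integer> tagTotals = new HashMap<>();
        for (int i = 0; i < sentences.length; i++) {
            for (int j = 0; j < sentences[i].length; j++) {
                String word = sentences[i][j].toLowerCase();
                counts.computeIfAbsent(word, k -> new HashMap<>())
                      .merge(tags[i][j], 1, Integer::sum);
                tagTotals.merge(tags[i][j], 1, Integer::sum);
            }
        }
        // The "model" is just each word's most frequent tag in the training data
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            bestTag.put(e.getKey(), argMax(e.getValue()));
        }
        // Unknown words get the overall most frequent tag (assumed "NN" if no data)
        fallbackTag = tagTotals.isEmpty() ? "NN" : argMax(tagTotals);
    }

    private static String argMax(Map<String, Integer> freq) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            int c = e.getValue();
            // Ties are broken alphabetically so the result is deterministic
            if (c > bestCount || (c == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = c;
            }
        }
        return best;
    }

    /** Tags each token independently, ignoring its context. */
    public String[] tag(String[] tokens) {
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = bestTag.getOrDefault(tokens[i].toLowerCase(), fallbackTag);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up toy corpus, for illustration only
        String[][] sentences = {
            {"the", "dog", "barks"}, {"the", "cat", "sleeps"}, {"dog", "food", "smells"}};
        String[][] tags = {
            {"DT", "NN", "VBZ"}, {"DT", "NN", "VBZ"}, {"NN", "NN", "VBZ"}};
        BaselineTagger tagger = new BaselineTagger(sentences, tags);
        String[] result = tagger.tag(new String[] {"The", "cat", "barks", "loudly"});
        System.out.println(String.join(" ", result));
    }
}
```

Because this baseline looks at each word in isolation, it mislabels unseen words such as "loudly" with the fallback tag. Statistical taggers like the ones in OpenNLP and Stanford's toolkit address this by also using features of the surrounding words and tags.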

Stanford CoreNLP based NER tagger:

Stanford CoreNLP is by far the most battle-tested NLP library out there. In a way, it is the gold standard of NLP performance today. Among various other features, the library supports named entity recognition (NER), which tags important entities in a piece of text, such as the names of people and places.

Sample code:

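import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public void doTagging(CRFClassifier<CoreLabel> model, String input) {
  input = input.trim();
  // classifyToString returns the text with each token annotated with its entity label
  System.out.println(input + " => " + model.classifyToString(input));
}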

A detailed blog post with the full code showing how to use it:

https://dataturks.com/blog/stanford-core-nlp-ner-training-java-example.php?s=so
