Your question is really two questions: how to use GATE to find named entities and maybe how to embed GATE into your application.
Named entity recognition (or classification) is a huge field of research, and depending on which named entities you want to find, different approaches may be most effective. GATE provides a very basic gazetteer-list and rule-based approach for finding some categories of named entities in English text: ANNIE.
If the categories found by ANNIE are the ones you are interested in, one way to start is to understand and improve what ANNIE already provides.
The ANNIE pipeline will create annotations for Person, Organization, etc. in your document, so you only need to use or write a PR (processing resource) that accesses those annotations and does whatever you need with their features or text.
Look at the GATE manual (http://gate.ac.uk/sale/tao/split.html); it explains ANNIE and also has some documentation on how to embed GATE (that is, how to use GATE directly from your Java program without running the GUI).
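To make the "gazetteer list plus rules" idea mentioned above a bit more concrete, here is a toy sketch in plain Java. It is not GATE's API and not how ANNIE is implemented; the class name `GazetteerSketch`, the two gazetteer entries and the single title rule are all made up for illustration. It only shows the general shape: known strings and a simple pattern produce typed spans, and your own code then reads the type, offsets and text of each span, much as you would read annotations produced by ANNIE.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GazetteerSketch {

    /** A typed span of text, loosely analogous to an annotation. */
    public static final class Entity {
        public final String type;
        public final int start;
        public final int end;
        public final String text;

        Entity(String type, int start, int end, String text) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    // Hypothetical gazetteer entries: known string -> entity type.
    private static final Map<String, String> GAZETTEER = new LinkedHashMap<>();
    static {
        GAZETTEER.put("John Smith", "Person");
        GAZETTEER.put("University of Sheffield", "Organization");
    }

    // Hypothetical rule: a title followed by a capitalised word is a Person.
    private static final Pattern TITLE_RULE =
            Pattern.compile("\\b(?:Mr|Mrs|Dr)\\.\\s+([A-Z][a-z]+)");

    /** Returns gazetteer matches first (in gazetteer order), then rule matches. */
    public static List<Entity> annotate(String text) {
        List<Entity> found = new ArrayList<>();

        // Gazetteer lookup: every exact occurrence of every listed string.
        for (Map.Entry<String, String> entry : GAZETTEER.entrySet()) {
            String name = entry.getKey();
            int from = 0;
            int at;
            while ((at = text.indexOf(name, from)) >= 0) {
                int end = at + name.length();
                found.add(new Entity(entry.getValue(), at, end, text.substring(at, end)));
                from = end;
            }
        }

        // Rule: only the name after the title (capture group 1) becomes the span.
        Matcher m = TITLE_RULE.matcher(text);
        while (m.find()) {
            found.add(new Entity("Person", m.start(1), m.end(1), m.group(1)));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Dr. Jones met John Smith at the University of Sheffield.";
        // "Do whatever you need" with each span: here, just print it.
        for (Entity e : annotate(text)) {
            System.out.println(e.type + " [" + e.start + "," + e.end + ") " + e.text);
        }
    }
}
```

With real GATE you would not write the matching yourself: ANNIE does that work, and your code would only contain the equivalent of the loop in `main`, iterating over the Person, Organization, etc. annotations on the document. The embedding chapter of the manual linked above describes the actual calls.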
Here is a list of some CREOLE plugins that can be used for named entity recognition (NER):
If you are interested in medical NER, you can use:
There are also these external plugins: