Learning grammars with GATE ANNIE

Posted on 2024-11-29 03:38:48

Hello, I have been working on information retrieval for quite some time and have run into some difficulties.
Recently I downloaded StandAloneAnnie.java from the following link:

http://gate.ac.uk/wiki/code-repository/src/sheffield/examples/StandAloneAnnie.java

Although I have been able to execute it and see the output, I have a question or two.

  1. This program annotates people and locations. Where is the grammar for annotating such entities stored?

  2. How can I write my own simple grammar to extract some data and use it in my copy of StandAloneAnnie.java?


Comments (2)

人事已非 2024-12-06 03:38:48

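The following is a simple grammar for tagging the height of a person:

    Phase: Measurements
    Input: Token Number
    Options: control=appelt debug=true

    // A Number annotation followed by a unit token (ft, in or cm).
    // ==~ requires the whole token string to match, so a token such as
    // "minutes" is not accepted as "in" (=~ would match any substring).
    Rule: Height
    (
      ({Number})
      ({Token.string ==~ "[Ff]t"} | {Token.string ==~ "[Ii]n"} | {Token.string ==~ "[Cc]m"})
    ):height
    -->
    :height.Height = {value = :height.Number.value, unit = :height.Token.string}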
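To get a feel for what the rule is meant to capture without a GATE installation, here is a rough plain-Java approximation using `java.util.regex`. It is only a sketch: it is not JAPE, it works on raw text rather than on Token and Number annotations, and the class and method names (`HeightSketch`, `findHeights`) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeightSketch {
    // Stand-in for the JAPE pattern: a number, optional whitespace,
    // then a whole-word unit (ft, in or cm, first letter in either case).
    private static final Pattern HEIGHT =
        Pattern.compile("\\b(\\d+(?:\\.\\d+)?)\\s*([Ff]t|[Ii]n|[Cc]m)\\b");

    /** Returns one "value unit" string for every height-like match in the text. */
    public static List<String> findHeights(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = HEIGHT.matcher(text);
        while (m.find()) {
            found.add(m.group(1) + " " + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "Height is 60 in. Weight is 150 lbs pulse rate 90 Pulse rate 90";
        System.out.println(findHeights(str));
    }
}
```

In the JAPE version the number comes from the Numbers tagger, so it can also pick up numbers that a simple digit pattern like this one would miss.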

This is the main code that gets executed:

    import java.io.File;
    import java.net.URL;
    import gate.*;
    import gate.creole.ANNIEConstants;
    import gate.creole.SerialAnalyserController;

    public class HeightTagger {
        public static void main(String[] args) throws Exception {
            Gate.init();

            // Plugins must be registered before their resources can be created.
            Gate.getCreoleRegister().registerDirectories(
                new File(Gate.getPluginsHome(), ANNIEConstants.PLUGIN_DIR).toURI().toURL());
            Gate.getCreoleRegister().registerDirectories(
                new URL("file:///GATE_HOME/plugins/Tagger_Numbers")); // change this path

            // The string to be annotated.
            String str = "Height is 60 in. Weight is 150 lbs pulse rate 90 Pulse rate 90";
            Document doc = Factory.newDocument(str);
            Corpus corpus = (Corpus) Factory.createResource("gate.corpora.CorpusImpl");
            corpus.add(doc);

            // Load the processing resources. See http://gate.ac.uk/gate/doc/plugins.html
            // for the class each plugin provides.
            ProcessingResource token = (ProcessingResource) Factory.createResource(
                "gate.creole.tokeniser.DefaultTokeniser", Factory.newFeatureMap());
            ProcessingResource sspliter = (ProcessingResource) Factory.createResource(
                "gate.creole.splitter.SentenceSplitter", Factory.newFeatureMap());
            ProcessingResource number = (ProcessingResource) Factory.createResource(
                "gate.creole.numbers.NumbersTagger", Factory.newFeatureMap());

            // JAPE transducer that runs the Height grammar.
            // "height.jape" is a placeholder; point it at your own grammar file.
            FeatureMap japeParams = Factory.newFeatureMap();
            japeParams.put("grammarURL", new File("height.jape").toURI().toURL());
            ProcessingResource jape = (ProcessingResource) Factory.createResource(
                "gate.creole.Transducer", japeParams);

            // The pipeline is the application that runs the resources loaded above.
            // Order matters: the Numbers tagger needs a tokenised document, and the
            // Height grammar needs both Token and Number annotations.
            SerialAnalyserController pipeline = (SerialAnalyserController) Factory.createResource(
                "gate.creole.SerialAnalyserController",
                Factory.newFeatureMap(), Factory.newFeatureMap(), "ANNIE");
            pipeline.setCorpus(corpus);
            pipeline.add(token);
            pipeline.add(sspliter);
            pipeline.add(number);
            pipeline.add(jape);
            pipeline.execute();

            // Extract the annotated text from the document.
            String content = doc.getContent().toString();
            for (Annotation annotation : doc.getAnnotations().get("Height")) {
                long start = annotation.getStartNode().getOffset();
                long end = annotation.getEndNode().getOffset();
                System.out.println(content.substring((int) start, (int) end));
            }
        }
    }

Note:
The grammar for Height goes in a .jape file, and it has to be run by a JAPE (or JAPE Plus) transducer that is added to the pipeline after the Numbers tagger. The main code then only needs to execute the application (the 'pipeline'). You can find a tutorial on writing JAPE at gate.ac.uk/sale/tao

轮廓§ 2024-12-06 03:38:48

There is an "Introduction to ANNIE" PowerPoint presentation that explains how the grammars are stored: they are in JAPE files.
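If you want to look at those files yourself, a small helper like the one below lists every .jape file under a directory. This is only a sketch in plain Java with no GATE dependency; the class name `JapeFileLister` is made up here, and the directory to scan is an assumption you need to supply. In GATE distributions that ship ANNIE as a plugin directory, the named-entity grammars are typically under plugins/ANNIE/resources/NE, but check your own installation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JapeFileLister {
    /** Recursively collects all regular files ending in ".jape" under root, sorted by path. */
    public static List<Path> listJapeFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jape"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the directory to scan as the first argument, e.g. your ANNIE plugin folder.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        listJapeFiles(root).forEach(System.out::println);
    }
}
```

Opening one of the listed files shows the same Phase / Rule layout as the Height grammar in the other answer.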
