How to use ANTLR to display all the pronouns in a sentence and their person

Posted on 2024-08-23 03:29:59

Edited according to WayneH's grammar.

Here's what I have in my grammar file:

grammar pfinder;

options {
  language = Java;
}
sentence
    : ((words | pronoun) SPACE)* ((words | pronoun) ('.' | '?'))
    ;

words 
    :   WORDS {System.out.println($text);};

pronoun returns [String value] 
    : sfirst {$value = $sfirst.value; System.out.println($sfirst.text + '(' + $sfirst.value + ')');}
    | ssecond {$value = $ssecond.value; System.out.println($ssecond.text + '(' + $ssecond.value + ')');}
    | sthird {$value = $sthird.value; System.out.println($sthird.text + '(' + $sthird.value + ')');}
    | pfirst {$value = $pfirst.value; System.out.println($pfirst.text + '(' + $pfirst.value + ')');}
    | psecond {$value = $psecond.value; System.out.println($psecond.text + '(' + $psecond.value + ')');}
    | pthird{$value = $pthird.value; System.out.println($pthird.text + '(' + $pthird.value + ')');};

sfirst returns [String value] :  ('i'   | 'me'  | 'my'   | 'mine') {$value = "s1";};
ssecond returns [String value] : ('you' | 'your'| 'yours'| 'yourself') {$value = "s2";};
sthird returns [String value] :  ('he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself') {$value = "s3";};
pfirst returns [String value] :  ('we'  | 'us'  | 'our'  | 'ours') {$value = "p1";};
psecond returns [String value] : ('yourselves') {$value = "p2";};
pthird returns [String value] :  ('they'| 'them'| 'their'| 'theirs' | 'themselves') {$value = "p3";};

WORDS : LETTER*;// {$channel=HIDDEN;}; 
SPACE : (' ')?;
fragment LETTER :  ('a'..'z' | 'A'..'Z');

And here's what I have in a Java test class:

import java.util.Scanner;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import java.util.List;

public class test2 {
    public static void main(String[] args) throws RecognitionException {
        String s;
        Scanner input = new Scanner(System.in);
        System.out.println("Enter a sentence: ");
        s=input.nextLine().toLowerCase();
        ANTLRStringStream in = new ANTLRStringStream(s);
        pfinderLexer lexer = new pfinderLexer(in);
        TokenStream tokenStream = new CommonTokenStream(lexer);
        pfinderParser parser = new pfinderParser(tokenStream); 
        parser.pronoun(); 
    }
}

What do I need to put in the test file so that it displays all the pronouns in a sentence along with their respective values (s1, s2, ...)?

Comments (3)

鹿港巷口少年归 2024-08-30 03:29:59

In case you are trying to do some sort of high-level analysis of spoken/written language, you might consider using some sort of natural language processing tool. For example, TagHelper Tools will tell you which elements are pronouns (and verbs, and nouns, and adverbs, and other esoteric grammatical constructs). (THT is the only tool of that sort that I'm familiar with, so don't take that as a particular endorsement of awesomeness).

茶花眉 2024-08-30 03:29:59

Fragments don't create tokens, and placing them in parser rules will not give desirable results.

On my test box, this produced (I think!) the desired result:

program :
        PRONOUN+
    ;

PRONOUN :
        'i'   | 'me'  | 'my'   | 'mine'
    |   'you' | 'your'| 'yours'| 'yourself'
    |   'he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself'
    |   'we'  | 'us'  | 'our'  | 'ours'
    |   'yourselves'
    |   'they'| 'them'| 'their'| 'theirs' | 'themselves'
    ;

WS  :   ' ' { $channel = HIDDEN; };

WORD    :   ('A'..'Z'|'a'..'z')+ { $channel = HIDDEN; };

In ANTLRWorks, the sample "i kicked you" returned the tree structure: program -> [i, you].

I feel compelled to point out that ANTLR is overkill for stripping the pronouns out of a sentence. Consider using a regular expression instead. This grammar is case-sensitive. Expanding WORD to consume everything except your dictionary of PRONOUNs (such as punctuation) may be a bit tedious, and the input will need to be sanitized.
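
The regular-expression route mentioned above could look roughly like the following. This is a minimal sketch in plain Java, not part of the original answer: the class name `PronounRegexSketch` and the method `find` are hypothetical, and the person codes are copied from the question's grammar (which files 'you', 'your', and 'yours' under s2 even though they can also be plural). Lowercasing each word before the lookup sidesteps the case-sensitivity issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR or of the original answer.
final class PronounRegexSketch {
    // Pronoun -> person code, copied from the rules in the question's grammar.
    private static final Map<String, String> PERSON = new HashMap<>();

    private static void put(String code, String... pronouns) {
        for (String p : pronouns) {
            PERSON.put(p, code);
        }
    }

    static {
        put("s1", "i", "me", "my", "mine");
        put("s2", "you", "your", "yours", "yourself");
        put("s3", "he", "she", "it", "his", "hers", "its",
                  "him", "her", "himself", "herself");
        put("p1", "we", "us", "our", "ours");
        put("p2", "yourselves");
        put("p3", "they", "them", "their", "theirs", "themselves");
    }

    // A word is a maximal run of ASCII letters; everything else is a separator.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Returns entries such as "you(s2)" in the order the pronouns appear.
    static List<String> find(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(sentence);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            String code = PERSON.get(word);
            if (code != null) {
                found.add(word + "(" + code + ")");
            }
        }
        return found;
    }
}
```

Under these assumptions, `PronounRegexSketch.find("I kicked you and them.")` yields `[i(s1), you(s2), them(p3)]`.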

--- Edit: In response to the second OP:

  • I have altered the original grammar to make parsing easier. The new grammar is:

    grammar pfinder;
    
    options {
        backtrack=true;
        output = AST;
    }
    
    tokens {
        PROGRAM;
    }
    
    program :
            (WORD* p+=PRONOUN+ WORD*)*
            -> ^(PROGRAM $p*)
        ;
    
    
    PRONOUN :
            'i'   | 'me'  | 'my'   | 'mine'
        |   'you' | 'your'| 'yours'| 'yourself'
        |   'he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself'
        |   'we'  | 'us'  | 'our'  | 'ours' | 'yourselves'
        |   'they'| 'them'| 'their'| 'theirs' | 'themselves'
    ;
    
    WS  :   ' ' { $channel = HIDDEN; };
    
    WORD    :   ('A'..'Z'|'a'..'z')+;
    

I'll explain the changes:

  • Backtracking is now required to resolve the parser rule program. Perhaps there's a better way to write it that doesn't require backtracking, but this is the first thing that popped into my mind.
  • An imaginary token PROGRAM has been defined to group our pronouns.
  • Each matched PRONOUN is added to the ANTLR list variable $p and is rewritten into the AST under the imaginary token.
  • The interpreter code may now use a CommonTree to collect the matched pronouns.
  • The following is written in C# (I don't know Java), but I wrote it with the intent that you'll be able to read and understand it.

    static object[] ReadTokens( string text )
    {
        ArrayList results = new ArrayList();
        pfinderLexer Lexer = new pfinderLexer(new Antlr.Runtime.ANTLRStringStream(text));
        pfinderParser Parser = new pfinderParser(new CommonTokenStream(Lexer));
        // syntaxTree is imaginary token {PROGRAM},
        // its children are the pronouns collected by $p in grammar.
        CommonTree syntaxTree = Parser.program().Tree as CommonTree;
        if ( syntaxTree == null ) return null;
        foreach ( object pronoun in syntaxTree.Children )
        {
            results.Add(pronoun.ToString());
        }
        return results.ToArray();
    }
    
  • Calling ReadTokens("i kicked you and them") returns the array ["i", "you", "them"].

水染的天色ゝ 2024-08-30 03:29:59

I think you need to learn more about lexer rules in ANTLR. Lexer rules start with an uppercase letter and generate tokens for the stream the parser will look at. Lexer fragment rules do not generate tokens for the stream but help other lexer rules generate them; look at the lexer rules WORDS and LETTER (LETTER is not a token, but it helps WORDS create one).

Now, when a text literal is put into a parser rule (a rule whose name starts with a lowercase letter), that literal is also a valid token that the lexer will identify and pass along (at least in ANTLR; I have not used any similar tools, so I cannot answer for them).
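
To make the token versus fragment distinction concrete, here is a hand-written tokenizer in plain Java. It is only an analogy under stated assumptions, not ANTLR-generated code and not ANTLR's actual matching algorithm: the class `TinyLexerSketch`, the token labels, and the three-word literal set are all hypothetical. The helper `isLetter` plays the part of `fragment LETTER` (it never produces a token itself), while words that also occur as literals in parser rules come out as their own token kind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; a real ANTLR lexer is generated from the grammar.
final class TinyLexerSketch {
    // Literals that appear inside parser rules (a small subset, for brevity).
    private static final Set<String> LITERALS =
            new HashSet<>(Arrays.asList("i", "you", "them"));

    // Analogue of `fragment LETTER`: helps build tokens, never emitted itself.
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Emits one label per token, in input order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isLetter(c)) {
                // Analogue of WORDS: one or more letters form a single token.
                int start = i;
                while (i < input.length() && isLetter(input.charAt(i))) {
                    i++;
                }
                String text = input.substring(start, i);
                tokens.add((LITERALS.contains(text) ? "LITERAL:" : "WORDS:") + text);
            } else if (c == ' ') {
                tokens.add("SPACE");
                i++;
            } else {
                // Simplification: any other character becomes a one-character
                // literal, standing in for '.' and '?' from the grammar.
                tokens.add("LITERAL:" + c);
                i++;
            }
        }
        return tokens;
    }
}
```

For the input "i kicked you." this sketch produces `[LITERAL:i, SPACE, WORDS:kicked, SPACE, LITERAL:you, LITERAL:.]`; no entry ever carries a LETTER label, which is the point about fragments.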

The next thing I noticed is that your 's' and 'pronoun' rules appear to be the same thing. I commented out the 's' rule and put everything into the 'pronoun' rule.

The last thing is to learn how to put actions into the grammar; you already have some in the 's' rule, setting the return value. I made the pronoun rule return a string value so that, if you wanted the actions in your 'sentence' rule, you could easily accomplish your "-i pronoun" comment/answer.

Since I do not know exactly what results you want, I played with your grammar, made some slight modifications, reorganized it (moving what I thought were parser rules to the top and keeping all lexer rules at the bottom), and put in some actions that I think will show you what you need. There could be several different ways to accomplish this, and I don't think my solution is perfect for any of your possible desired results, but here is a grammar I was able to get working in ANTLRWorks:

grammar pfinder;

options {
  language = Java;
}
sentence
    : ((words | pronoun) SPACE)* ((words | pronoun) ('.' | '?'))
    ;

words 
    :   WORDS {System.out.println($text);};

pronoun returns [String value] 
    : sfirst {$value = $sfirst.value; System.out.println($sfirst.text + '(' + $sfirst.value + ')');}
    | ssecond {$value = $ssecond.value; System.out.println($ssecond.text + '(' + $ssecond.value + ')');}
    | sthird {$value = $sthird.value; System.out.println($sthird.text + '(' + $sthird.value + ')');}
    | pfirst {$value = $pfirst.value; System.out.println($pfirst.text + '(' + $pfirst.value + ')');}
    | psecond {$value = $psecond.value; System.out.println($psecond.text + '(' + $psecond.value + ')');}
    | pthird{$value = $pthird.value; System.out.println($pthird.text + '(' + $pthird.value + ')');};

//s returns [String value]
//    :  exp=sfirst  {$value = "s1";}
//    |  exp=ssecond {$value = "s2";}
//    |  exp=sthird  {$value = "s3";}
//    |  exp=pfirst  {$value = "p1";}
//    |  exp=psecond {$value = "p2";}
//    |  exp=pthird  {$value = "p3";}
//    ;

sfirst returns [String value] :  ('i'   | 'me'  | 'my'   | 'mine') {$value = "s1";};
ssecond returns [String value] : ('you' | 'your'| 'yours'| 'yourself') {$value = "s2";};
sthird returns [String value] :  ('he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself') {$value = "s3";};
pfirst returns [String value] :  ('we'  | 'us'  | 'our'  | 'ours') {$value = "p1";};
psecond returns [String value] : ('yourselves') {$value = "p2";};
pthird returns [String value] :  ('they'| 'them'| 'their'| 'theirs' | 'themselves') {$value = "p3";};

WORDS : LETTER*;// {$channel=HIDDEN;}; 
SPACE : (' ')?;
fragment LETTER :  ('a'..'z' | 'A'..'Z');

I think the end result is that this grammar will show you how to accomplish what you are trying to do, though it will require modification no matter what that end result is.

Good luck.

I think you only have to change one line in your test class, from:
parser.pronoun();
to:
parser.sentence();

You might want to change a few other things in the grammar as well:
SPACE : ' ';
sentence: (words | pronoun) (SPACE (words | pronoun))* ('.' | '?'); // then you might want to put a rule between sentence and words/pronoun.
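
To see which inputs the revised sentence rule is meant to accept, its shape can be mirrored with an ordinary regular expression. This is a rough sketch in plain Java rather than ANTLR output; the class `SentenceShapeSketch` is hypothetical, and it assumes every word is one or more letters (that is, WORDS changed to LETTER+), which the original grammar does not state.

```java
import java.util.regex.Pattern;

// Hypothetical check that mirrors the revised rule; not generated by ANTLR.
final class SentenceShapeSketch {
    // (words | pronoun) (SPACE (words | pronoun))* ('.' | '?')
    // with SPACE : ' ' and, as an assumption, each word being LETTER+.
    private static final Pattern SENTENCE =
            Pattern.compile("[A-Za-z]+( [A-Za-z]+)*[.?]");

    // True when the whole input has the shape described by the rule.
    static boolean matches(String input) {
        return SENTENCE.matcher(input).matches();
    }
}
```

With this shape, "i kicked you." and "who are they?" are accepted, while input with a doubled space or without the closing '.' or '?' is rejected, so the test class would still need to pass in a properly terminated, single-spaced sentence.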
