Expressions in a CoCo to ANTLR translator

Posted on 2024-10-19 19:45:51

I'm parsing CoCo/R grammars in a utility to automate CoCo -> ANTLR translation. The core ANTLR grammar is:

rule '=' expression '.' ;

expression
     : term ('|' term)*
         -> ^( OR_EXPR term term* )
     ;
term
     : (factor (factor)*)? ;

factor
     : symbol
     | '(' expression ')'
         -> ^( GROUPED_EXPR expression )
     | '[' expression ']'
         -> ^( OPTIONAL_EXPR expression)
     | '{' expression '}'
         -> ^( SEQUENCE_EXPR expression)
     ;

symbol
     : IF_ACTION
     | ID (ATTRIBUTES)?
     | STRINGLITERAL
     ;

My problem is with constructions such as these:

CS = { ExternAliasDirective }
         { UsingDirective }
         EOF .

CS results in an AST with an OR_EXPR node although no '|' character
actually appears. I'm sure this is due to the definition of
expression, but I cannot see any other way to write the rules.

I experimented with the following to resolve the ambiguity:

// explicitly test for the presence of a '|' character
expression
@init { bool ored = false; }
     : term {ored = (input.LT(1).Type == OR); } (OR term)*
         ->  {ored}? ^(OR_EXPR term term*)
         ->          ^(LIST term term*)
     ;

It works but the hack reinforces my conviction that something fundamental is wrong.

Any tips much appreciated.

Comments (2)

卷耳 2024-10-26 19:45:51

Your rule:

expression
  : term ('|' term)*
      -> ^( OR_EXPR term term* )
  ;

always causes the rewrite rule to create a tree with a root of type OR_EXPR. You can create "sub rewrite rules" like this:

expression
  :  (term -> REWRITE_RULE_X) ('|' term -> ^(REWRITE_RULE_Y))*
  ;
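
To make the effect of such per-alternative rewrites concrete, here is a minimal hand-written sketch in plain Java. It is not ANTLR-generated code; the class `OrRewriteSketch` and its helper methods are hypothetical names chosen for illustration. It mimics a rule of the form `expr : (a=atoms -> $a) ('|' b=atoms -> ^(OR $expr $b))*`, where the tree built so far is only wrapped in an OR node once a '|' is actually consumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written illustration; not code generated by ANTLR.
class OrRewriteSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    OrRewriteSketch(String source) {
        // Surround punctuation with spaces, then split on whitespace.
        String padded = source.replaceAll("([|(){}\\[\\]])", " $1 ").trim();
        for (String t : padded.split("\\s+")) {
            tokens.add(t);
        }
    }

    // Parses one expression and returns its tree in LISP-like notation.
    static String parse(String source) {
        OrRewriteSketch p = new OrRewriteSketch(source);
        String tree = p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalStateException("unexpected token: " + p.peek());
        }
        return tree;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<end>";
    }

    private static boolean isId(String t) {
        return t.matches("[A-Za-z][A-Za-z0-9]*");
    }

    private static boolean startsAtom(String t) {
        return t.equals("(") || t.equals("{") || t.equals("[") || isId(t);
    }

    // expr : (a=atoms -> $a) ('|' b=atoms -> ^(OR $expr $b))*
    private String expr() {
        String tree = atoms();          // without a '|', the result is just the atoms subtree
        while (peek().equals("|")) {
            pos++;                      // consume '|'
            tree = "(OR " + tree + " " + atoms() + ")";  // wrap the tree built so far
        }
        return tree;
    }

    // atoms : atom+ -> ^(ATOMS atom+)
    private String atoms() {
        StringBuilder sb = new StringBuilder("(ATOMS");
        do {
            sb.append(' ').append(atom());
        } while (startsAtom(peek()));
        return sb.append(')').toString();
    }

    // atom : ID | '(' expr ')' | '{' expr '}' | '[' expr ']'
    private String atom() {
        String t = peek();
        pos++;
        switch (t) {
            case "(": return bracketed("GROUP", ")");
            case "{": return bracketed("SEQUENCE", "}");
            case "[": return bracketed("OPTIONAL", "]");
            default:
                if (!isId(t)) throw new IllegalStateException("unexpected token: " + t);
                return t;
        }
    }

    private String bracketed(String label, String close) {
        String inner = expr();
        if (!peek().equals(close)) throw new IllegalStateException("expected " + close);
        pos++;                          // consume the closing bracket
        return "(" + label + " " + inner + ")";
    }

    public static void main(String[] args) {
        // prints (ATOMS (SEQUENCE (ATOMS A)) (SEQUENCE (ATOMS B)) EOF)
        System.out.println(parse("{ A } { B } EOF"));
        // prints (OR (OR (ATOMS a) (ATOMS b)) (ATOMS c))
        System.out.println(parse("a | b | c"));
    }
}
```

An input without '|' yields no OR node at all, while each '|' nests the previous result as the left child of a new OR node.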

To resolve the ambiguity in your grammar, the easiest option is to enable global backtracking, which is done in the options { ... } section of the grammar.

A quick demo:

grammar CocoR;

options {
  output=AST;
  backtrack=true;
}

tokens {
  RULE;
  GROUP;
  SEQUENCE;
  OPTIONAL;
  OR;
  ATOMS;
}

parse
  :  rule EOF -> rule
  ;

rule
  :  ID '=' expr* '.' -> ^(RULE ID expr*)
  ;

expr
  :  (a=atoms -> $a) ('|' b=atoms -> ^(OR $expr $b))*
  ;

atoms
  :  atom+ -> ^(ATOMS atom+)
  ;

atom
  :  ID
  |  '(' expr ')' -> ^(GROUP expr)
  |  '{' expr '}' -> ^(SEQUENCE expr)
  |  '[' expr ']' -> ^(OPTIONAL expr)
  ;

ID
  :  ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | '0'..'9')*
  ;

Space
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;

with input:

CS = { ExternAliasDirective }
     { UsingDirective }
     EOF .

produces the AST:

[image: AST diagram for the input above]

and the input:

foo = a | b ({c} | d [e f]) .

produces:

[image: AST diagram for the input above]

The class to test this:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
    public static void main(String[] args) throws Exception {
        /*
        String source = 
                "CS = { ExternAliasDirective } \n" +
                "{ UsingDirective }            \n" + 
                "EOF .                           ";
        */
        String source = "foo = a | b ({c} | d [e f]) .";
        ANTLRStringStream in = new ANTLRStringStream(source);
        CocoRLexer lexer = new CocoRLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        CocoRParser parser = new CocoRParser(tokens);
        CocoRParser.parse_return returnValue = parser.parse();
        CommonTree tree = (CommonTree)returnValue.getTree();
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
    }
}

I pasted the DOT output this class prints into the following website to create the AST images: http://graph.gafol.net/

HTH


EDIT

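To account for epsilon (the empty string) in your OR expressions, you might try something like this (only quickly tested; NOTHING is an imaginary token that would presumably also have to be declared in the tokens { ... } section):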

expr
  :  (a=atoms -> $a) ( ( '|' b=atoms -> ^(OR $expr $b)
                       | '|'         -> ^(OR $expr NOTHING)
                       )
                     )*
  ;

which parses the source:

foo = a | b | .

into the following AST:

[image: AST diagram for the input above]

一张白纸 2024-10-26 19:45:51

The production for expression explicitly says that it can only return an OR_EXPR node. You can try something like:

expression
     : 
     term
     |
     term ('|' term)+
         -> ^( OR_EXPR term term* )
     ;

Further down, you could use:

term
     : factor*;
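
The intended tree shape of this variant can be sketched in plain Java as follows. This is only an illustration under the assumption that the terms have already been parsed into strings; `FlatOrSketch` and `expression` are hypothetical names, not part of ANTLR. A single term is returned unchanged, and a flat OR_EXPR node is created only when more than one term is present.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the tree shape; not code generated by ANTLR.
class FlatOrSketch {
    // Mirrors: expression : term | term ('|' term)+ -> ^(OR_EXPR term term*)
    static String expression(List<String> terms) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("at least one term is required");
        }
        if (terms.size() == 1) {
            return terms.get(0);  // no '|' seen: pass the term through unchanged
        }
        return "(OR_EXPR " + String.join(" ", terms) + ")";  // flat node with all terms as children
    }

    public static void main(String[] args) {
        System.out.println(expression(Arrays.asList("a")));            // prints a
        System.out.println(expression(Arrays.asList("a", "b", "c")));  // prints (OR_EXPR a b c)
    }
}
```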