How can I avoid building intermediate, useless AST nodes with ANTLR3?

Posted 2024-11-07 14:00:15

I wrote an ANTLR3 grammar subdivided into smaller rules to increase readability.
For example:

messageSequenceChart:
  'msc' mscHead bmsc 'endmsc' end
;

// where mscHead is shorthand for:
mscHead:
  mscName mscParameterDecl? timeOffset? end
  mscInstInterface? mscGateInterface
;

I know the built-in ANTLR AST building feature allows the user to declare intermediate AST nodes that won't be in the final AST. But what if you build the AST by hand?

messageSequenceChart returns [msc::MessageSequenceChart* n = 0]:
  'msc' mscHead bmsc 'endmsc' end
  {
    $n = new msc::MessageSequenceChart(/* mscHead subrules accessors like $mscHead.mscName.n ? */
                                       $bmsc.n);
  }
;

mscHead:
  mscName mscParameterDecl? timeOffset? end
;

The documentation does not cover this case, so it looks like I will have to create a node for every intermediate rule just to be able to access the results of its subrules.

Does anyone know a better solution?

Thank you.


因为看清所以看轻 2024-11-14 14:00:15

You can solve this by letting your sub-rule(s) return multiple values and accessing only those you're interested in.

The following demo shows how to do it. Although it is not in C, I am confident that you'll be able to adjust it so that it fits your needs:

grammar Test;

parse
  :  sub EOF {System.out.printf("second=\%s\n", $sub.second);}
  ;

sub returns [String first, String second, String third]
  :  a=INT b=INT c=INT
     {
       $first = $a.text;
       $second = $b.text;
       $third = $c.text;
     }
  ;

INT
  :  '0'..'9'+
  ;

SPACE
  :  ' ' {$channel=HIDDEN;}
  ;

If you parse the input "12 34 56" with the generated parser, second=34 is printed to the console, as you can see by running:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    TestLexer lex = new TestLexer(new ANTLRStringStream("12 34 56"));
    TokenStream tokens = new TokenRewriteStream(lex);
    TestParser parser = new TestParser(tokens);
    parser.parse();
  }
}

So, unfortunately, a shortcut from the parse rule such as $sub.INT or $sub.$a to access one of the three INT tokens is not possible.
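
To make the idea concrete without an ANTLR dependency, here is a minimal hand-written Java sketch of the same pattern: the sub-rule hands back one small object carrying several values, and the calling rule reads only the field it needs. The names MultiReturnSketch and SubReturn are hypothetical and chosen for illustration. This is not the code ANTLR generates, and the whitespace splitting merely stands in for the lexer (it does not check that tokens are digits).

```java
import java.util.Arrays;
import java.util.Iterator;

public class MultiReturnSketch {
    /** Hypothetical holder playing the role of a rule's multiple return values. */
    static final class SubReturn {
        String first;
        String second;
        String third;
    }

    /** Mirrors: sub returns [String first, String second, String third] : a=INT b=INT c=INT */
    static SubReturn sub(Iterator<String> tokens) {
        SubReturn r = new SubReturn();
        r.first = tokens.next();   // throws NoSuchElementException if a token is missing
        r.second = tokens.next();
        r.third = tokens.next();
        return r;
    }

    /** Mirrors: parse : sub EOF, keeping only the value the caller cares about. */
    static String parse(String input) {
        // Splitting on whitespace stands in for the lexer with SPACE on the hidden channel.
        Iterator<String> it = Arrays.asList(input.trim().split("\\s+")).iterator();
        SubReturn s = sub(it);
        if (it.hasNext()) {
            throw new IllegalArgumentException("expected end of input");
        }
        return s.second; // the other two values are simply ignored
    }

    public static void main(String[] args) {
        System.out.println("second=" + parse("12 34 56"));
    }
}
```

Applied to the question, mscHead would play the role of sub: it could declare one return value per sub-rule result it collects, and messageSequenceChart would pass the ones it needs to the constructor, so no node has to be created for mscHead itself.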
