Making an AST node the lowest descendant of a recursive rule

Posted on 2024-09-29 16:11:12

I am trying to write a parser rule that allows zero or more occurrences of a token before a second rule, such that in the AST each successive token in the closure is a child of the previous token, and the result of the second rule is a child of the last token.

It is easier to explain by example:

expression11 : ((NOT | COMPLEMENT)^)* expression12;

For example, given the above parser rule, if I have the expression !!x (where x is an ID), I want, in my AST, the x to be the child of the second bang operator which is the child of the first.

Desired:

!
  \ child
    !
      \ child
       x

Instead of my desired behavior, the above line produces an AST for which the second bang operator is a child of the first, but the x is a child of the first bang operator, a sibling of the second one. Obviously not what I want for a unary operator.

Encountered behavior:

        !
child /   \ child
    x -sib- !

If I add a third operator (as in "!!!x") the third one becomes a child of the second, as expected, and x remains a child of the first, sibling of the second.

I thought perhaps I could fix this by surrounding the entire operator part with parentheses and adding another caret, such as:

expression11 : (((NOT | COMPLEMENT)^)*)^ expression12;

in an effort to force expression12 to be a child of the entire closure of operators, hoping in vain that this would be interpreted as "The child of the entire closure means the child of the most-descended," but that was not the case and doing this did not change the behavior at all.

My question is "How do I get the parser to process the rule in such a way that the result of expression12 becomes the child of the most-descended 'NOT' or 'COMPLEMENT' node instead of the highest ancestor one?"

I would have thought this would be simple, but I cannot figure it out from the ANTLR resources on antlr.org or by pleading with Google. It must be done all the time, or is there an entirely different way to structure the rule that I am overlooking?

Here are the remaining rules for completeness. They are not final and will be modified, but they work for testing, as expected since they are straightforward. expression12 handles array length and method calls, expression13 handles new classes and arrays, expression14 handles array indexing, and expression15 handles terminals and parentheses.

expression12 : expression13 (DOT (LENGTH | (ID LPAREN (expression (COMMA expression)*)? RPAREN)))?;
expression13 : expression14 | (NEW^ ((ID LPAREN RPAREN) | (INTTYPE LSQBRACK expression RSQBRACK)));
expression14 : expression15 (LSQBRACK expression RSQBRACK)*;
expression15 : (LPAREN expression RPAREN) | INTLIT | TRUE | FALSE | ID | THIS;

Thank you to anyone who can provide assistance; your time is much appreciated.

Comments (1)

债姬 2024-10-06 16:11:12

You must not use the Kleene star if you don't want the operators to appear as siblings. Try something like this (untested):

expression11 : (NOT | COMPLEMENT)^ expression11
             | expression12;
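
To see why the recursive form should nest differently, here is a rough Python simulation of the two rule shapes. It is not ANTLR itself and does not use any ANTLR API. It assumes that the `^` operator makes the token the new root of the tree built by the current rule invocation, with the previous root becoming its child, and that the operand is then added to whatever that root is. The `Node` class, the helper names, and the use of `~` for COMPLEMENT are hypothetical choices for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

OPERATORS = ("!", "~")  # assumed spellings for NOT and COMPLEMENT


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)

    def sexpr(self) -> str:
        """Render the tree as an s-expression, e.g. (! (! x))."""
        if not self.children:
            return self.text
        return "(" + " ".join([self.text] + [c.sexpr() for c in self.children]) + ")"


def become_root(new_root: Node, old_root: Optional[Node]) -> Node:
    """Assumed effect of `^`: the token becomes the root, the old root its child."""
    if old_root is not None:
        new_root.children.append(old_root)
    return new_root


def parse_with_star(tokens: List[str]) -> Node:
    """Models ((NOT | COMPLEMENT)^)* expression12: one invocation, one shared root."""
    root = None
    i = 0
    while tokens[i] in OPERATORS:
        root = become_root(Node(tokens[i]), root)
        i += 1
    operand = Node(tokens[i])
    if root is None:
        return operand
    root.children.append(operand)  # operand attaches to the rule's current root
    return root


def parse_recursive(tokens: List[str], i: int = 0) -> Node:
    """Models (NOT | COMPLEMENT)^ expression11 | expression12: one root per invocation."""
    if tokens[i] in OPERATORS:
        root = become_root(Node(tokens[i]), None)
        root.children.append(parse_recursive(tokens, i + 1))  # subtree of the inner call
        return root
    return Node(tokens[i])


print(parse_with_star(list("!!x")).sexpr())   # (! ! x)
print(parse_recursive(list("!!x")).sexpr())   # (! (! x))
```

Under these assumptions, the starred rule keeps a single root for the whole invocation, so the operand ends up next to an operator rather than beneath the deepest one. The recursive rule starts a fresh root on every call, so each operator has exactly one child.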