ANTLR left recursion

Posted on 2024-12-14 19:28:38

I'm trying to convert the postfix, infix and prefix rules of Scala from EBNF form to ANTLR, but I'm seeing an error relating to left recursion on the infixExpression rule.

The rules in question are:

public symbolOrID
:   ID
|   Symbol
;

public postfixExpression
:   infixExpression symbolOrID? -> ^(R__PostfixExpression infixExpression symbolOrID?)
;

public infixExpression
:   prefixExpression
|   infixExpression (symbolOrID infixExpression)? -> ^(R__InfixExpression infixExpression symbolOrID? infixExpression?)
;

public prefixExpression
:   prefixCharacter? simpleExpression -> ^(R__PrefixExpression prefixCharacter? simpleExpression)
;

public prefixCharacter
:   '-' | '+' | '~' | '!' | '#'
;

public simpleExpression
:   constant
;

If I change the infixExpression rule to:

public infixExpression
:   prefixExpression (symbolOrID infixExpression)? -> ^(R__InfixExpression prefixExpression symbolOrID? infixExpression?)
;

Then it instead complains:

warning(200): Hydra.g3:108:26: Decision can match input such as "{ID, Symbol} {'!'..'#', '+', '-', '~'} String" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): Hydra.g3:108:26: Decision can match input such as "{ID, Symbol} {'!'..'#', '+', '-', '~'} Number" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): Hydra.g3:108:26: Decision can match input such as "{ID, Symbol} {'!'..'#', '+', '-', '~'} Boolean" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): Hydra.g3:108:26: Decision can match input such as "{ID, Symbol} {'!'..'#', '+', '-', '~'} Regex" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): Hydra.g3:108:26: Decision can match input such as "{ID, Symbol} {'!'..'#', '+', '-', '~'} Null" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

Lastly, is there a way to conditionally create the nodes in the AST, so that if only the left part of the rule matches, that level isn't added? E.g.:

conditional_or_expression:
    conditional_and_expression  ('||' conditional_or_expression)?
;

As in, let's say I create a grammar that follows a hierarchy like:

conditional_and_expression
  conditional_or_expression
    null_coalescing_expression

If the expression being parsed is a || b, the AST currently created for it is:

conditional_and_expression
  conditional_or_expression

How could I get it so it just gets the conditional_or_expression part?

In JavaCC, you could just set the node arity, e.g.: #ConditionalOrExpression(>1)
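
To make the idea concrete outside of any particular tool, here's a minimal hand-written sketch in Python (not ANTLR or JavaCC output). It mirrors the conditional_or_expression rule above and only creates a node when the optional right-hand part is actually present. The tokenizer, the node names ("ConditionalOr", "ConditionalAnd") and the use of bare identifiers as operands are assumptions made for illustration.

```python
import re

def tokenize(text):
    # Assumption: operands are plain words; only || and && are operators.
    return re.findall(r"\|\||&&|\w+", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def conditional_or(self):
        # conditional_and ('||' conditional_or)?
        left = self.conditional_and()
        if self.peek() == "||":
            self.next()
            right = self.conditional_or()
            return ("ConditionalOr", left, right)
        # Only one child: hand it back unchanged instead of wrapping it.
        return left

    def conditional_and(self):
        # Same pattern one level down (hypothetical rule body).
        left = self.primary()
        if self.peek() == "&&":
            self.next()
            right = self.conditional_and()
            return ("ConditionalAnd", left, right)
        return left

    def primary(self):
        return self.next()

def parse(text):
    return Parser(tokenize(text)).conditional_or()
```

With this shape, parse("a") gives back just the operand, and a ConditionalOr node only shows up when a || was really matched, which is roughly what the (>1) arity condition expresses.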

EDIT: it was a bit late last night; the infix expression is now properly modified!

Final edit: In the end, I got it to work with the following rules:

public symbolOrID
:   ID
|   Symbol
;

public postfixExpression
:   infixExpression (symbolOrID^)?
;

public infixExpression
:   (prefixExpression symbolOrID)=> prefixExpression symbolOrID^ infixExpression
|   prefixExpression
;

public prefixExpression
:   prefixCharacter^ simpleExpression
|   simpleExpression
;

public prefixCharacter
:   '-' | '+' | '~' | '!' | '#'
;

public simpleExpression
:   constant
;
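
For readers who want to see what tree shape these rules aim for, here's a rough hand-written equivalent in Python. It's a sketch under assumptions, not what ANTLR generates: tokens are passed in as a pre-split list, a constant is any run of digits, symbolOrID is approximated as any alphabetic word, and postfixExpression is left out. The operator becomes the root of each node, as the ^ suffix does in the rules above.

```python
PREFIX_CHARS = {"-", "+", "~", "!", "#"}

def is_symbol_or_id(tok):
    # Assumption for this sketch: an operator is any alphabetic word, e.g. "plus".
    return tok.isalpha()

def parse_simple(tokens, pos):
    # simpleExpression: constant (here: an integer literal)
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected a constant at token {pos}")
    return int(tokens[pos]), pos + 1

def parse_prefix(tokens, pos):
    # prefixExpression: prefixCharacter^ simpleExpression | simpleExpression
    if pos < len(tokens) and tokens[pos] in PREFIX_CHARS:
        operand, nxt = parse_simple(tokens, pos + 1)
        return (tokens[pos], operand), nxt
    return parse_simple(tokens, pos)

def parse_infix(tokens, pos=0):
    # infixExpression: prefixExpression symbolOrID^ infixExpression
    #                | prefixExpression
    left, pos = parse_prefix(tokens, pos)
    if pos < len(tokens) and is_symbol_or_id(tokens[pos]):
        op = tokens[pos]
        right, pos = parse_infix(tokens, pos + 1)  # right recursion, no left recursion
        return (op, left, right), pos
    return left, pos
```

Because the recursion is on the right, a chain like 1 plus 2 times 3 nests to the right; any precedence or left associativity would have to be handled separately.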

套路撩心 2024-12-21 19:28:38

Darkzaelus wrote:
I'm trying to convert the postfix, infix and prefix rules from scala in EBNF form to ANTLR but am seeing an error relating to left-recursion

As I said in my comment: there's no left recursion in the rules you posted.

Darkzaelus wrote:
How could I get it so it just gets the conditional_or_expression part?

I'm assuming you're using ANTLRWorks' interpreter or debugger, in which case the tree:

conditional_and_expression
            \
  conditional_or_expression

is only being displayed like that (the parse tree is shown, not the AST). If you properly transform your orExpression into an AST, the expression a || b will become:

  ||
 /  \
a    b

(i.e. || as root, and a and b as child nodes)

For example, take the following grammar:

grammar T;

options {
  output=AST;
}

parse
  :  expr EOF -> expr
  ;

expr
  :  or_expr
  ;

or_expr
  :  and_expr ('||'^ and_expr)*
  ;

and_expr
  :  add_expr ('&&'^ add_expr)*
  ;

add_expr
  :  atom (('+' | '-')^ atom)*
  ;

atom
  :  NUMBER
  |  '(' expr ')' -> expr
  ;

NUMBER : '0'..'9'+;

If you now parse 12+34 with a parser generated from the grammar above, ANTLRWorks (or the Eclipse ANTLR IDE) will show the following parse tree:

[image: parse tree for 12+34, with the parse, expr, or_expr, and_expr, add_expr and atom rule nodes]

but this is not the AST the parser creates. The AST actually looks like:

[image: AST for 12+34, with + as root and 12 and 34 as child nodes]

(i.e. the or_expr, and_expr "layers" are not in there)
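
The same effect can be reproduced by hand. Below is a small Python sketch (my own illustration, not code generated by ANTLR) of the grammar's rule structure, where each level follows the pattern operand (op^ operand)*: the operator becomes the new root each time round the loop, and a level that matches no operator returns its operand untouched. The tokenizer and the tuple representation of nodes are assumptions.

```python
import re

def tokenize(text):
    # Numbers, the two logical operators, +, - and parentheses.
    return re.findall(r"\d+|\|\||&&|[-+()]", text)

def binary_level(operators, operand):
    # Builds a parser for:  operand (op^ operand)*
    def level(tokens, pos):
        tree, pos = operand(tokens, pos)
        while pos < len(tokens) and tokens[pos] in operators:
            op = tokens[pos]
            right, pos = operand(tokens, pos + 1)
            tree = (op, tree, right)  # the operator becomes the new root
        return tree, pos
    return level

def atom(tokens, pos):
    # atom: NUMBER | '(' expr ')' -> expr
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        tree, pos = or_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1  # the parentheses are dropped from the tree
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

add_expr = binary_level({"+", "-"}, atom)
and_expr = binary_level({"&&"}, add_expr)
or_expr = binary_level({"||"}, and_expr)

def parse(text):
    tokens = tokenize(text)
    tree, pos = or_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree
```

Parsing 12+34 this way yields a single + node with two number children, even though the call chain went through or_expr and and_expr, since those levels only add a node when their own operator is present.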

Darkzaelus wrote:
Unfortunately, this is a fairly critical but early stage for the language, so I'm forced to keep full details of the grammar secret.

No problem, but you must realize that people can't answer your questions properly if you withhold crucial information. You don't need to post the entire grammar, but if you want help with the left-recursion, you must post a (partial) grammar that actually causes the error(s) you mention. If I can't reproduce it, it doesn't exist! :)

森林很绿却致人迷途 2024-12-21 19:28:38

This production:

infixExpr ::= PrefixExpr
            | InfixExpr id [nl] InfixExpr

Can be rewritten as

infixExpr ::= PrefixExpr
            | PrefixExpr id [nl] InfixExpr

In fact, I bet this is just an error in the grammar. Let's show with an example that the rewrite is OK. Let's reduce (partially) something with the first grammar, and then try the second one.

InfixExpr id [nl] InfixExpr                      
// Apply the second reduction to the first InfixExpr
InfixExpr id [nl] InfixExpr id [nl] InfixExpr
// Apply the first reduction to the (new) first InfixExpr
PrefixExpr id [nl] InfixExpr id [nl] InfixExpr
// Apply the first reduction to the new first InfixExpr
PrefixExpr id [nl] PrefixExpr id [nl] InfixExpr
// Apply the first reduction to the new first InfixExpr
PrefixExpr id [nl] PrefixExpr id [nl] PrefixExpr

Let's reduce it with the second grammar:

PrefixExpr id [nl] InfixExpr                      
// Apply the second reduction to the first InfixExpr
PrefixExpr id [nl] PrefixExpr id [nl] InfixExpr
// Apply the first reduction to the new first InfixExpr
PrefixExpr id [nl] PrefixExpr id [nl] PrefixExpr

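As you can see, you end up with the same sequence in both cases. Note that the second grammar always nests to the right, so the two only give equivalent ASTs if the grouping of the operators doesn't matter or is fixed up afterwards.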
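
To see why the rewrite matters for a top-down parser, here's a minimal Python sketch (an illustration under assumptions, not ANTLR's generated code). A naive recursive-descent function for the left-recursive production calls itself before consuming any token, so it never terminates; the right-recursive version consumes a PrefixExpr first. Tokens are given as a pre-split list, a PrefixExpr is taken to be a single token, and the optional [nl] is left out.

```python
def infix_left_recursive(tokens, pos=0):
    # InfixExpr ::= InfixExpr id InfixExpr | PrefixExpr
    # The first step re-enters the same rule at the same position,
    # so no token is ever consumed and the recursion never bottoms out.
    left, pos = infix_left_recursive(tokens, pos)
    op = tokens[pos]
    right, pos = infix_left_recursive(tokens, pos + 1)
    return (left, op, right), pos

def prefix_expr(tokens, pos):
    # Assumption: a PrefixExpr is just one token in this sketch.
    return tokens[pos], pos + 1

def infix_right_recursive(tokens, pos=0):
    # InfixExpr ::= PrefixExpr id InfixExpr | PrefixExpr
    left, pos = prefix_expr(tokens, pos)
    if pos + 1 < len(tokens):  # an id and another operand follow
        op = tokens[pos]
        right, pos = infix_right_recursive(tokens, pos + 1)
        return (left, op, right), pos
    return left, pos

def terminates(parser, tokens):
    # Reports whether the parser finishes without exhausting the call stack.
    try:
        parser(tokens)
        return True
    except RecursionError:
        return False
```

Calling terminates(infix_left_recursive, ["a", "op", "b"]) reports a failure, while the right-recursive function parses the same input, with the chain nested to the right.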
