帮助对语法进行左分解以删除左递归

发布于 2024-11-18 22:29:01 字数 2584 浏览 5 评论 0原文

我有一个小型的自定义脚本语言,我正在尝试更新它以允许布尔表达式,例如 a > 2a> 2和(b<3或c>5)。我在这里遇到麻烦的是括号表达式。

这是(根据 @Bart Kiers 的答案对原始帖子进行编辑的)完整语法,展示了该问题。这是我实际语法的精简版本,但问题也出现在这里。

grammar test;


options {
    language = 'JavaScript'; 
    output = AST;
} 


statement 
    :   value_assignment_statement  
        EOF
    ;


value_assignment_statement 
    :   IDENT
        '='
        expression                      
    ;

value_expression 
    :   value_list_expression           
    |   IDENT                           
    ;


value_list_expression 
    :   value_enumerated_list       
    ;


value_enumerated_list : '{' unary+ '}'
    ;



term 
    :   LPAREN expression RPAREN        
    |   INTEGER                         
    |   value_expression                
    ;

unary : ( '+' | '-' )* term
    ;

mult :  unary ( ('*' | '/') unary)*
    ;

expression : mult ( ('+' | '-') mult )*
    ;


boolean 
    :   boolean_expression
        EOF
    ;

boolean_expression
    :   boolean_or_expression
    ;

boolean_or_expression 
    :   boolean_and_expression (OR boolean_and_expression)*
    ;

boolean_and_expression 
    :   boolean_rel_expression (AND boolean_rel_expression)*
    ;

boolean_rel_expression
    :   boolean_neg_expression relational_operator boolean_neg_expression
    ;

boolean_neg_expression 
    :   (NOT)? atom
    ;

atom
    :   LPAREN boolean_expression RPAREN
    //| expression
    ;


relational_operator : '=' | '>' | '<';


LPAREN      :   '(';
RPAREN      :   ')';
AND         :   'and';
OR          :   'or';
NOT         :   'not';
IDENT       :   LETTER LETTER+;
INTEGER     :   DIGIT+;
WS          :   (' ' | '\n' | '\r' | '\t')+     { $channel = HIDDEN; };

fragment DIGIT      : '0'..'9';
fragment LETTER     : ('a'..'z' | 'A'..'Z');

我尝试容纳括号布尔表达式,例如 a > 2 或 (b < 3) 位于 atom 规则中的注释行中。当我取消注释这一行并将其包含在语法中时,ANTLR 给出以下错误:

[致命]规则原子由于可从 alts 1,2 到达的递归规则调用而具有非 LL(*) 决策。通过左因子分解或使用语法谓词或使用 backtrack=true 选项进行解析。

我想通过删除递归来解决这个问题,但我似乎无法从 Wikipedia 描述进行转换关于如何删除我自己的东西的左递归

在使用此语法时,我有时希望使用 statement 作为输入的根,例如 abc = 2 + 3,它将一个值分配给名为 abc 的变量。其他时候,我想使用语法来计算以 boolean 作为根的表达式,输入如 abc > > 3且(xyz<5或xyz>10)。当我尝试使用 @Bart 的答案作为模型时,它工作得很好,直到我尝试将 statement 使用的语法部分与 boolean 使用的部分合并。他们都应该能够使用表达式,但这就是我遇到左递归错误的地方。

那么,如何既处理括号又避免左递归问题呢?

I have a small custom scripting language, and I am trying to update it to allow boolean expressions such as a > 2 and a > 2 and (b < 3 or c > 5). It's the parenthetical expressions that I am having trouble with here.

Here is a (edited since the original post based on the answer from @Bart Kiers) full grammar that exhibits the problem. This is a pared-down version of my actual grammar, but the problem occurs here too.

grammar test;


options {
    language = 'JavaScript'; 
    output = AST;
} 


statement 
    :   value_assignment_statement  
        EOF
    ;


value_assignment_statement 
    :   IDENT
        '='
        expression                      
    ;

value_expression 
    :   value_list_expression           
    |   IDENT                           
    ;


value_list_expression 
    :   value_enumerated_list       
    ;


value_enumerated_list : '{' unary+ '}'
    ;



term 
    :   LPAREN expression RPAREN        
    |   INTEGER                         
    |   value_expression                
    ;

unary : ( '+' | '-' )* term
    ;

mult :  unary ( ('*' | '/') unary)*
    ;

expression : mult ( ('+' | '-') mult )*
    ;


boolean 
    :   boolean_expression
        EOF
    ;

boolean_expression
    :   boolean_or_expression
    ;

boolean_or_expression 
    :   boolean_and_expression (OR boolean_and_expression)*
    ;

boolean_and_expression 
    :   boolean_rel_expression (AND boolean_rel_expression)*
    ;

boolean_rel_expression
    :   boolean_neg_expression relational_operator boolean_neg_expression
    ;

boolean_neg_expression 
    :   (NOT)? atom
    ;

atom
    :   LPAREN boolean_expression RPAREN
    //| expression
    ;


relational_operator : '=' | '>' | '<';


LPAREN      :   '(';
RPAREN      :   ')';
AND         :   'and';
OR          :   'or';
NOT         :   'not';
IDENT       :   LETTER LETTER+;
INTEGER     :   DIGIT+;
WS          :   (' ' | '\n' | '\r' | '\t')+     { $channel = HIDDEN; };

fragment DIGIT      : '0'..'9';
fragment LETTER     : ('a'..'z' | 'A'..'Z');

My attempt to accommodate parenthetical boolean expressions such as a > 2 or (b < 3) is in the commented-out line in the atom rule. When I uncomment this line and include it in the grammar, ANTLR gives me this error:

[fatal] rule atom has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.

I would like to address this by removing the recursion, but I can't seem to make the transition from the Wikipedia description on how to remove left recursion to my own stuff.

In using this grammar, I want sometimes to use statement as a root with input such as abc = 2 + 3, which assigns a value to a variable named abc. Other times I want to use the grammar to evaluate an expression with boolean as the root with input such as abc > 3 and (xyz < 5 or xyz > 10). When I tried to use @Bart's answer as a model, it worked fine until I tried to merge the parts of the grammar used by statement with the parts used by boolean. They should both be able to use an expression, but that's where I'm stuck with this left recursion error.

So, how can I both handle parentheses and avoid the left recursion problem?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

苍暮颜 2024-11-25 22:29:01

布尔表达式与加法和乘法表达式相同,因此不应将它们分开。以下是如何考虑所有类型的表达式:

grammar test;

parse
  :  expression EOF
  ;

expression 
  :  or
  ;

or
  :  and (OR and)*
  ;

and
  :  rel (AND rel)*
  ;

rel
  :  add (('=' | '>' | '<') add)*
  ;

add
  :  mult (('+' | '-') mult)*
  ;

mult
  :  unary (('*' | '/') unary)*
  ;

unary 
  :  '-' term
  |  '+' term
  |  NOT term
  |  term
  ;

term 
  :  INTEGER  
  |  IDENT       
  |  list
  |  '(' expression ')'
  ;

list 
  :  '{' (expression (',' expression)*)? '}'
  ;

AND     :  'and';
OR      :  'or';
NOT     :  'not';
IDENT   :  LETTER LETTER*;
INTEGER :  DIGIT+;
WS      :  (' ' | '\n' | '\r' | '\t')+  { $channel = HIDDEN; };

fragment DIGIT   : '0'..'9';
fragment LETTER  : ('a'..'z' | 'A'..'Z');

它将把示例输入:解析

abc > 3 and (xyz < 5 or xyz > {1, 2, 3})

到以下解析树中:

在此处输入图像描述

Boolean expressions are just the same as the additive- and multiplicative expression, and should therefore not be separated from them. Here's how to account for all types of expressions:

grammar test;

parse
  :  expression EOF
  ;

expression 
  :  or
  ;

or
  :  and (OR and)*
  ;

and
  :  rel (AND rel)*
  ;

rel
  :  add (('=' | '>' | '<') add)*
  ;

add
  :  mult (('+' | '-') mult)*
  ;

mult
  :  unary (('*' | '/') unary)*
  ;

unary 
  :  '-' term
  |  '+' term
  |  NOT term
  |  term
  ;

term 
  :  INTEGER  
  |  IDENT       
  |  list
  |  '(' expression ')'
  ;

list 
  :  '{' (expression (',' expression)*)? '}'
  ;

AND     :  'and';
OR      :  'or';
NOT     :  'not';
IDENT   :  LETTER LETTER*;
INTEGER :  DIGIT+;
WS      :  (' ' | '\n' | '\r' | '\t')+  { $channel = HIDDEN; };

fragment DIGIT   : '0'..'9';
fragment LETTER  : ('a'..'z' | 'A'..'Z');

which will parse the example input:

abc > 3 and (xyz < 5 or xyz > {1, 2, 3})

into the following parse tree:

enter image description here

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文