How to resolve this ambiguous grammar?

Posted on 2025-01-06 17:25:04 (505 words)


I have written this grammar:

expr      : multExpr ( ('+' | '-') multExpr )*;
multExpr  : atom ( ('*' | '/') atom )*;
atom      : INT | FLOAT | ID | '(' expr ')';
condition : cond ('or' cond)*;
cond      : c1 ('and' c1)*;
c1        : ('not')? c2;
c2        : '(' condition ')' | boolean;
boolean   : expr (relop expr)? | 'true' | 'false';
relop     : '<' | '<=' | '>' | '>=' | '==' | '!=';

I have omitted the lexer rules for INT, FLOAT and ID, as they are obvious.

The problem is the c2 rule: it is ambiguous because of '('. I could not find a solution; can you offer me one?
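To see the conflict concretely, here is a minimal Python sketch (not ANTLR; the BNF encoding and the helper names first_sets and first_of_seq are assumptions made for this illustration) that computes FIRST sets for the grammar above. The * loops and the optional comparison are expanded into hypothetical helper rules such as exprTail and relTail, each with an empty alternative. Both alternatives of c2 can begin with '(', which is exactly the choice the parser cannot make from the next token alone.

```python
# FIRST sets for the question's grammar, written as plain BNF.
# Assumption for this sketch: EBNF loops/optionals are expanded into
# hypothetical helper rules (exprTail, multTail, condTail, c1Tail, relTail),
# each with an empty alternative [].
GRAMMAR = {
    "expr":      [["multExpr", "exprTail"]],
    "exprTail":  [["+", "multExpr", "exprTail"], ["-", "multExpr", "exprTail"], []],
    "multExpr":  [["atom", "multTail"]],
    "multTail":  [["*", "atom", "multTail"], ["/", "atom", "multTail"], []],
    "atom":      [["INT"], ["FLOAT"], ["ID"], ["(", "expr", ")"]],
    "condition": [["cond", "condTail"]],
    "condTail":  [["or", "cond", "condTail"], []],
    "cond":      [["c1", "c1Tail"]],
    "c1Tail":    [["and", "c1", "c1Tail"], []],
    "c1":        [["not", "c2"], ["c2"]],
    "c2":        [["(", "condition", ")"], ["boolean"]],
    "boolean":   [["expr", "relTail"], ["true"], ["false"]],
    "relTail":   [["relop", "expr"], []],
    "relop":     [["<"], ["<="], [">"], [">="], ["=="], ["!="]],
}


def first_of_seq(seq, first, nullable):
    """FIRST set of a symbol sequence; symbols not in GRAMMAR are terminals."""
    out = set()
    for sym in seq:
        if sym not in GRAMMAR:
            out.add(sym)
            return out
        out |= first[sym]
        if sym not in nullable:
            return out
    return out


def first_sets():
    """Fixed-point computation of FIRST sets and nullability for every rule."""
    first = {rule: set() for rule in GRAMMAR}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for rule, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_seq(alt, first, nullable)
                if not new <= first[rule]:
                    first[rule] |= new
                    changed = True
                if all(sym in nullable for sym in alt) and rule not in nullable:
                    nullable.add(rule)
                    changed = True
    return first, nullable


first, nullable = first_sets()
alt_paren, alt_boolean = GRAMMAR["c2"]
CONFLICT = first_of_seq(alt_paren, first, nullable) & first_of_seq(alt_boolean, first, nullable)
print(sorted(CONFLICT))  # ['(']
```

The overlap comes from boolean reaching atom through expr, and atom's last alternative also starts with '('.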


Comments (4)

浅暮の光 2025-01-13 17:25:04


Why not simply do:

expr      : orExpr; 
orExpr    : andExpr ('or' andExpr)*;
andExpr   : relExpr ('and' relExpr)*;
relExpr   : addExpr (relop addExpr)?;
relop     : '<' | '<=' | '>' | '>=' | '==' | '!=';
addExpr   : multExpr (('+' | '-') multExpr)*;
multExpr  : unaryExpr (('*' | '/') unaryExpr)*;
unaryExpr : 'not'? atom;
atom      : INT | FLOAT | ID | 'true' | 'false' | '(' expr ')';

Unary not usually has a higher precedence than the one you are giving it now.

This will also accept expressions like 42 > true, but such semantic checks can be done later, when you walk the AST/tree.
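A later tree walk is where 42 > true gets rejected. The following Python sketch is one way to do such a check, not something ANTLR generates; it assumes a hypothetical AST shape (nested tuples such as ('>', 42, True), identifiers as strings) and a hypothetical symbol table env that maps identifiers to "num" or "bool".

```python
ARITHMETIC = {"+", "-", "*", "/"}
ORDERING = {"<", "<=", ">", ">="}
EQUALITY = {"==", "!="}
LOGICAL = {"and", "or"}


class SemanticError(Exception):
    """Raised when an operator is applied to operands of the wrong type."""


def check(node, env):
    """Return "num" or "bool" for an AST node, or raise SemanticError.

    Assumed (hypothetical) AST shape: leaves are Python numbers, bools, or
    identifier strings; inner nodes are tuples (operator, operand, ...).
    env maps identifier names to "num" or "bool".
    """
    if isinstance(node, bool):  # test bool before int: bool is a subclass of int
        return "bool"
    if isinstance(node, (int, float)):
        return "num"
    if isinstance(node, str):
        return env[node]  # an undeclared identifier raises KeyError here
    op, *operands = node
    types = [check(child, env) for child in operands]
    if op in EQUALITY:
        # == and != only require both sides to have the same type
        if types[0] != types[1]:
            raise SemanticError(f"cannot compare {types[0]} with {types[1]} using {op!r}")
        return "bool"
    if op == "not" or op in LOGICAL:
        required, result = "bool", "bool"
    elif op in ARITHMETIC:
        required, result = "num", "num"
    elif op in ORDERING:
        required, result = "num", "bool"
    else:
        raise SemanticError(f"unknown operator {op!r}")
    for t in types:
        if t != required:
            raise SemanticError(f"{op!r} expects {required} operands, got {t}")
    return result


env = {"a": "num", "b": "num", "foo": "num"}
tree = ("==", ("not", (">=", ("+", "a", "b"), ("/", ("*", 2, "foo"), 3.14159))), False)
print(check(tree, env))  # bool
try:
    check((">", 42, True), env)
except SemanticError as err:
    print("rejected:", err)  # rejected: '>' expects num operands, got bool
```

Keeping the grammar loose and the checker strict means the parser only has to deal with syntax, while type rules live in one place.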

EDIT

The input "not(a+b >= 2 * foo/3.14159) == false" would now be parsed like this (ignoring spaces):

[Image: parse tree for this input]

And if you set the output to AST and mix in some tree rewrite operators (^ makes a token the root of the current subtree, ! leaves a token out of the tree):

options {
  output=AST;
}

// ...

expr      : orExpr; 
orExpr    : andExpr ('or'^ andExpr)*;
andExpr   : relExpr ('and'^ relExpr)*;
relExpr   : addExpr (relop^ addExpr)?;
relop     : '<' | '<=' | '>' | '>=' | '==' | '!=';
addExpr   : multExpr (('+' | '-')^ multExpr)*;
multExpr  : unaryExpr (('*' | '/')^ unaryExpr)*;
unaryExpr : 'not'^ atom | atom;
atom      : INT | FLOAT | ID | 'true' | 'false' | '('! expr ')'!;

you'd get:

[Image: the resulting AST]
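With these rules, atom is the only rule that consumes '(', so there is no longer a choice between two parenthesized alternatives. As an illustration only, here is a hand-written recursive-descent parser in Python that follows the rewritten rules (one method per rule) and builds a tuple AST the way the ^ and ! operators do: operators become roots and parentheses are dropped. It is not ANTLR-generated code; the tokenizer's INT, FLOAT and ID patterns and the names Parser, parse and to_string_tree are assumptions.

```python
import re

# Assumed token patterns: FLOAT, INT, two-char and one-char operators, words.
TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|<=|>=|==|!=|[-+*/<>()]|[A-Za-z_]\w*)")
RELOPS = {"<", "<=", ">", ">=", "==", "!="}
NOT_AN_ATOM = {"or", "and", "not", "(", ")", "+", "-", "*", "/"} | RELOPS


class ParseError(Exception):
    pass


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ParseError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """One method per rule of the unified grammar; AST nodes are tuples."""

    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ParseError(f"expected {expected!r}, found {tok!r}")
        self.pos += 1
        return tok

    def binary(self, operand, operators):
        # rule : operand (op^ operand)*  -> left-associative, operator is the root
        node = operand()
        while self.peek() in operators:
            op = self.take()
            node = (op, node, operand())
        return node

    def expr(self):  # expr : orExpr ; orExpr : andExpr ('or'^ andExpr)*
        return self.binary(self.and_expr, {"or"})

    def and_expr(self):
        return self.binary(self.rel_expr, {"and"})

    def rel_expr(self):
        # relExpr : addExpr (relop^ addExpr)?  -- at most one comparison
        node = self.add_expr()
        if self.peek() in RELOPS:
            op = self.take()
            node = (op, node, self.add_expr())
        return node

    def add_expr(self):
        return self.binary(self.mult_expr, {"+", "-"})

    def mult_expr(self):
        return self.binary(self.unary_expr, {"*", "/"})

    def unary_expr(self):
        # unaryExpr : 'not'^ atom | atom
        if self.peek() == "not":
            return (self.take(), self.atom())
        return self.atom()

    def atom(self):
        # atom : INT | FLOAT | ID | 'true' | 'false' | '('! expr ')'!
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node  # parentheses are dropped, as the ! operator does
        tok = self.take()
        if tok in NOT_AN_ATOM:
            raise ParseError(f"unexpected {tok!r}")
        return tok


def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise ParseError(f"unexpected trailing {parser.peek()!r}")
    return tree


def to_string_tree(node):
    """LISP-style rendering of the tuple AST."""
    if isinstance(node, tuple):
        return "(" + " ".join(to_string_tree(child) for child in node) + ")"
    return node


print(to_string_tree(parse("not(a+b >= 2 * foo/3.14159) == false")))
# (== (not (>= (+ a b) (/ (* 2 foo) 3.14159))) false)
```

The same sketch accepts the question's troublesome input ((x+y) > (a+b)), rendering it as (> (+ x y) (+ a b)), and turns not a == b into (== (not a) b), which reflects not binding more tightly than the comparison.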

只为一人 2025-01-13 17:25:04


Your problem stems from the fact that a '(' could be the start of either the first alternative of c2 or the last alternative of atom. For example, given input like ((x+y) > (a+b)), the first open paren is the beginning of a c2, but the second is the beginning of an atom. [Edit: and the parser has no indication of which way to go until some arbitrary point later. For example, it can't know that the first open paren is the beginning of a c2 until it encounters the >; if that were a * instead, both opening parens would be beginnings of atoms.]
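The "arbitrary point later" can be made concrete. In the minimal Python sketch below (the helpers make_inputs and first_difference are hypothetical, and the token lists stand in for a lexer's output), two inputs share everything up to the operator that follows an inner parenthesized sum. The index of the first differing token, the earliest point at which any parser could tell them apart, grows with the length of that sum, so no fixed amount of lookahead is enough.

```python
def make_inputs(n):
    """Token lists for '( (x0 + ... + xn) > (a + b) )' and the same with '*'."""
    inner = ["("] + " + ".join(f"x{i}" for i in range(n + 1)).split() + [")"]
    tail = ["(", "a", "+", "b", ")", ")"]
    relational = ["("] + inner + [">"] + tail
    arithmetic = ["("] + inner + ["*"] + tail
    return relational, arithmetic


def first_difference(a, b):
    """Index of the first position at which two token lists differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


for n in (1, 5, 50):
    print(n, first_difference(*make_inputs(n)))
# 1 6
# 5 14
# 50 104
```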

One possible way to handle it would be to unify the rules for arithmetic and Boolean expressions, so you only have one rule with '(' expression ')', and the expression might be either arithmetic or Boolean. However, this often has the side effect of producing rather loose typing, with relatively free conversion between arithmetic and Boolean expressions (at least at the parser level; you can then enforce the types as rigidly as you like in the semantics).

Edit: In Pascal, for example, the rules run something like this (simplifying a tiny bit):

expression: simple_expression ( rel_op simple_expression )?

simple_expression: ( '+' | '-')? term ( ('+' | '-' | 'or' ) term )*

term: factor ( ( '/' | '*' | 'div' | 'mod' | 'and') factor )*

factor: constant | variable | function_call | '(' expression ')' | 'not' factor
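In these rules and and or sit at the term and simple_expression levels, so they bind more tightly than the relational operators. A minimal Python sketch of a recursive-descent parser for them makes the consequence visible. It is an illustration, not a real Pascal front end: it follows ISO Pascal in allowing at most one relational operator per expression, treats every other word as a constant or variable, omits function calls, and uses a hypothetical "neg" node for a leading minus.

```python
import re

REL_OPS = {"=", "<>", "<", "<=", ">", ">="}
ADD_OPS = {"+", "-", "or"}
MUL_OPS = {"*", "/", "div", "mod", "and"}
KEYWORDS = {"and", "or", "not", "div", "mod"}
TOKEN_RE = re.compile(r"\s*(<>|<=|>=|[-+*/=<>()]|\w+)")


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1).lower())  # Pascal words are case-insensitive
        pos = m.end()
    return tokens


class PascalParser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        self.pos += 1
        return tok

    def expression(self):
        # expression: simple_expression ( rel_op simple_expression )?
        node = self.simple_expression()
        if self.peek() in REL_OPS:
            op = self.take()
            node = (op, node, self.simple_expression())
        return node

    def simple_expression(self):
        # simple_expression: ( '+' | '-' )? term ( ( '+' | '-' | 'or' ) term )*
        sign = self.take() if self.peek() in {"+", "-"} else None
        node = self.term()
        if sign == "-":
            node = ("neg", node)  # hypothetical node name for a leading minus
        while self.peek() in ADD_OPS:
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # term: factor ( ( '/' | '*' | 'div' | 'mod' | 'and' ) factor )*
        node = self.factor()
        while self.peek() in MUL_OPS:
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor: constant | variable | '(' expression ')' | 'not' factor
        tok = self.take()
        if tok == "(":
            node = self.expression()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok == "not":
            return ("not", self.factor())
        if tok in KEYWORDS or not re.fullmatch(r"\w+", tok):
            raise ValueError(f"unexpected {tok!r}")
        return tok


def parse_pascal(text):
    parser = PascalParser(text)
    tree = parser.expression()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing {parser.peek()!r}")
    return tree


print(parse_pascal("x > 0 and y"))          # ('>', 'x', ('and', '0', 'y'))
print(parse_pascal("(x > 0) and (y > 0)"))  # ('and', ('>', 'x', '0'), ('>', 'y', '0'))
```

Without the inner parentheses, x > 0 and y > 0 is read as x > (0 and y) followed by a stray > 0, so parse_pascal raises a ValueError; this is why Pascal code puts comparisons joined by and in their own parentheses.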
ゃ人海孤独症 2025-01-13 17:25:04


Couldn't you define c1 as the following?

c1 : ('not')? ('(' condition ')' | boolean);
余生一个溪 2025-01-13 17:25:04


One way to approach this problem is to split it into two sets of lexer rules and apply them sequentially to the input (one for the math stuff, the other for the boolean).
