How can I resolve this ambiguous grammar?
I have written this grammar:
expr : multExpr ( ('+' | '-') multExpr )*;
multExpr : atom ( ('*' | '/') atom )*;
atom : INT | FLOAT | ID | '(' expr ')';
condition : cond ('or' cond)*;
cond : c1 ('and' c1)*;
c1 : ('not')? c2;
c2 : '(' condition ')' | boolean;
boolean : expr (relop expr)? | 'true' | 'false';
relop : '<' | '<=' | '>' | '>=' | '==' | '!=';
I have omitted the lexer rules for INT, FLOAT and ID, since they are obvious.
The problem is the c2 rule: it is ambiguous because of '('. I could not find a solution. Can you suggest one?
Comments (4)
Why not simply do:

The unary not operator usually has a higher precedence than you are giving it now. This will allow expressions like 42 > true, but such semantic checks can be done when you walk the AST/tree.

EDIT

The input "not(a+b >= 2 * foo/3.14159) == false" would now be parsed like this (ignoring spaces):

And if you set the output to AST and mix in some tree rewrite operators (^ and !), you'd get:
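The grammar proposed in this answer did not survive here, so the following is only a sketch of the same idea: a single expression hierarchy with one precedence level per operator group, from loosest to tightest: or, and, == and !=, the relational operators, + and -, * and /, then unary not above every binary operator. It is written as a hand-written recursive-descent parser in Python rather than ANTLR so it can run as-is; the Parser class, its method names and the tuple AST shape are all hypothetical. Because atom is the only rule that starts with '(', the choice the original c2 rule forced never arises.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z_]\w*|<=|>=|==|!=|[-+*/()<>])")
KEYWORDS = {"and", "or", "not", "true", "false"}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    # Binary operator levels, loosest first; level i's operands are level i + 1.
    LEVELS = [{"or"}, {"and"}, {"==", "!="}, {"<", "<=", ">", ">="},
              {"+", "-"}, {"*", "/"}]

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.binary(0)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def binary(self, level):
        # One generic method plays the role of orExpr, andExpr, ..., multExpr.
        if level == len(self.LEVELS):
            return self.unary()
        left = self.binary(level + 1)
        while self.peek() in self.LEVELS[level]:
            op = self.take()
            left = (op, left, self.binary(level + 1))  # left-associative
        return left

    def unary(self):
        # 'not' binds tighter than any binary operator.
        if self.peek() == "not":
            self.take()
            return ("not", self.unary())
        return self.atom()

    def atom(self):
        tok = self.take()
        if tok == "(":
            inner = self.binary(0)  # the only parenthesized alternative
            self.take(")")
            return inner
        if tok in ("true", "false"):
            return tok == "true"
        if re.fullmatch(r"\d+\.\d+|\d+", tok):
            return float(tok) if "." in tok else int(tok)
        if re.fullmatch(r"[A-Za-z_]\w*", tok) and tok not in KEYWORDS:
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("not(a+b >= 2 * foo/3.14159) == false").parse())
# ('==', ('not', ('>=', ('+', 'a', 'b'), ('/', ('*', 2, 'foo'), 3.14159))), False)
```

Note that Parser("42 > true").parse() is accepted and yields ('>', 42, True): this is the loose typing the answer mentions, and rejecting it is left to a later pass over the tree.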
Your problem stems from the fact that '(' could be the start of either the first alternative of c2 or the last alternative of atom. For example, given input like ((x+y) > (a+b)), the first open paren is the beginning of a c2, but the second is the beginning of an atom. [edit: And the parser has no indication of which way to go until some arbitrary point later. For example, it can't know that the first open paren is the beginning of a c2 until it encounters the >; if that were a * instead, both opening parens would be the beginnings of atoms.]

One possible way to handle it would be to unify the rules for arithmetic and Boolean expressions, so that you have only one rule with '(' expression ')', and the expression might be either arithmetic or Boolean. However, this often has the side effect of producing rather loose typing, with relatively free conversion between arithmetic and Boolean expressions (at least at the parser level; you can then enforce the types as rigidly as you like in the semantics).

Edit: In Pascal, for example, the rules run something like this (simplifying a tiny bit):
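As a minimal sketch of enforcing the types "in the semantics" once a unified grammar accepts both kinds of expression, the Python tree walk below assigns each node the type 'num' or 'bool' and rejects mismatches such as 42 > true. It assumes a hypothetical tuple AST, (operator, operand, ...), with numbers, booleans and identifier strings as leaves, plus a hypothetical env symbol table mapping identifiers to their types; neither comes from the original answer.

```python
ARITH = {"+", "-", "*", "/"}
COMPARE = {"<", "<=", ">", ">="}
EQUALITY = {"==", "!="}
LOGIC = {"and", "or"}


def type_of(node, env):
    """Return 'num' or 'bool' for an AST node, or raise TypeError.
    env is a hypothetical symbol table: identifier name -> 'num' or 'bool'."""
    if isinstance(node, bool):  # test bool first: bool is a subclass of int
        return "bool"
    if isinstance(node, (int, float)):
        return "num"
    if isinstance(node, str):
        if node not in env:
            raise TypeError(f"undeclared identifier {node!r}")
        return env[node]
    op, *args = node
    types = [type_of(arg, env) for arg in args]
    if op in ARITH or op in COMPARE:
        want, result = "num", ("num" if op in ARITH else "bool")
    elif op in LOGIC or op == "not":
        want, result = "bool", "bool"
    elif op in EQUALITY:
        # Equality works on either type, but both sides must agree.
        if types[0] != types[1]:
            raise TypeError(f"cannot compare {types[0]} with {types[1]} using {op!r}")
        return "bool"
    else:
        raise TypeError(f"unknown operator {op!r}")
    for t in types:
        if t != want:
            raise TypeError(f"operator {op!r} expects {want} operands, got {t}")
    return result


env = {"a": "num", "b": "num", "foo": "num"}
tree = ("==", ("not", (">=", ("+", "a", "b"), ("/", ("*", 2, "foo"), 3.14159))), False)
print(type_of(tree, env))  # bool
try:
    type_of((">", 42, True), env)
except TypeError as err:
    print(err)  # operator '>' expects num operands, got bool
```

How strict to be is a design choice: this sketch lets == and != compare two Booleans but never mixes types, whereas a looser language might convert between them implicitly.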
Couldn't you define c1 as the following?
One way to approach this problem is to split it into two sets of lexer rules and apply them sequentially to the input (one for the arithmetic expressions, the other for the Boolean ones).