ANTLR4 grammar: mismatched input error
I have defined the following grammar:
grammar Test;
parse: expr EOF;
expr : IF comparator FROM field THEN #comparatorExpr
;
dateTime : DATE_TIME;
number : (INT|DECIMAL);
field : FIELD_IDENTIFIER;
op : (GT | GE | LT | LE | EQ);
comparator : op (number|dateTime);
fragment LETTER : [a-zA-Z];
fragment DIGIT : [0-9];
IF : '$IF';
FROM : '$FROM';
THEN : '$THEN';
OR : '$OR';
GT : '>' ;
GE : '>=' ;
LT : '<' ;
LE : '<=' ;
EQ : '=' ;
INT : DIGIT+;
DECIMAL : INT'.'INT;
DATE_TIME : (INT|DECIMAL)('M'|'y'|'d');
FIELD_IDENTIFIER : (LETTER|DIGIT)(LETTER|DIGIT|' ')*;
WS : [ \r\t\u000C\n]+ -> skip;
And I try to parse the following input:
$IF >=15 $FROM AgeInYears $THEN
It gives me the following error:
line 1:6 mismatched input '15 ' expecting {INT, DECIMAL, DATE_TIME}
All the SO posts I found point to the same reason for this error: identical lexer rules. But I cannot see why 15 could be matched by either DECIMAL (it requires a . between two ints) or DATE_TIME (it requires an m|d|y suffix as well).
Any pointers would be appreciated here.
It's always a good idea to take a look at the token stream that your lexer produces:
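As a rough stand-in for that token stream, the sketch below emulates the lexer's selection rule in Python: at each position every rule is tried, the longest match wins, and ties go to the rule defined first. It is not ANTLR itself; the regular expressions are hand translations of the lexer rules in the question, and the name tokenize is made up for illustration. With the real tooling, ANTLR's TestRig can print the actual tokens through its -tokens option.

```python
import re

# Hand-translated lexer rules, in the same order as the grammar.
# Assumption: these regexes mirror the ANTLR rules closely enough for this input.
INT = r"[0-9]+"
DECIMAL = r"[0-9]+\.[0-9]+"
RULES = [
    ("IF", r"\$IF"), ("FROM", r"\$FROM"), ("THEN", r"\$THEN"), ("OR", r"\$OR"),
    ("GT", r">"), ("GE", r">="), ("LT", r"<"), ("LE", r"<="), ("EQ", r"="),
    ("INT", INT),
    ("DECIMAL", DECIMAL),
    ("DATE_TIME", rf"(?:{DECIMAL}|{INT})[Myd]"),
    ("FIELD_IDENTIFIER", r"[a-zA-Z0-9][a-zA-Z0-9 ]*"),
    ("WS", r"[ \r\t\f\n]+"),
]


def tokenize(text, rules=RULES):
    """Longest match wins; on a tie the earlier rule wins; WS is skipped."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


for name, lexeme in tokenize("$IF >=15 $FROM AgeInYears $THEN"):
    print(f"{name:17} {lexeme!r}")
```

In this emulation the third token comes out as FIELD_IDENTIFIER with the text '15 ', which lines up with the '15 ' reported at line 1:6 in the error message.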
Here we see that "15 " (1, 5, space) has been matched by the FIELD_IDENTIFIER rule. Since that's three input characters long, ANTLR will prefer that lexer rule to the INT rule, which only matches two characters.
For this particular input, you can solve this by reworking the FIELD_IDENTIFIER rule.
That said, I suspect that attempting to allow spaces within your FIELD_IDENTIFIER (without some sort of start/stop markers) is likely to be a continuing source of pain as you work on this. (There's a reason why you don't see this in most languages, and it's not that nobody thought it would be handy to allow multi-word identifiers. It requires a greedy lexer rule that is likely to take precedence over other rules, as it did here.)
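One possible rework of FIELD_IDENTIFIER, offered here as an assumption rather than as the rule the answer had in mind, is to require a leading letter: FIELD_IDENTIFIER : LETTER (LETTER|DIGIT|' ')*;. The Python sketch below compares the two variants using the same longest-match emulation of the lexer (hand-translated regexes, not ANTLR; the names build_rules, lex, ORIGINAL_FIELD and REWORKED_FIELD are hypothetical).

```python
import re

ORIGINAL_FIELD = r"[a-zA-Z0-9][a-zA-Z0-9 ]*"  # (LETTER|DIGIT)(LETTER|DIGIT|' ')*
REWORKED_FIELD = r"[a-zA-Z][a-zA-Z0-9 ]*"     # assumed rework: must start with a letter


def build_rules(field_pattern):
    """Lexer rules in grammar order; only FIELD_IDENTIFIER varies."""
    integer = r"[0-9]+"
    decimal = r"[0-9]+\.[0-9]+"
    return [
        ("IF", r"\$IF"), ("FROM", r"\$FROM"), ("THEN", r"\$THEN"), ("OR", r"\$OR"),
        ("GT", r">"), ("GE", r">="), ("LT", r"<"), ("LE", r"<="), ("EQ", r"="),
        ("INT", integer),
        ("DECIMAL", decimal),
        ("DATE_TIME", rf"(?:{decimal}|{integer})[Myd]"),
        ("FIELD_IDENTIFIER", field_pattern),
        ("WS", r"[ \r\t\f\n]+"),
    ]


def lex(text, field_pattern):
    """Return token names, picking the longest match (earlier rule wins ties)."""
    compiled = [(n, re.compile(p)) for n, p in build_rules(field_pattern)]
    pos, names = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            names.append(best_name)
        pos += best_len
    return names


source = "$IF >=15 $FROM AgeInYears $THEN"
print(lex(source, ORIGINAL_FIELD))
print(lex(source, REWORKED_FIELD))
```

With the reworked pattern, 15 comes out as INT in this emulation, which is what the comparator rule expects. The field token still swallows the space after AgeInYears, and a field name that begins with a digit would no longer lex as a single identifier, so the caveat about spaces inside identifiers continues to apply.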