ANTLR4 grammar: mismatched input error

Posted on 2025-02-03 22:16:45

I have defined the following grammar:

grammar Test;

parse: expr EOF;

expr :  IF comparator FROM field THEN                                                                   #comparatorExpr
;

dateTime        :   DATE_TIME;
number          :   (INT|DECIMAL);
field           :   FIELD_IDENTIFIER;
op              :   (GT | GE | LT | LE | EQ);
comparator      :   op (number|dateTime);

fragment LETTER : [a-zA-Z];
fragment DIGIT  : [0-9];

IF                   : '$IF';
FROM                 : '$FROM';
THEN                 : '$THEN';
OR                   : '$OR';
GT                   : '>' ;
GE                   : '>=' ;
LT                   : '<' ;
LE                   : '<=' ;
EQ                   : '=' ;
INT                  : DIGIT+;
DECIMAL              : INT'.'INT;
DATE_TIME            : (INT|DECIMAL)('M'|'y'|'d');
FIELD_IDENTIFIER     : (LETTER|DIGIT)(LETTER|DIGIT|' ')*;
WS                   : [ \r\t\u000C\n]+ -> skip;

I then try to parse the following input:

$IF >=15 $FROM AgeInYears $THEN

It gives me the following error:

line 1:6 mismatched input '15 ' expecting {INT, DECIMAL, DATE_TIME}

All the SO posts I found point to the same cause for this error: identical lexer rules. But I cannot see why 15 could be matched as DECIMAL, which requires a . between two INTs, or as DATE_TIME, which requires an M|y|d suffix.

Any pointers would be appreciated here.

暮倦 2025-02-10 22:16:45

It's always a good idea to take a look at the token stream that your lexer produces:

 grun Test parse -tokens -tree Test.txt
[@0,0:2='$IF',<'$IF'>,1:0]
[@1,4:5='>=',<'>='>,1:4]
[@2,6:8='15 ',<FIELD_IDENTIFIER>,1:6]
[@3,9:13='$FROM',<'$FROM'>,1:9]
[@4,15:25='AgeInYears ',<FIELD_IDENTIFIER>,1:15]
[@5,26:30='$THEN',<'$THEN'>,1:26]
[@6,31:30='<EOF>',<EOF>,1:31]
line 1:6 mismatched input '15 ' expecting {INT, DECIMAL, DATE_TIME}
(parse (expr $IF (comparator (op >=) 15 ) $FROM (field AgeInYears ) $THEN) <EOF>)

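Here we see that "15 " (1, 5, space) has been matched by the FIELD_IDENTIFIER rule. Since that match is three input characters long, ANTLR prefers that lexer rule over the INT rule, which matches only two characters. The lexer always picks the rule that consumes the most input, and only falls back to definition order when two rules match the same length.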
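The longest-match behaviour can be imitated outside ANTLR to see where the token boundaries fall. The following is only a rough sketch: the lexer rules from the question are hand-translated into Python regular expressions (an approximation, not ANTLR's actual algorithm), and `tokenize` is a hypothetical helper name. At each position every rule is tried, the longest match wins, and a tie goes to the rule listed first.

```python
import re

# Lexer rules from the question, hand-translated to Python regexes in grammar order.
# This is an approximation for illustration only; it is not ANTLR itself.
RULES = [
    ("IF", r"\$IF"),
    ("FROM", r"\$FROM"),
    ("THEN", r"\$THEN"),
    ("OR", r"\$OR"),
    ("GT", r">"),
    ("GE", r">="),
    ("LT", r"<"),
    ("LE", r"<="),
    ("EQ", r"="),
    ("INT", r"[0-9]+"),
    ("DECIMAL", r"[0-9]+\.[0-9]+"),
    ("DATE_TIME", r"[0-9]+(?:\.[0-9]+)?[Myd]"),
    ("FIELD_IDENTIFIER", r"[a-zA-Z0-9][a-zA-Z0-9 ]*"),
    ("WS", r"[ \r\t\f\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Return (rule, text) pairs using longest match; ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps priority on equal length.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    for token in tokenize("$IF >=15 $FROM AgeInYears $THEN"):
        print(token)
```

In this simulation the third token comes out as FIELD_IDENTIFIER with the text "15 " (trailing space included), which is the same boundary the grun token dump reports.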

For this particular input, you can solve this by reworking the FIELD_IDENTIFIER rule to be:

FIELD_IDENTIFIER: (LETTER | DIGIT)+ (' '+ (LETTER | DIGIT)+)*;

grun Test parse -tokens -tree Test.txt
[@0,0:2='$IF',<'$IF'>,1:0]
[@1,4:5='>=',<'>='>,1:4]
[@2,6:7='15',<INT>,1:6]
[@3,9:13='$FROM',<'$FROM'>,1:9]
[@4,15:24='AgeInYears',<FIELD_IDENTIFIER>,1:15]
[@5,26:30='$THEN',<'$THEN'>,1:26]
[@6,31:30='<EOF>',<EOF>,1:31]
(parse (expr $IF (comparator (op >=) (number 15)) $FROM (field AgeInYears) $THEN) <EOF>)

That said, I suspect that attempting to allow spaces within your FIELD_IDENTIFIER (without some sort of start/stop markers) is likely to be a continuing source of pain as you work on this. There's a reason you don't see this in most languages, and it's not that nobody thought it would be handy to allow multi-word identifiers. It requires a greedy lexer rule that is likely to take precedence over other rules, as it did here.
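To make the start/stop marker idea concrete, here is a small sketch under stated assumptions. The square-bracket syntax for field names is hypothetical (it does not appear in the original grammar), the rules are again approximated with Python regular expressions rather than run through ANTLR, and `lex` is an illustrative helper name. Because a delimited identifier must begin with `[`, it can no longer compete with INT for input such as 15.

```python
import re

# Regex stand-ins for a subset of the lexer rules, in priority order.
# The bracket-delimited FIELD_IDENTIFIER is a hypothetical alternative,
# not part of the grammar in the question.
RULES = [
    ("KEYWORD", r"\$(?:IF|FROM|THEN|OR)"),
    ("OP", r">=|<=|>|<|="),
    ("INT", r"[0-9]+"),
    ("DECIMAL", r"[0-9]+\.[0-9]+"),
    ("DATE_TIME", r"[0-9]+(?:\.[0-9]+)?[Myd]"),
    ("FIELD_IDENTIFIER", r"\[[^\]\r\n]+\]"),
    ("WS", r"[ \r\t\f\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def lex(text):
    """Longest match wins; on equal length the earlier rule is kept."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # whitespace is skipped
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    for token in lex("$IF >=15 $FROM [Age In Years] $THEN"):
        print(token)
```

With the markers in place, a multi-word name such as [Age In Years] stays a single token while 15 is recognised as INT, so the identifier rule does not need to be greedy about spaces.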
