Why doesn't the ANTLR token rule "Ident : LETTER (LETTER | DIGIT)*;" recognize "x y z"?
Say I have this piece of ANTLR grammar (the lexer part):
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+ ;
Ident : LETTER (LETTER | DIGIT)*;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
COMMENT : '//' .* ('\n'|'\r') {$channel = HIDDEN;};
I thought that, since WS eats all the whitespace between tokens, "x y z" and "xyz" should both be recognized as the same single Ident token. But apparently "x y z" is treated as 3 Ident tokens. So I'm really confused about how a lexer rule behaves when it encounters whitespace.
More concretely, I have a rule
VARIABLE: ('A'..'Z')+ DIGIT* ;
I want it to recognize variable identifiers like X3, Y4, XX55, etc. But surprisingly, this rule matches " X Y". This seems totally incomprehensible to me. What do you think?
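For reference, the VARIABLE rule on its own describes the same strings as the regular expression [A-Z]+[0-9]*. The Python check below is a sketch under that assumption (the regex is a hand translation of the rule, not something ANTLR generates): a full-string match accepts X3, Y4 and XX55 and rejects " X Y", because neither ('A'..'Z')+ nor DIGIT* can consume a space.

```python
import re

# Hand translation (assumption) of:  VARIABLE : ('A'..'Z')+ DIGIT* ;
VARIABLE = re.compile(r"[A-Z]+[0-9]*")


def is_variable(text):
    """True only if the whole string is one VARIABLE token (nothing left over)."""
    return VARIABLE.fullmatch(text) is not None


for sample in ["X3", "Y4", "XX55", " X Y", "X Y"]:
    print(repr(sample), is_variable(sample))
# prints: 'X3' True / 'Y4' True / 'XX55' True / ' X Y' False / 'X Y' False
```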
Answers (2)
Ident : LETTER (LETTER | DIGIT)*;
means that an Ident is a letter followed by zero or more letters or digits. No whitespace! That's why "x y z" is recognized as 3 Ident tokens.
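To make this concrete, here is a small Python model of the lexer rules from the question. It is a simplified sketch, not ANTLR: each rule is hand-translated to a regex, the fragments LETTER and DIGIT are inlined, COMMENT is left out, and the lexer takes the longest match at each position (earlier rules win ties), which only approximates how the generated lexer picks a token. Because the Ident pattern cannot consume a space, "x y z" necessarily splits into three Ident tokens separated by WS tokens.

```python
import re

# Simplified model of the question's lexer rules (not ANTLR itself).
RULES = [
    ("INTEGER", re.compile(r"[0-9]+")),             # DIGIT+
    ("Ident", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),  # LETTER (LETTER | DIGIT)*
    ("WS", re.compile(r"[ \t\n\r\f]+")),             # whitespace run
]


def tokenize(text):
    """Return (type, text) pairs: longest match wins, first rule wins ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


print(tokenize("xyz"))    # [('Ident', 'xyz')]
print(tokenize("x y z"))  # [('Ident', 'x'), ('WS', ' '), ('Ident', 'y'), ('WS', ' '), ('Ident', 'z')]
```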
Although you've put WS on the HIDDEN channel, "x y z" is still three Ident tokens: WS tokens are only discarded in parser rules, not inside lexer rules. A lexer rule such as Ident matches characters one at a time, and whitespace is never skipped while a single token is being built.

No, the rule VARIABLE does not match " X Y" (including the spaces): you must be doing something wrong.
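The difference between lexing and channels can be sketched in a few lines of Python. This is a hypothetical model, not the ANTLR runtime: lex() tags every token with a channel (WS goes to "HIDDEN", mirroring {$channel = HIDDEN;}), and parser_view() stands in for the token stream the parser reads, which skips hidden tokens. Hiding WS removes it from the parser's view, but the token boundaries were already fixed by the lexer, so the parser still receives three separate Ident tokens.

```python
import re

# Hypothetical model: the lexer fixes token boundaries first;
# the channel only decides which tokens the parser later sees.
RULES = [
    ("Ident", re.compile(r"[a-zA-Z][a-zA-Z0-9]*"), "DEFAULT"),
    ("WS", re.compile(r"[ \t\n\r\f]+"), "HIDDEN"),  # like {$channel = HIDDEN;}
]


def lex(text):
    """Produce (type, text, channel) triples; WS tokens are kept, just hidden."""
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern, channel in RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group(), channel))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


def parser_view(tokens):
    """What the parser sees: only tokens on the default channel."""
    return [(name, value) for name, value, channel in tokens if channel == "DEFAULT"]


print(parser_view(lex("x y z")))  # [('Ident', 'x'), ('Ident', 'y'), ('Ident', 'z')]
```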