Unexpected ANTLR4 parser error
I found some strange behavior in ANTLR4 (I tried versions 4.10 and 4.10.1, with the same result).
Consider the following grammar:
grammar Paths;
cfg: NL? (entry (NL | EOF))* EOF;
entry: path ':' value;
path: SEGMENT ('.' SEGMENT)*;
value: USTRING;
SEGMENT: [a-zA-Z0-9]+;
USTRING: [a-zA-Z0-9]+;
NL: [\n\r]+;
WS: [ \t]+ -> skip;
When I use this grammar to parse the string "key1:value1\nkey2.sub:value2\nkey3.sub1.sub2:value3", I get the following error messages:
line 1:5 mismatched input 'value1' expecting {':', '.'}
line 2:9 mismatched input 'value2' expecting {':', '.'}
line 3:15 mismatched input 'value3' expecting {':', '.'}
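If I replace the definition of value with value: SEGMENT, everything works as expected.
What is wrong with the first definition?
In both cases the printed parse tree is the same: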
(cfg (entry (path key1) : (value value1)) \n (entry (path key2 . sub) : (value value2)) \n (entry (path key3 . sub1 . sub2) : (value value3)) <EOF> <EOF>)
I tried to simplify the grammar:
grammar Paths;
cfg: NL? (entry (NL | EOF))* EOF;
entry: path ':' value;
path: SEGMENT;
value: USTRING;
SEGMENT: [a-zA-Z0-9]+;
USTRING: [a-zA-Z0-9]+;
NL: [\n\r]+;
WS: [ \t]+ -> skip;
In this case I also get errors (the parsed string is "key1:value1\nkey2:value2\nkey3:value3"):
line 1:5 mismatched input 'value1' expecting USTRING
line 2:5 mismatched input 'value2' expecting USTRING
line 3:5 mismatched input 'value3' expecting USTRING
If I replace USTRING with SEGMENT in the value definition, everything is fine.
The output is:
(cfg (entry (path key1) : (value value1)) \n (entry (path key2) : (value value2)) \n (entry (path key3) : (value value3)) <EOF> <EOF>)
Comments (2):
That's because ANTLR's lexer processes the input stream of characters to produce a token stream, and the parser then processes that token stream. ANTLR's recursive-descent parsing works on the token stream and has no influence on how the lexer views the input.
The lexer rules SEGMENT and USTRING are identical, so both rules match exactly the same run of input characters. When that happens, ANTLR picks the rule that is defined first, so every one of those runs becomes a SEGMENT token. If you've gone through the standard setup (and created the grun alias), you can run it with the -tokens option to get a dump of your token stream. This is generally a good idea for verifying that your lexer rules produce the token stream you expect.
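A minimal sketch of the usual fix, assuming keys and values really do share one character set: define a single lexer rule and let the parser decide, by position, whether a token is a path segment or a value. The rule name WORD is hypothetical and replaces both SEGMENT and USTRING.

```antlr
grammar Paths;

cfg   : NL? (entry (NL | EOF))* EOF ;
entry : path ':' value ;
path  : WORD ('.' WORD)* ;   // segments are WORD tokens
value : WORD ;               // values are WORD tokens too; position decides

// Hypothetical single rule replacing the identical SEGMENT and USTRING,
// so the lexer no longer has to choose between two equal-length matches.
WORD : [a-zA-Z0-9]+ ;
NL   : [\n\r]+ ;
WS   : [ \t]+ -> skip ;
```

After regenerating, grun Paths cfg -tokens on the same input should report every key, segment and value as a WORD token. If values later need characters that segments do not allow, it is generally easier to widen the value parser rule than to add a second, overlapping lexer rule.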
It happens because ANTLR assigns the type SEGMENT to the values. The lexer ignores the parser rules, and when a run of characters could be assigned to more than one token type, the lexer assigns it a single type: that of the rule defined first, here SEGMENT.
This piece of code helped me:
Probably I need to learn more about ANTLR lexer modes.
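The code referred to above is not included, so the following is only a hypothetical illustration of the lexer-modes approach, not the commenter's snippet: a ':' switches the lexer into a separate mode where the same characters are emitted as USTRING, and a newline switches back. Lexical modes are only allowed in lexer grammars, so the combined grammar is split into a lexer grammar and a parser grammar. The names PathsLexer, PathsParser, COLON, DOT, VALUE_MODE, VALUE_WS and VALUE_NL are all assumptions.

```antlr
// File PathsLexer.g4 (hypothetical name)
lexer grammar PathsLexer;

SEGMENT : [a-zA-Z0-9]+ ;
DOT     : '.' ;
COLON   : ':' -> pushMode(VALUE_MODE) ;  // after ':' a value follows
NL      : [\n\r]+ ;
WS      : [ \t]+ -> skip ;

mode VALUE_MODE;
USTRING  : [a-zA-Z0-9]+ ;                 // same characters, different type
VALUE_WS : [ \t]+ -> skip ;
VALUE_NL : [\n\r]+ -> type(NL), popMode ; // end of line: back to keys

// File PathsParser.g4 (hypothetical name)
parser grammar PathsParser;
options { tokenVocab = PathsLexer; }

cfg   : NL? (entry (NL | EOF))* EOF ;
entry : path COLON value ;
path  : SEGMENT (DOT SEGMENT)* ;
value : USTRING ;
```

Naming the two grammars PathsLexer and PathsParser keeps the grun Paths cfg command working, since grun looks for classes named after the grammar with Lexer and Parser appended. The trade-off is that the lexer now tracks context itself; for input this simple, a single shared token type is usually easier to maintain.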