Antlr (lexer): matching the right tokens
In my Antlr3 grammar, I have several "overlapping" lexer rules, like this:
NAT: ('0' .. '9')+ ;
INT: ('+' | '-')? ('0' .. '9')+ ;
BITVECTOR: ('0' | '1')* ;
Although tokens like 100110 and 123 can be matched by more than one of these rules, the context always determines which one is meant. Example:
s: a | b | c ;
a: '<' NAT '>' ;
b: '{' INT '}' ;
c: '[' BITVECTOR ']' ;
The input {17} should then be tokenized as {, INT, and }, but the lexer has already decided that 17 is a NAT token. How can I prevent this behavior? The backtrack option is already set to true, but it only seems to affect parser rules.
Answer:
There might be a complex way to make the lexer context-sensitive, but in general that is what you want the parser to take care of; the lexer should just provide a stream of tokens. My recommendation is to refactor your lexer to return DIGITS and SIGN tokens, and let your parser work out from the context what kind of number the digits represent.
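A minimal sketch of that refactoring in ANTLR 3 syntax follows. The grammar name Numbers is hypothetical. The bit-vector check uses a validating semantic predicate, which is only one option (a parser action would also work). The predicate body assumes the Java target, where `$d.text` is a `java.lang.String`.

```antlr
// Hypothetical grammar name; ANTLR 3 expects it to match the file name (Numbers.g).
grammar Numbers;

s : a | b | c ;

// Formerly NAT: unsigned digits only.
a : '<' DIGITS '>' ;

// Formerly INT: an optional sign token followed by the digits.
b : '{' SIGN? DIGITS '}' ;

// Formerly BITVECTOR (which could be empty): the validating predicate
// (Java target assumed) rejects digit runs containing anything but 0 and 1.
c : '[' (d=DIGITS {$d.text.matches("[01]+")}?)? ']' ;

// The lexer no longer has to guess: every run of digits is one DIGITS token.
SIGN   : '+' | '-' ;
DIGITS : ('0'..'9')+ ;
```

With this grammar, {17} is lexed as '{' DIGITS '}' and accepted by rule b. [17] is lexed the same way but fails the predicate in rule c. One side effect of splitting off the sign: if you later add a whitespace rule that is skipped or sent to the hidden channel, input such as { - 17 } is accepted too, because the parser only sees SIGN DIGITS. If that matters, you would need an extra check, for example comparing the two tokens' positions.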