ANTLR not matching unicode-escaped characters
I'm writing a parser/interpreter for a C-like language and I need to interpret escaped characters. One of them is the Unicode escape sequence of the form "\uXXXX", where each X is a hexadecimal digit.
My ANTLR rules look like this:
public char returns [char c]
    : '\\"' { $c = '"'; }
    | '\\\\' { $c = '\\'; }
    | '\\/' { $c = '/'; }
    | '\\b' { $c = '\b'; }
    | '\\f' { $c = '\f'; }
    | '\\n' { $c = '\n'; }
    | '\\r' { $c = '\r'; }
    | '\\t' { $c = '\t'; }
    | '\\u' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT { $c = 'e'; }
    | ~('\\' | '"') { $c = '/'; }
    ;

fragment HEXDIGIT
    : ('0'..'9'|'a'..'f'|'A'..'F')
    ;
I'm feeding it the string "\u1234", for which I expect an 'e', but I'm getting a '/' instead, which comes from the fallback alternative that matches everything else.
Is there some magic juju going on with fragments and rules or something that I'm not aware of?
As mentioned by Adam, char is a parser rule at the moment, but it should be made a lexer rule instead, in which case you can't let it return a char (lexer rules always return an instance of a Token!).
You can adjust the inner text of a token using its setText(...) method like this (assuming Java is the target language):
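A minimal sketch of the conversion that such a lexer action could hand to setText(...), written as plain Java so it can be checked outside ANTLR. The class name EscapeDecoder and its decode method are hypothetical and are not taken from the original answer; the sketch assumes the lexer has already matched exactly one escape sequence (or one ordinary character) as the token text.

```java
// Hypothetical helper (not from the original answer): turns the raw text of one
// matched escape token into the character it stands for.
final class EscapeDecoder {

    private EscapeDecoder() {}

    // Decodes a single escape sequence; text that does not start with a
    // backslash is returned unchanged.
    static String decode(String raw) {
        if (raw.length() < 2 || raw.charAt(0) != '\\') {
            return raw; // ordinary character, nothing to decode
        }
        switch (raw.charAt(1)) {
            case '"':  return "\"";
            case '\\': return "\\";
            case '/':  return "/";
            case 'b':  return "\b";
            case 'f':  return "\f";
            case 'n':  return "\n";
            case 'r':  return "\r";
            case 't':  return "\t";
            case 'u':
                // Backslash-u escape: expect exactly four hex digits after the 'u'.
                // Assumption: the lexer rule has already verified they are hex digits.
                if (raw.length() != 6) {
                    throw new IllegalArgumentException("Malformed unicode escape: " + raw);
                }
                int codeUnit = Integer.parseInt(raw.substring(2), 16);
                return String.valueOf((char) codeUnit);
            default:
                throw new IllegalArgumentException("Unknown escape: " + raw);
        }
    }
}
```

Inside a lexer rule (a rule whose name starts with an uppercase letter), the action could then look something like { setText(EscapeDecoder.decode(getText())); }, assuming the ANTLR 3 Java target, where getText() and setText(String) are available on the generated lexer. Parser rules that consume the token would then see the decoded character as the token's text.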