Parsing a string with ANTLR4
Example: (CHGA/B234A/B231
String:
a) Designator: 3 LETTERS
b) Message number (OPTIONAL): 1 to 4 LETTERS, followed by A SLASH (/) followed by 1 to 4 LETTERS, followed by 3 NUMBERS indicating the serial number.
c) Reference data (OPTIONAL): 1 to 4 LETTERS, followed by A SLASH (/) followed by 1 to 4 LETTERS, followed by 3 NUMBERS indicating the serial number.
Result:
CHG
A/B234
A/B231
In grammar file:
/*
* Parser Rules
*/
tipo3: designador idmensaje? idmensaje?;
designador: PARENTHESIS CHG;
idmensaje: LETTER4 SLASH LETTER4 DIGIT3;
/*
* Lexer Rules
*/
CHG : 'CHG' ;
fragment DIGIT : [0-9] ;
fragment LETTER : [a-zA-Z] ;
SLASH : '/' ;
PARENTHESIS : '(' ;
DIGIT3 : DIGIT DIGIT DIGIT ;
LETTER4 : LETTER LETTER? LETTER? LETTER? ;
But when testing the tipo3 rule, it gives me the following message:
line 1:1 missing 'CHG' at 'CHGA'
How can I parse that string in ANTLR4?
When you're confused about why a certain parser rule is not being matched, always start with the lexer. Dump the tokens your lexer produces to stdout and compare them with what the parser rule expects.
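The token stream for the question's input can be reproduced without the ANTLR runtime. What follows is a minimal sketch, assuming a hand-written longest-match loop over java.util.regex patterns that mirror the question's lexer rules. The class LongestMatchDemo and its tokenize method are hypothetical stand-ins for illustration, not the generated lexer and not part of any ANTLR API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    // Token rules mirroring the question's lexer, kept in the same order.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("CHG", Pattern.compile("CHG"));
        RULES.put("SLASH", Pattern.compile("/"));
        RULES.put("PARENTHESIS", Pattern.compile("\\("));
        RULES.put("DIGIT3", Pattern.compile("[0-9]{3}"));
        RULES.put("LETTER4", Pattern.compile("[a-zA-Z]{1,4}"));
    }

    // Splits the input the way a longest-match lexer would:
    // at each position the rule that consumes the most characters wins,
    // and on a tie the rule defined first wins.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() only matches at the start of the region.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestName + " '" + input.substring(pos, bestEnd) + "'");
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("(CHGA/B234A/B231")) {
            System.out.println(token);
        }
    }
}
```

For the input (CHGA/B234A/B231, the second token in this sketch is LETTER4 'CHGA', and no CHG token appears anywhere in the list.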
In your grammar, CHGA becomes a single LETTER4 token, not a CHG token followed by a LETTER4 token. Try changing LETTER4 into LETTER4 : LETTER; and re-test. Now you'll get the expected result.
In your current grammar, CHGA will always become a single LETTER4. This is just how ANTLR works: the lexer tries to consume as many characters as possible for a single rule, so the four-character LETTER4 match beats the three-character CHG match. You cannot change this. What you could do is move the construction of the multi-letter rule to the parser instead of the lexer, for example by letting the lexer emit single-letter and single-digit tokens and combining them in parser rules.
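The idea of grouping letters in the parser rather than in the lexer can be sketched in plain Java. This is a hand-written recursive-descent stand-in, not ANTLR-generated code. The class Tipo3Sketch, its method names, and the grammar rules quoted in the comments (letter4, digit3, and a designador made of three LETTER tokens, following the "3 LETTERS" description in the question) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class Tipo3Sketch {
    private final String in;
    private int pos = 0;

    private Tipo3Sketch(String in) { this.in = in; }

    // tipo3 : designador idmensaje? idmensaje? ;
    static List<String> parse(String input) {
        Tipo3Sketch p = new Tipo3Sketch(input);
        List<String> parts = new ArrayList<>();
        parts.add(p.designador());
        for (int i = 0; i < 2 && p.pos < input.length(); i++) {
            parts.add(p.idmensaje());
        }
        if (p.pos != input.length()) {
            throw new IllegalArgumentException("unexpected input at index " + p.pos);
        }
        return parts;
    }

    // designador : PARENTHESIS LETTER LETTER LETTER ;
    private String designador() {
        expect('(');
        int start = pos;
        for (int i = 0; i < 3; i++) letter();
        return in.substring(start, pos);
    }

    // idmensaje : letter4 SLASH letter4 digit3 ;
    private String idmensaje() {
        int start = pos;
        letter4();
        expect('/');
        letter4();
        for (int i = 0; i < 3; i++) digit(); // digit3 : DIGIT DIGIT DIGIT ;
        return in.substring(start, pos);
    }

    // letter4 : LETTER LETTER? LETTER? LETTER? ;
    private void letter4() {
        letter();
        for (int i = 0; i < 3 && isLetter(); i++) pos++;
    }

    // LETTER : [a-zA-Z] ; (one character per token)
    private boolean isLetter() {
        if (pos >= in.length()) return false;
        char c = in.charAt(pos);
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private void letter() {
        if (!isLetter()) throw new IllegalArgumentException("expected a letter at index " + pos);
        pos++;
    }

    // DIGIT : [0-9] ; (one character per token)
    private void digit() {
        if (pos >= in.length() || in.charAt(pos) < '0' || in.charAt(pos) > '9') {
            throw new IllegalArgumentException("expected a digit at index " + pos);
        }
        pos++;
    }

    private void expect(char c) {
        if (pos >= in.length() || in.charAt(pos) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at index " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(parse("(CHGA/B234A/B231")); // prints [CHG, A/B234, A/B231]
    }
}
```

Because the designador consumes exactly three letters by position, the fourth letter is left over for the first idmensaje, which matches the CHG, A/B234, A/B231 split the question asks for.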