Parsing a | delimited file with ANTLR
So I think this should be easy, but I'm having a tough time with it. I'm trying to parse a | delimited file, and any line that doesn't start with a | is a comment. I guess I don't understand how comments work. It always errors out on a comment line. This is a legacy file, so there's no changing it. Here's my grammar.
grammar Route;
@header {
package org.benheath.codegeneration;
}
@lexer::header {
package org.benheath.codegeneration;
}
file: line+;
line: route+ '\n';
route: ('|' elt) {System.out.println("element: [" + $elt.text + "]");} ;
elt: (ELEMENT)*;
COMMENT: ~'|' .* '\n' ;
ELEMENT: ('a'..'z'|'A'..'Z'|'0'..'9'|'*'|'_'|'@'|'#') ;
WS: (' '|'\t') {$channel=HIDDEN;} ; // ignore whitespace
Data:
! a comment
Another comment
| a | abc | b | def | ...
Comments (3)
A grammar for that would look like this:
And to test it, you just need to sprinkle a bit of code in your grammar like this:
Now generate a lexer/parser by invoking:
create a class RouteTest.java:
Compile all source files:
and run the class RouteTest:
Edit: note that I simplified it a bit by only allowing lower case letters, you can always expand the set of course.
It's a nice idea to use ANTLR for a job like this, although I do think it's overkill. For example, it would be very easy to (in pseudo-code):
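As a rough illustration of that line-by-line approach, here is a minimal Java sketch. It is not the answerer's code: the class name RouteLines and the method parse are hypothetical, and it assumes that a line is a route line only if its very first character is |, that surrounding whitespace is not significant, and that empty cells can be dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads the | delimited format without a parser generator.
public class RouteLines {

    /** Returns the elements of every route line; comment lines are skipped. */
    public static List<List<String>> parse(String text) {
        List<List<String>> routes = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            // Per the question, a line that does not start with '|' is a comment.
            if (!line.startsWith("|")) {
                continue;
            }
            List<String> elements = new ArrayList<>();
            // Drop the leading '|' and split the remainder on '|'.
            for (String cell : line.substring(1).split("\\|")) {
                String element = cell.trim();
                if (!element.isEmpty()) { // assumption: empty cells carry no data
                    elements.add(element);
                }
            }
            routes.add(elements);
        }
        return routes;
    }

    public static void main(String[] args) {
        String data = "! a comment\nAnother comment\n| a | abc | b | def |\n";
        for (List<String> route : parse(data)) {
            for (String element : route) {
                System.out.println("element: [" + element + "]");
            }
        }
    }
}
```

Running main on the sample data prints one element: [...] line for each of a, abc, b and def, and nothing for the two comment lines.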
Edit: Well, you can't express the distinction between comments and lines lexically, because there is nothing lexical that distinguishes them. A hint to get you in one workable direction.
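To make that point concrete: whether a character such as a begins a comment or an element depends on where it sits on the line, not on the character itself. The sketch below is only an illustration (it is not the hint from this answer, and it is not ANTLR). It is a hand-written Java tokenizer that carries an "at start of line" flag, which is exactly the context a pattern-only rule like the question's COMMENT does not have. The class name, the token names, and the assumption of \n line endings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer that uses line position to tell comments from route lines.
public class LineAwareTokenizer {

    /** Tokenizes the input; each token is rendered as TYPE or TYPE(text). */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean atLineStart = true; // context that the characters alone do not provide
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\n') {
                tokens.add("NEWLINE");
                atLineStart = true;
                i++;
            } else if (atLineStart && c != '|') {
                // Comment: consume everything up to, but not including, the newline.
                int end = input.indexOf('\n', i);
                if (end < 0) {
                    end = input.length();
                }
                tokens.add("COMMENT(" + input.substring(i, end) + ")");
                atLineStart = false;
                i = end;
            } else if (c == '|') {
                tokens.add("PIPE");
                atLineStart = false;
                i++;
            } else if (c == ' ' || c == '\t' || c == '\r') {
                i++; // whitespace inside a route line is skipped
            } else {
                // Element: a run of characters up to the next delimiter or whitespace.
                int end = i;
                while (end < input.length() && "| \t\r\n".indexOf(input.charAt(end)) < 0) {
                    end++;
                }
                tokens.add("ELEMENT(" + input.substring(i, end) + ")");
                i = end;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("! a comment\n| a | abc |\n"));
    }
}
```

The same letters produce a COMMENT token on the first line and ELEMENT tokens on the second, purely because of the flag.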
This seems to work, though I swear I tried it before. Changing comment to lower case turned it from a lexer rule into a parser rule; I still don't get why that matters.