Getting started with ANTLR
A few days ago I posted this question on the ANTLR mailing list, but didn't receive any support. So I'm hoping you guys here can help me out:
I am currently trying to dig into ANTLR, as I find this tool very helpful. The last time I used it, I generated something based on a finished grammar.
This time I wanted to build my own grammar and really start understanding what's happening.
For this I decided to build a parser for some wiki-notation-like text.
Here is an example (without the Start and End rows):
------------ Start ---------------
before
More before
And yet even more ...
[Lineup]
[Floor:Main Floor]
Test1
Test2
[Floor:Classics Floor]
Test3
Test4
Test5
Test6
[/Lineup]
after
more After
..
And even more.
------------ End ---------------
If the text contains a "Lineup" block, then this block should be parsed. Its content is at least one "Floor", each followed by a number of names, and then either a new "Floor" or the closing "Lineup".
I managed to get my parser to parse the text if I change both my grammar and the text I am trying to parse to "[Floor:]" (one block), but I really need a name in there :(
As soon as I change my grammar to support the floor name, nothing works anymore.
Could you please help me with this? I'm not looking for someone who fixes it for me without a comment. I would really like to know why my grammar doesn't work.
I'm really stuck, and I have been working on this for days now (OK ... I admit, it's just my spare time after work ... but at least all of that).
Here comes my grammar. If I try to parse the full text, I always get EarlyExitExceptions while parsing :(
grammar CalendarEventsJava;
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
event : (
(LINE_CONTENT | NEWLINE)*
(lineup (LINE_CONTENT | NEWLINE)*)?
);
lineup : (LINEUP_OPEN NEWLINE floor+ LINEUP_CLOSE);
floor : (FLOOR_OPEN LINE_CONTENT FLOOR_CLOSE NEWLINE lineupEntry+);
lineupEntry
: (LINE_CONTENT? NEWLINE);
artist : LINE_CONTENT;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
LINEUP_OPEN
: '[Lineup]';
LINEUP_CLOSE
: '[/Lineup]';
FLOOR_OPEN
: '[Floor:';
FLOOR_CLOSE
: ']';
BLANKS : ( ' ' | '\t' )+;
NONBREAKING
: ~('\r' | '\n' | ']');
NEWLINE : '\r'? '\n';
// the content of a line consists of at least one non-breaking character.
LINE_CONTENT
: (NONBREAKING | ']')+ ;
I really hope you can help me, as I'm anxious to get started with ANTLR, because I think it really rocks :)
Chris
The problem
If you examine the token stream after tokenizing your source, you'll see that the following tokens are fed to the parser:
As you can see, no FLOOR_OPEN token is ever created; LINE_CONTENT tokens are produced instead.
Here's how you can manually debug your token stream:
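A minimal, self-contained sketch of the effect, written in plain Java with java.util.regex rather than the ANTLR runtime. It mirrors the lexer rules from the question as regular expressions and assumes that the longest match wins, with ties going to the rule declared first. That is a simplification: ANTLR's real lexer decides through lookahead prediction, so details may differ. The class name `TokenDumpSketch` and the method `tokenize` are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough stand-in for the lexer rules in the question; NOT the ANTLR runtime.
// Assumption: the longest match wins, ties go to the rule declared first.
final class TokenDumpSketch {

    // Rule name -> regex, in the declaration order used by the question's grammar.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LINEUP_OPEN", Pattern.compile("\\[Lineup\\]"));
        RULES.put("LINEUP_CLOSE", Pattern.compile("\\[/Lineup\\]"));
        RULES.put("FLOOR_OPEN", Pattern.compile("\\[Floor:"));
        RULES.put("FLOOR_CLOSE", Pattern.compile("\\]"));
        RULES.put("BLANKS", Pattern.compile("[ \\t]+"));
        RULES.put("NONBREAKING", Pattern.compile("[^\\r\\n\\]]"));
        RULES.put("NEWLINE", Pattern.compile("\\r?\\n"));
        // (NONBREAKING | ']')+ is "one or more characters that are not a line break"
        RULES.put("LINE_CONTENT", Pattern.compile("[^\\r\\n]+"));
    }

    // Returns the names of the tokens produced for the input, in order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer only, so an earlier rule keeps a tie.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            tokens.add(bestRule);
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "[Lineup]\n[Floor:Main Floor]\nTest1\n[/Lineup]\n";
        for (String token : tokenize(source)) {
            System.out.println(token);
        }
    }
}
```

Under these assumptions the whole line `[Floor:Main Floor]` comes out as one LINE_CONTENT token, because that rule can match 18 characters where FLOOR_OPEN matches only 7.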
The solution
Changing:
to
(FLOOR_CLOSE can then be removed) and changing:
to:
will result in the following parse tree:
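As a cross-check on whatever tree the grammar produces, the structure expected from the sample (floors in document order, each with its entries) can be sketched in plain Java without ANTLR. This is a hand-written stand-in, not generated parser code; the class name `LineupSketch` and the method `parse` are illustrative and do not appear in the original grammar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hand-written stand-in for the parser, used only to show the target structure.
final class LineupSketch {

    // Returns floor name -> entries in document order; empty if there is no [Lineup] block.
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> floors = new LinkedHashMap<>();
        boolean inLineup = false;
        List<String> current = null;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.equals("[Lineup]")) {
                inLineup = true;
            } else if (line.equals("[/Lineup]")) {
                break; // everything after the block is ignored
            } else if (inLineup && line.startsWith("[Floor:") && line.endsWith("]")) {
                // The floor name is whatever sits between "[Floor:" and the closing "]".
                String name = line.substring("[Floor:".length(), line.length() - 1);
                current = new ArrayList<>();
                floors.put(name, current);
            } else if (inLineup && current != null && !line.isEmpty()) {
                current.add(line);
            }
        }
        return floors;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "before", "[Lineup]",
                "[Floor:Main Floor]", "Test1", "Test2",
                "[Floor:Classics Floor]", "Test3",
                "[/Lineup]", "after");
        System.out.println(parse(sample));
        // prints {Main Floor=[Test1, Test2], Classics Floor=[Test3]}
    }
}
```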
Comments
Note that the lexer rules NONBREAKING and LINE_CONTENT are very similar; you probably don't want NONBREAKING to ever appear in the token stream. It would be better to make NONBREAKING a fragment rule. Fragment rules are only used by other lexer rules and will therefore never be used to create a "real" token:
fragment NONBREAKING : ~('\r' | '\n' | ']');
It looks like LINE_CONTENT is consuming the floor close. It will consume all characters up to the end of the line. Try excluding the floor close character from it.
Kate.
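The point about the floor close can be checked in isolation with a small Java sketch. Two regular expressions stand in for the lexer rule: one equivalent to LINE_CONTENT as written in the question, and one that leaves out `]`. The sketch assumes `[Floor:` has already been matched and only looks at the rest of the line; it uses java.util.regex, not ANTLR, and the names `FloorCloseSketch` and `consumed` are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex stand-ins for the lexer rule; not ANTLR itself.
final class FloorCloseSketch {

    // LINE_CONTENT as written in the question: (NONBREAKING | ']')+
    static final Pattern WITH_BRACKET = Pattern.compile("[^\\r\\n]+");

    // Variant that excludes the floor close character: NONBREAKING+
    static final Pattern WITHOUT_BRACKET = Pattern.compile("[^\\r\\n\\]]+");

    // Returns the text the pattern consumes from the start of the input ("" if none).
    static String consumed(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        String rest = "Main Floor]\n"; // what follows "[Floor:" in the sample
        System.out.println(consumed(WITH_BRACKET, rest));    // Main Floor]
        System.out.println(consumed(WITHOUT_BRACKET, rest)); // Main Floor
    }
}
```

With the first pattern the closing bracket is swallowed along with the name, so nothing is left over for FLOOR_CLOSE; with the second, the match stops just before `]`.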