Getting started with ANTLR

Posted on 2024-10-19 02:32:39

A few days ago I posted this question on the ANTLR mailing list but didn't receive any support, so I'm hoping you guys here can help me out:

I am currently trying to dig into ANTLR, as I find this tool very helpful. The last time I used it, I generated something based on a finished grammar.
This time I want to build my own grammar and really start understanding what's happening.

For this I decided to build a parser for some wiki-notation-like text.

Here is an example (without the Start and End rows):

------------ Start ---------------
before
More before

And yet even more ...
[Lineup]
[Floor:Main Floor]
Test1
Test2
[Floor:Classics Floor]
Test3
Test4
Test5
Test6
[/Lineup]
after
more After
..

And even more.
------------ End ---------------

If the text contains a "Lineup" block, it should be parsed. Its content is at least one "Floor", each followed by a number of names and then either a new "Floor" or the closing "Lineup" tag.
I got my parser to parse the text when I changed both my grammar and the input to use "[Floor:]" (one block, without a name), but I really need a name in there :(

As soon as I change my grammar to support the floor name, nothing works anymore.
Could you please help me with this? I'm not looking for someone who fixes it for me without comment; I would really like to know why my grammar doesn't work.
I'm really stuck, and I've been working on this for days now (OK ... I admit, it's just my spare time after work ... but at least all of that).

Here comes my grammar. If I try to parse the full text, I always get EarlyExitExceptions while parsing :(

grammar CalendarEventsJava;

/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/

event    : (
                               (LINE_CONTENT | NEWLINE)*
                               (lineup (LINE_CONTENT | NEWLINE)*)?
               );

lineup   : (LINEUP_OPEN NEWLINE floor+ LINEUP_CLOSE);

floor      : (FLOOR_OPEN LINE_CONTENT FLOOR_CLOSE NEWLINE lineupEntry+);

lineupEntry
                : (LINE_CONTENT? NEWLINE);

artist     : LINE_CONTENT;


/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/




LINEUP_OPEN
                :              '[Lineup]';
LINEUP_CLOSE
                :              '[/Lineup]';
FLOOR_OPEN
                :              '[Floor:';
FLOOR_CLOSE
                :              ']';

BLANKS               :              ( ' ' | '\t' )+;
NONBREAKING
                :              ~('\r' | '\n' | ']');
NEWLINE            :              '\r'? '\n';


// the content of a line consists of at least one non-breaking character.
LINE_CONTENT
                :              (NONBREAKING | ']')+ ;

I really hope you can help me, as I'm eager to get started with ANTLR, because I think it really rocks :)

Chris

Comments (2)

英雄似剑 2024-10-26 02:32:39

The problem

If you examine the token stream after tokenizing your source, you'll see that the following tokens are fed to the parser:

LINEUP_OPEN  :: [Lineup]
NEWLINE      :: \n
LINE_CONTENT :: [Floor:Main Floor]
NEWLINE      :: \n
LINE_CONTENT :: Test1
NEWLINE      :: \n
LINE_CONTENT :: Test2
NEWLINE      :: \n
LINE_CONTENT :: [Floor:Classics Floor]
NEWLINE      :: \n
LINE_CONTENT :: Test3
NEWLINE      :: \n
LINE_CONTENT :: Test4
NEWLINE      :: \n
LINE_CONTENT :: Test5
NEWLINE      :: \n
LINE_CONTENT :: Test6
NEWLINE      :: \n
LINEUP_CLOSE :: [/Lineup]

As you can see, a FLOOR_OPEN token is never created; the floor lines come out as LINE_CONTENT tokens instead.
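One way to see why this happens is a simplified longest-match model of a lexer: at each position every rule is tried, the rule that consumes the most characters wins, and ties go to the rule defined first. The sketch below is not ANTLR's actual algorithm (ANTLR 3 predicts the rule from lookahead), and the class `LongestMatchDemo` and its regular expressions are hand-written approximations of the lexer rules in the question, but it reproduces the shape of the token stream above:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchDemo {
    // Regex approximations of the question's lexer rules, in definition order (an assumption).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LINEUP_OPEN", Pattern.compile("\\[Lineup\\]"));
        RULES.put("LINEUP_CLOSE", Pattern.compile("\\[/Lineup\\]"));
        RULES.put("FLOOR_OPEN", Pattern.compile("\\[Floor:"));
        RULES.put("FLOOR_CLOSE", Pattern.compile("\\]"));
        RULES.put("BLANKS", Pattern.compile("[ \\t]+"));
        RULES.put("NONBREAKING", Pattern.compile("[^\\r\\n\\]]"));
        RULES.put("NEWLINE", Pattern.compile("\\r?\\n"));
        // (NONBREAKING | ']')+ is equivalent to "anything except a line break".
        RULES.put("LINE_CONTENT", Pattern.compile("[^\\r\\n]+"));
    }

    // Longest match wins; on a tie the rule defined first wins.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            out.add(bestName + " :: " + input.substring(pos, bestEnd).replace("\n", "\\n"));
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokenize("[Lineup]\n[Floor:Main Floor]\nTest1\n[/Lineup]")) {
            System.out.println(token);
        }
    }
}
```

In this model `[Floor:` is a valid FLOOR_OPEN match, but LINE_CONTENT can keep going to the end of the line, so the whole line becomes one LINE_CONTENT token. `[Lineup]` survives only because LINEUP_OPEN and LINE_CONTENT match the same length and LINEUP_OPEN is defined first.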

Here's how you can manually debug your token stream:

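import org.antlr.runtime.*;

// TokenDump is just a wrapper class name for this snippet (ANTLR 3 runtime API).
public class TokenDump {
    public static void main(String[] args) {
        String source =
                "[Lineup]\n" +
                "[Floor:Main Floor]\n" +
                "Test1\n" +
                "Test2\n" +
                "[Floor:Classics Floor]\n" +
                "Test3\n" +
                "Test4\n" +
                "Test5\n" +
                "Test6\n" +
                "[/Lineup]";
        // Feed the source through the generated lexer and buffer the tokens.
        ANTLRStringStream in = new ANTLRStringStream(source);
        CalendarEventsJavaLexer lexer = new CalendarEventsJavaLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // The parser is only needed here for its table of token names.
        CalendarEventsJavaParser parser = new CalendarEventsJavaParser(tokens);
        // Print each token's type name and text, with newlines escaped.
        for (Object o : tokens.getTokens()) {
            CommonToken t = (CommonToken) o;
            System.out.println(parser.tokenNames[t.getType()] + " :: " + t.getText().replace("\n", "\\n"));
        }
    }
}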

The solution

Changing:

FLOOR_OPEN
                :              '[Floor:';

to

FLOOR_OPEN   : '[Floor:' ~']'* ']';

(FLOOR_CLOSE can then be removed)

and changing:

NONBREAKING
            :              ~('\r' | '\n');

to:

NONBREAKING  : ~('\r' | '\n' | '[' | ']');

will result in the following parse tree:

(parse tree image not included)
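With the revised FLOOR_OPEN rule, the floor name is part of the token text, so it has to be cut out of the token afterwards. A minimal sketch of that step, assuming the token text always has the shape `[Floor:<name>]`; the class `FloorNames` and the method `floorName` are hypothetical helpers, not part of ANTLR:

```java
public class FloorNames {
    private static final String PREFIX = "[Floor:";
    private static final String SUFFIX = "]";

    // Returns the text between "[Floor:" and the closing "]", trimmed of surrounding blanks.
    static String floorName(String tokenText) {
        if (!tokenText.startsWith(PREFIX) || !tokenText.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("not a FLOOR_OPEN token: " + tokenText);
        }
        return tokenText.substring(PREFIX.length(), tokenText.length() - SUFFIX.length()).trim();
    }

    public static void main(String[] args) {
        System.out.println(floorName("[Floor:Main Floor]"));     // prints "Main Floor"
        System.out.println(floorName("[Floor:Classics Floor]")); // prints "Classics Floor"
    }
}
```

In a generated parser the token text would usually be read inside a rule action; that wiring is left out here.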

Comments

Note that the lexer rules NONBREAKING and LINE_CONTENT are very similar; you probably don't want NONBREAKING to ever appear in the token stream. It would be better to make NONBREAKING a fragment rule. Fragment rules are only used by other lexer rules and will therefore never be used to create a "real" token:

fragment NONBREAKING  : ~('\r' | '\n' | '[' | ']');

LINE_CONTENT : NONBREAKING+;
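The revised rules can be checked without the ANTLR toolchain by approximating them with regular expressions and a small longest-match loop. This is a simplified model rather than ANTLR's real lexer, and `RevisedRulesDemo` is a hypothetical class. The fragment is modelled as a plain string that LINE_CONTENT reuses but that is never registered as a token rule of its own:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RevisedRulesDemo {
    // Plays the role of the fragment: reused by LINE_CONTENT, never emitted as a token.
    static final String NONBREAKING = "[^\\r\\n\\[\\]]";

    // Regex approximations of the revised lexer rules (an assumption, not generated by ANTLR).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LINEUP_OPEN", Pattern.compile("\\[Lineup\\]"));
        RULES.put("LINEUP_CLOSE", Pattern.compile("\\[/Lineup\\]"));
        RULES.put("FLOOR_OPEN", Pattern.compile("\\[Floor:[^\\]]*\\]"));
        RULES.put("BLANKS", Pattern.compile("[ \\t]+"));
        RULES.put("NEWLINE", Pattern.compile("\\r?\\n"));
        RULES.put("LINE_CONTENT", Pattern.compile(NONBREAKING + "+"));
    }

    // Longest match wins; on a tie the rule defined first wins.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            out.add(bestName + " :: " + input.substring(pos, bestEnd).replace("\n", "\\n"));
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokenize("[Lineup]\n[Floor:Main Floor]\nTest1\n[/Lineup]")) {
            System.out.println(token);
        }
    }
}
```

Because LINE_CONTENT can no longer start with `[`, the floor line is matched by FLOOR_OPEN as a whole, and no NONBREAKING token can appear in the output.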

兮颜 2024-10-26 02:32:39

It looks like

NONBREAKING
                :              ~('\r' | '\n');

is consuming the floor close. It will consume all characters up to the end of the line. Try excluding the floor close character from it.
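The effect is easy to reproduce outside ANTLR with plain regular expressions. In this sketch the two patterns are hand-written stand-ins for a repeated NONBREAKING with and without the closing bracket excluded, and `GreedyClassDemo` is a hypothetical class name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyClassDemo {
    // Returns the prefix of input that the pattern consumes, or "" if it does not match at the start.
    static String consumed(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        String rest = "Main Floor]\nTest1";
        // Bracket not excluded: the run swallows the closing ']'.
        System.out.println(consumed("[^\\r\\n]+", rest));    // prints "Main Floor]"
        // Bracket excluded: the run stops right before ']'.
        System.out.println(consumed("[^\\r\\n\\]]+", rest)); // prints "Main Floor"
    }
}
```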

Kate.
