Getting started with ANTLR

Posted on 2024-10-19 02:32:39

A few days ago I posted this question on the ANTLR mailing list but didn't receive any replies, so I'm hoping you can help me out.

I am currently trying to dig into ANTLR because I find the tool very helpful. The last time I used it, I generated something based upon a finished grammar.
This time I want to build my own grammar and really understand what is happening.

To do that, I decided to build a parser for some wiki-notation-like text.

Here is an example (without the Start and End rows):

------------ Start ---------------
before
More before

And yet even more ...
[Lineup]
[Floor:Main Floor]
Test1
Test2
[Floor:Classics Floor]
Test3
Test4
Test5
Test6
[/Lineup]
after
more After
..

And even more.
------------ End ---------------

If the text contains a "Lineup" block, that block should be parsed. Its content is at least one "Floor", each followed by a number of names and then either a new "Floor" or the closing "Lineup" tag.
I got my parser to work when I change both the grammar and the input text to use "[Floor:]" (one block), but I really need a name in there :(

As soon as I change my grammar to support the floor name, nothing works anymore.
Could you please help me with this? I'm not looking for someone to fix it for me without comment; I would really like to know why my grammar doesn't work.
I'm really stuck, and I have been working on this for days now (OK, I admit it's just my spare time after work, but at least all of that).

Here is my grammar. If I try to parse the full text, I always get EarlyExitExceptions while parsing :(

grammar CalendarEventsJava;

/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/

event       : ( (LINE_CONTENT | NEWLINE)*
                (lineup (LINE_CONTENT | NEWLINE)*)?
              );

lineup      : (LINEUP_OPEN NEWLINE floor+ LINEUP_CLOSE);

floor       : (FLOOR_OPEN LINE_CONTENT FLOOR_CLOSE NEWLINE lineupEntry+);

lineupEntry : (LINE_CONTENT? NEWLINE);

artist      : LINE_CONTENT;

/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/

LINEUP_OPEN  : '[Lineup]';
LINEUP_CLOSE : '[/Lineup]';
FLOOR_OPEN   : '[Floor:';
FLOOR_CLOSE  : ']';

BLANKS       : ( ' ' | '\t' )+;
NONBREAKING  : ~('\r' | '\n' | ']');
NEWLINE      : '\r'? '\n';

// the content of a line consists of at least one non-breaking character.
LINE_CONTENT : (NONBREAKING | ']')+ ;

I really hope you can help me; I'm eager to get started with ANTLR because I think it really rocks :)

Chris


英雄似剑 2024-10-26 02:32:39

The problem

If you examine the token stream after tokenizing your source, you'll see that the following tokens are fed to the parser:

LINEUP_OPEN  :: [Lineup]
NEWLINE      :: \n
LINE_CONTENT :: [Floor:Main Floor]
NEWLINE      :: \n
LINE_CONTENT :: Test1
NEWLINE      :: \n
LINE_CONTENT :: Test2
NEWLINE      :: \n
LINE_CONTENT :: [Floor:Classics Floor]
NEWLINE      :: \n
LINE_CONTENT :: Test3
NEWLINE      :: \n
LINE_CONTENT :: Test4
NEWLINE      :: \n
LINE_CONTENT :: Test5
NEWLINE      :: \n
LINE_CONTENT :: Test6
NEWLINE      :: \n
LINEUP_CLOSE :: [/Lineup]

As you can see, no FLOOR_OPEN token is ever created; the lexer emits LINE_CONTENT tokens for the `[Floor:...]` lines instead.

Here's how you can manually debug your token stream:

// Requires the ANTLR 3 runtime (import org.antlr.runtime.*;) and the
// lexer/parser classes generated from the CalendarEventsJava grammar.
String source = 
        "[Lineup]\n" +
        "[Floor:Main Floor]\n" +
        "Test1\n" +
        "Test2\n" +
        "[Floor:Classics Floor]\n" +
        "Test3\n" +
        "Test4\n" +
        "Test5\n" +
        "Test6\n" +
        "[/Lineup]";
ANTLRStringStream in = new ANTLRStringStream(source);
CalendarEventsJavaLexer lexer = new CalendarEventsJavaLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
CalendarEventsJavaParser parser = new CalendarEventsJavaParser(tokens);
// Print every token as "TYPE :: text", with line breaks made visible.
for(Object o : tokens.getTokens()) {
    CommonToken t = (CommonToken)o;
    System.out.println(parser.tokenNames[t.getType()] + " :: " + t.getText().replace("\n", "\\n"));
}

The solution

Changing:

FLOOR_OPEN   : '[Floor:';

to:

FLOOR_OPEN   : '[Floor:' ~']'* ']';

(FLOOR_CLOSE can then be removed)

and changing:

NONBREAKING
            :              ~('\r' | '\n');

to:

NONBREAKING  : ~('\r' | '\n' | '[' | ']');

will result in the following parse tree:

[image: parse tree]
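
To see the effect of the two rule changes without regenerating the lexer, here is a minimal plain-Java sketch that imitates the revised rules with `java.util.regex`. It is only an approximation: the class name `LexerSketch`, the rule order and the first-match strategy are my assumptions, the BLANKS rule is left out, and ANTLR's real lexer decides between rules differently. It should still show that a `[Floor:Main Floor]` line now comes out as a single FLOOR_OPEN token.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerSketch {
    // Regex stand-ins for the revised lexer rules (an assumption, not ANTLR output).
    // They are tried in insertion order at each position.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LINEUP_OPEN", Pattern.compile("\\[Lineup\\]"));
        RULES.put("LINEUP_CLOSE", Pattern.compile("\\[/Lineup\\]"));
        RULES.put("FLOOR_OPEN", Pattern.compile("\\[Floor:[^\\]]*\\]"));
        RULES.put("NEWLINE", Pattern.compile("\\r?\\n"));
        RULES.put("LINE_CONTENT", Pattern.compile("[^\\r\\n\\[\\]]+"));
    }

    /** Splits the source into "TYPE :: text" strings, in the style of the token dump. */
    static List<String> tokenize(String source) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            boolean matched = false;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(source);
                m.region(pos, source.length());
                if (m.lookingAt()) { // the rule must match exactly at pos
                    out.add(rule.getKey() + " :: " + m.group().replace("\n", "\\n"));
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String source = "[Lineup]\n[Floor:Main Floor]\nTest1\n[/Lineup]";
        for (String token : tokenize(source)) {
            System.out.println(token);
        }
    }
}
```

Because LINE_CONTENT can no longer start with `[`, the bracketed lines are left for the dedicated rules to pick up.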

Comments

Note that the lexer rules NONBREAKING and LINE_CONTENT are very similar, and you probably don't want NONBREAKING ever to appear in the token stream. It would be better to make NONBREAKING a fragment rule. Fragment rules are only used by other lexer rules and will therefore never be used to create a "real" token:

fragment NONBREAKING  : ~('\r' | '\n' | '[' | ']');

LINE_CONTENT : NONBREAKING+;
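
One consequence of folding the name into FLOOR_OPEN is that the token text is now the whole `[Floor:Main Floor]` string, so the parser's floor rule would presumably shrink to something like `FLOOR_OPEN NEWLINE lineupEntry+`, and the name has to be cut out of the token text afterwards. A minimal sketch of that step, assuming a hypothetical helper class `FloorNames` (it is not part of ANTLR or of the grammar above):

```java
class FloorNames {
    private static final String PREFIX = "[Floor:";
    private static final String SUFFIX = "]";

    /** Returns the floor name contained in the text of a FLOOR_OPEN token. */
    static String extract(String tokenText) {
        if (!tokenText.startsWith(PREFIX) || !tokenText.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("not a FLOOR_OPEN token: " + tokenText);
        }
        // Drop the "[Floor:" prefix and the closing "]", then trim stray blanks.
        return tokenText
                .substring(PREFIX.length(), tokenText.length() - SUFFIX.length())
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(extract("[Floor:Main Floor]"));     // Main Floor
        System.out.println(extract("[Floor:Classics Floor]")); // Classics Floor
    }
}
```

In a real parser you would call something like this on the token's `getText()` value, for example from a rule action or while walking the tree.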


兮颜 2024-10-26 02:32:39

It looks like

NONBREAKING
                :              ~('\r' | '\n');

is consuming the floor close. It will consume all characters up to the end of the line. Try excluding the floor close character from it.

Kate.
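
A rough way to convince yourself of this outside ANTLR is to try the same character classes with `java.util.regex`. This is only an analogy (the class `GreedyLineDemo` and its `consumed` helper are made up for illustration, and regex matching is not how ANTLR predicts tokens), but it shows how a class that only excludes line breaks swallows the whole floor line, closing bracket included:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLineDemo {
    /**
     * Number of characters the pattern consumes from the start of the line,
     * or 0 if it cannot match at the first character at all.
     */
    static int consumed(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        String line = "[Floor:Main Floor]";

        // Everything except line breaks: the whole line is taken, ']' included.
        System.out.println(consumed("[^\\r\\n]+", line));          // 18

        // Brackets excluded as well: the class cannot even start at '['.
        System.out.println(consumed("[^\\r\\n\\[\\]]+", line));    // 0

        // Ordinary content lines are unaffected by the exclusion.
        System.out.println(consumed("[^\\r\\n\\[\\]]+", "Test1")); // 5
    }
}
```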
