ANTLR - basic grammar including unexpected characters?

Posted on 2024-08-20 11:57:09 · 793 characters · 3 views · 0 comments

I've got a really simple ANTLR grammar that I'm trying to get working, but failing miserably at the moment. Would really appreciate some pointers on this...

    root    :   (keyword|ignore)*;
    keyword :    KEYWORD;
    ignore  :    IGNORE;

    KEYWORD : ABBRV|WORD;

    fragment WORD : ALPHA+;
    fragment ALPHA : 'a'..'z'|'A'..'Z';
    fragment ABBRV : WORD?('.'WORD);

    IGNORE  : .{ Skip(); };

With the following test input:

"some ASP.NET and .NET stuff. that work."

I'm wanting a tree that is just a list of keyword nodes,

"some", "ASP.NET", "and", ".NET", "stuff", "that", "work"

At the moment I get

"some", "ASP.NET", "and", ".NET", "stuff. that",

(for some reason "." appears within the last keyword, and it misses "work").

If I change the ABBRV clause to

    fragment ABBRV : ('.'WORD);

then that works fine, but I get keyword (asp) and keyword (.net) separately, and I need them as a single token.

Any help you can give would be much appreciated.

Comments (2)

从此见与不见 2024-08-27 11:57:09

There are a couple of things. First, your ignore parser rule will never be triggered, because the IGNORE lexer rule skips its token, so it does not even have to appear in this grammar (leave it out of the root rule as well). Of course, while you are debugging, keeping the ignore rule makes testing much easier (drop the Skip(); in the IGNORE lexer rule to see those tokens).

Now to explain the test data: since none of the lexer rules match just WORD '.', the end of your test data is being ignored because of the period right after the text. If you place a space between 'work' and the period, the last word will appear and the period will not, which is what you want. The lexer does not know what to do with 'work.' when the input ends. If you add another word at the end (with a space between the period and the new word), then 'work.' is passed from the lexer rules as one IGNORE token. I would have expected the word to be passed as a KEYWORD and only the period to end up in the IGNORE token.
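To make the expected result concrete, here is a minimal Python sketch of the token stream the question asks for. It is not ANTLR and says nothing about how the ANTLR lexer decides between rules; the `KEYWORD` pattern and the `keywords` helper are hypothetical names used only for illustration. The regular expression tries an abbreviation first and falls back to a plain word when no letters follow the period, which is the behaviour the original grammar was hoping for.

```python
import re

# Hypothetical reference tokenizer (an assumption, not ANTLR):
# try an abbreviation such as "ASP.NET" or ".NET" first, then a plain word.
KEYWORD = re.compile(r"[A-Za-z]*\.[A-Za-z]+|[A-Za-z]+")


def keywords(text):
    """Return the keyword tokens; every other character is skipped."""
    return KEYWORD.findall(text)


if __name__ == "__main__":
    print(keywords("some ASP.NET and .NET stuff. that work."))
```

The regex engine backtracks out of the abbreviation branch when the period is followed by a space or by the end of input, so 'stuff' and 'work' come out without the trailing period. A list like this can serve as the expected value when testing a revised grammar.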

野の 2024-08-27 11:57:09

I decided to try to solve your problem with an ANTLR 3 grammar. This is what I came up with, with some caveats:

  • Your spec does not contain many rules, and as a result, my grammar is not very thorough.
  • Consider adding to KEYW to match more tokens.
  • I don't have a C#-compatible ANTLR right now. Capitalize any 'skip()' call to make it compatible.

    grammar TestSplitter;
    
    start: (KEYW DELIM!?)* ;
    KEYW: ('a'..'z'|'A'..'Z'|'.')+ ;
    DELIM: '.'? ' '+ ;
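
A rough way to preview what these two lexer rules do with the test input is to approximate them with regular expressions. This is only a sketch under that assumption, not ANTLR itself; `TOKEN`, `preview`, and `strip_trailing_period` are hypothetical names. In this approximation KEYW is greedy and also accepts '.', so a sentence-ending period stays attached to the word in front of it, and a small post-processing step may be needed to remove it.

```python
import re

# Regex stand-ins for the two lexer rules above (an approximation, not ANTLR):
# KEYW is one or more letters or '.', DELIM is an optional '.' plus spaces.
TOKEN = re.compile(r"(?P<KEYW>[A-Za-z.]+)|(?P<DELIM>\.? +)")


def preview(text):
    """Return the KEYW lexemes in order; DELIM matches are dropped."""
    return [m.group("KEYW") for m in TOKEN.finditer(text) if m.lastgroup == "KEYW"]


def strip_trailing_period(words):
    """Hypothetical post-processing step: remove a sentence-ending period."""
    return [w.rstrip(".") for w in words if w.rstrip(".")]


if __name__ == "__main__":
    raw = preview("some ASP.NET and .NET stuff. that work.")
    print(raw)
    print(strip_trailing_period(raw))
```

Leading periods such as the one in '.NET' are kept, because only trailing periods are stripped.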
    