Unable to interpret ANTLRWorks output

Posted on 2025-01-05 04:23:04

I am using the following simple grammar to get an understanding of ANTLR.

grammar Example;
options {
language=Java;
}

ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT : '0'..'9'+
    ;
PLUS    :   '+';


ADDNUM  :   
    INT PLUS INT;

prog    :    ADDNUM;

When I try running the grammar in ANTLRWorks for the input 1+2, I get the following error in the console:

[16:54:08] Interpreting... [16:54:08] problem matching token at 2:0
NoViableAltException(' '@[1:1: Tokens : ( ID | INT | PLUS | ADDNUM);])

Can anyone please help me understand where I am going wrong?

悲凉≈ 2025-01-12 04:23:04

You probably didn't select prog as the starting rule in ANTLRWorks. If you do, the input is interpreted without problems.

But you really shouldn't create a lexer rule that matches a whole expression, as you do in ADDNUM. That should be a parser rule:

grammar Example;

prog    : addExpr EOF;
addExpr : INT PLUS INT;
ID      : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
INT     : '0'..'9'+;
PLUS    : '+';

ANTLR rules

There are no strict rules about when to use parser, lexer, or fragment rules, but here's what they're usually used for:

lexer rules

A lexer rule usually matches the smallest part of a language (a string, a number, an identifier, a comment, etc.). Trying to create a lexer rule from input like 1+2 causes problems because:

if you ever want to extract something meaningful from that token (evaluate it, for example), you need to split its contents again, because once a single token has been created, the text of the entire expression is "glued" together;
you run into problems when there is whitespace between the parts: 1 + 2.

The expression 1+2 is three tokens: INT, PLUS and another INT.
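
As a rough illustration of the two problems above, here is a minimal hand-written sketch in Python. It is not ANTLR and not how ANTLR works internally; the token table, the WS pattern and the `tokenize` helper are assumptions made up for this example. It compares splitting the input into INT and PLUS tokens with matching one "glued" ADDNUM-style pattern.

```python
import re

# Hypothetical token table mirroring the INT and PLUS lexer rules.
# WS is an assumed extra rule for whitespace; it is not in the original grammar.
TOKEN_SPEC = [
    ("INT", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("WS", r"[ \t\r\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (type, text) pairs, dropping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"no viable token at offset {pos}: {text[pos]!r}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# A single "glued" pattern, analogous to the ADDNUM lexer rule.
ADDNUM = re.compile(r"[0-9]+\+[0-9]+")

print(tokenize("1+2"))            # three separate tokens
print(tokenize("1 + 2"))          # same tokens, whitespace skipped
print(ADDNUM.fullmatch("1 + 2"))  # None: the glued pattern cannot cope with spaces
```

With separate tokens, each operand is directly available for later processing, and whitespace can be handled in one place instead of inside every expression-shaped pattern.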

fragment rules

A fragment rule is used when you don't want the rule to ever become a "real" token. For example, take the following lexer rules:

ID    : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
FLOAT : '0'..'9'+ '.' '0'..'9'+; 
INT   : '0'..'9'+;

In the rules above, you're using '0'..'9' four times, so you could place that in a separate rule:

ID    : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | DIGIT)*;
FLOAT : DIGIT+ '.' DIGIT+; 
INT   : DIGIT+;
DIGIT : '0'..'9';

But you never want the lexer to create a DIGIT token: you only want DIGIT to be used by other lexer rules. In that case, you can create a fragment rule:

ID    : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | DIGIT)*;
FLOAT : DIGIT+ '.' DIGIT+; 
INT   : DIGIT+;
fragment DIGIT : '0'..'9';

This makes sure that a DIGIT token is never created, and therefore DIGIT can never be used in your parser rule(s)!
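
The same idea can be sketched outside ANTLR. In the hand-written Python below (an analogy only; `DIGIT`, `TOKEN_SPEC` and `token_type` are names invented for this example), `DIGIT` is a reusable piece of pattern text that other token patterns embed, but it never appears in the table of token types, so no DIGIT token can ever be produced.

```python
import re

# DIGIT plays the role of a fragment: a reusable building block, not a token type.
DIGIT = r"[0-9]"

# Only the names listed here can ever be reported as token types.
TOKEN_SPEC = [
    ("ID", rf"[a-zA-Z_](?:[a-zA-Z_]|{DIGIT})*"),
    ("FLOAT", rf"{DIGIT}+\.{DIGIT}+"),
    ("INT", rf"{DIGIT}+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def token_type(text):
    """Return the token type that matches the whole text, or None if nothing does."""
    match = MASTER.fullmatch(text)
    return match.lastgroup if match else None


print(token_type("x1"))    # an identifier
print(token_type("3.14"))  # a float
print(token_type("7"))     # a single digit is still an INT, never a DIGIT
```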

parser rules

Parser rules glue the tokens together: they make sure the input is syntactically valid (a.k.a. parsing). To emphasize, parser rules can use other parser rules or lexer rules, but not fragment rules.
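
To make the division of labour concrete, here is a small hand-written Python sketch of what the parser rules prog and addExpr from the grammar above do with a token stream. It is only an analogy, not code generated by ANTLR; the whitespace handling, the helper names `tokenize` and `expect`, and the evaluation of the sum are assumptions added for illustration.

```python
import re

# Token patterns for INT and PLUS; WS is an assumed rule for skipping whitespace.
TOKEN_RE = re.compile(r"(?P<INT>[0-9]+)|(?P<PLUS>\+)|(?P<WS>\s+)")


def tokenize(text):
    """Lexer step: turn text into (type, text) pairs and append an EOF marker."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    tokens.append(("EOF", ""))
    return tokens


def expect(tokens, index, token_type):
    """Return the text of the token at index, or fail if it has the wrong type."""
    kind, text = tokens[index]
    if kind != token_type:
        raise SyntaxError(f"expected {token_type}, found {kind}")
    return text


def add_expr(tokens, index):
    """Parser step for: addExpr : INT PLUS INT. Returns (value, next index)."""
    left = int(expect(tokens, index, "INT"))
    expect(tokens, index + 1, "PLUS")
    right = int(expect(tokens, index + 2, "INT"))
    return left + right, index + 3


def prog(text):
    """Parser step for: prog : addExpr EOF."""
    tokens = tokenize(text)
    value, index = add_expr(tokens, 0)
    expect(tokens, index, "EOF")
    return value


print(prog("1+2"))
print(prog("1 + 2"))
```

Because the parser works on INT and PLUS tokens rather than on one glued string, the operands are directly available for evaluation, and input such as `1+` or `1+2+3` is rejected with a syntax error.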

Also see: ANTLR: Is there a simple example?
