ANTLR error: the following token definition can never be matched because prior tokens match the same input
I am writing a simple language with ANTLR. I defined a lexer grammar in ANTLRWorks, but when I try to generate the Java code, it gives me this error:
ANTLR error: the following token definition can never be matched because prior tokens match the same input:
FLOAT_OR_INT, OPEN_PAR, CLOSE_PAR, ... (for almost all the rules!)
I am new to ANTLR. I assume the cause is the order in which the rules are defined, but I don't know what the order should be. What is my mistake?
Here is the grammar:
lexer grammar OurCompiler;
options
{
k=5;
}
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
protected
INT : ('0'..'9')+
;
protected
FLOAT : INT '.' INT
;
FLOAT_OR_INT : ( INT '.' ) => FLOAT { $setType(FLOAT); }
| INT { $setType(INT); }
;
OPENPAR_OR_OUTPUT_OPERATOR : '(' { $setType(OPEN_PAR); } | '(' '(' { $setType(OUTPUT_OPERATOR); }
;
CLOSEPAR_OR_INPUT_OPERATOR : ')' { $setType(CLOSE_PAR); } | ')' ')' { $setType(INPUT_OPERATOR); }
;
protected
OPEN_PAR : '(' ;
protected
CLOSE_PAR : ')' ;
protected
INPUT_OPERATOR : ')' ')' ;
protected
OUTPUT_OPERATOR : '(' '(' ;
BOOLEAN : 't' 'r' 'u' 'e' | 'f' 'a' 'l' 's' 'e' ;
LOWER : '<' ;
LOWER_EQUAL : LOWER '=' ;
UPPER : '>' ;
UPPER_EQUAL : UPPER '=' ;
ASSIGN : '=' ;
EQUAL : '=' '=' ;
NOT : '!' ;
NOT_EQUAL : NOT '=' ;
ADD : '+' ;
ADD_TO_PREVIOUS : ADD '=' ;
INCREMENT : ADD ADD ;
MINUS : '-' ;
MINUS_FROM_PREVIOUS : MINUS '=' ;
DECREMENT : MINUS MINUS ;
MULTIPLY : '*' ;
MULTIPLY_TO_PREVIOUS : MULTIPLY '=' ;
DIVIDE : '/' ;
DIVIDE_FROM_PREVIOUS : DIVIDE '=' ;
MODE : '%' ;
OPEN_BRAKET : '[' ;
CLOSE_BRAKET : ']' ;
OPEN_BRACE : '{' ;
CLOSE_BRACE : '}' ;
COLON : ':' ;
SEMICOLON : ';' ;
COMMA : ',' ;
SINGLE_LINE_COMMENT :
'#' '#' ( ~ ('\n'|'\r') )* ( '\n' | '\r' ('\n')? )? { $setType(Token.SKIP); newline(); }
;
MULTIPLE_LINE_COMMENT : '#' ( options {greedy=false;} : . )* '#' { $setType(Token.SKIP); }
;
WS :
( ' '
| '\t'
| '\r' { newline(); }
| '\n' { newline(); }
)
{ $setType(Token.SKIP); }
;
protected
ESC_SEQ : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
;
STRING :
'"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
CHAR :
'\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
;
INT_KEYWORD : 'i' 'n' 't' ;
FLOAT_KEYWORD : 'f' 'l' 'o' 'a' 't' ;
CHAR_KEYWORD : 'c' 'h' 'a' 'r' ;
STRING_KEYWORD : 's' 't' 'r' 'i' 'n' 'g' ;
BOOLEAN_KEYWORD : 'b' 'o' 'o' 'l' 'e' 'a' 'n' ;
INPUT_KEYWORD : 'i' 'n' ID { $setType(ID); }
| 'i' 'n'
;
OUTPUT_KEYWORD : 'o' 'u' 't' ID { $setType(ID); }
| 'o' 'u' 't' ;
IF_KEYWORD : 'i' 'f' ;
FOR_KEYWORD : 'f' 'o' 'r' ;
SWITCH_KEYWORD : 's' 'w' 'i' 't' 'c' 'h' ;
CASE_KEYWORD : 'c' 'a' 's' 'e' ;
BREAK_KEYWORD : 'b' 'r' 'e' 'a' 'k' ;
DEFAULT_KEYWORD : 'd' 'e' 'f' 'a' 'u' 'l' 't' ;
WHILE_KEYWORD : 'w' 'h' 'i' 'l' 'e' ;
ELSE_KEYWORD : 'e' 'l' 's' 'e' ;
ELSEIF_KEYWORD : 'e' 'l' 's' 'e' 'i' 'f' ;
AND_KEYWORD : 'a' 'n' 'd' ;
OR_KEYWORD : 'o' 'r' ;
NOT_KEYWORD : 'n' 'o' 't' ;
CONSTANT_KEYWORD : 'c' 'o' 'n' 's' 't' 'a' 'n' 't' ;
I have 7 remarks about your grammar after glancing over it:

1. `k=?` denotes the look-ahead for parser rules and since yours is a lexer grammar, remove it.

2. Although not wrong, `BOOLEAN_KEYWORD : 'b' 'o' 'o' 'l' 'e' 'a' 'n';` is rather verbose. Do `BOOLEAN_KEYWORD : 'boolean';` instead.

3. The keyword `protected` has changed in ANTLR 3 to `fragment`. But you're doing odd things. Take your `INT`, `FLOAT` and `FLOAT_OR_INT` rules: you create two fragments, and then have `FLOAT_OR_INT` check through a predicate if it "sees" an `INT` followed by a `'.'` and then change it into a `FLOAT`. Defining `FLOAT` and `INT` directly as two ordinary lexer rules, with no predicate and no type change, does the same and is far more readable/better/preferred.

4. `.*` is ungreedy by default, so change `'#' ( options {greedy=false;} : . )* '#'` into `'#' .* '#'` or, even better, match anything except `'#'` between the two delimiters.

5. The rule:

   should simply be:

6. ANTLR's lexer tries to match as many characters as possible. Whenever two rules match the same number of characters, the rule defined first will "win". That is why you should define all your `*_KEYWORD` rules before the `ID` rule.

7. Lastly, you don't need to check if `"in"` or `"out"` is followed by an `ID` (and then change the type of the token). Whenever the lexer "sees" input like `"inside"`, it will always create a single `ID` token, and not an `INPUT_KEYWORD` followed by an `ID`, since the lexer matches as much as possible (see remark #6).

It appears you're trying to learn ANTLR by trial and error, or are using outdated documentation. This is not the way to learn ANTLR. Try to get hold of Parr's The Definitive ANTLR Reference to learn it properly.
Good luck!
EDIT
Well, in case you don't manage to get it working, here's a working version of your grammar:
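The listing below is a minimal sketch of what the lexer could look like once the seven remarks are applied. It is not a verified copy of the answerer's grammar. It assumes ANTLR 3 with the Java target. All token names come from the question, while the `DIGIT` fragment, the `{skip();}` actions used in place of `$setType(Token.SKIP)`, and the removal of the ANTLR 2 `newline()` calls are assumptions made for this illustration.

```antlr
lexer grammar OurCompiler;

// Remark 6: keywords are defined before ID so they win when both match the same text.
INT_KEYWORD      : 'int' ;
FLOAT_KEYWORD    : 'float' ;
CHAR_KEYWORD     : 'char' ;
STRING_KEYWORD   : 'string' ;
BOOLEAN_KEYWORD  : 'boolean' ;
INPUT_KEYWORD    : 'in' ;   // remark 7: no check for a trailing ID is needed
OUTPUT_KEYWORD   : 'out' ;
IF_KEYWORD       : 'if' ;
FOR_KEYWORD      : 'for' ;
SWITCH_KEYWORD   : 'switch' ;
CASE_KEYWORD     : 'case' ;
BREAK_KEYWORD    : 'break' ;
DEFAULT_KEYWORD  : 'default' ;
WHILE_KEYWORD    : 'while' ;
ELSE_KEYWORD     : 'else' ;
ELSEIF_KEYWORD   : 'elseif' ;
AND_KEYWORD      : 'and' ;
OR_KEYWORD       : 'or' ;
NOT_KEYWORD      : 'not' ;
CONSTANT_KEYWORD : 'constant' ;
BOOLEAN          : 'true' | 'false' ;

ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;

// Remark 3: two plain rules and one fragment, no predicate and no type change.
FLOAT : DIGIT+ '.' DIGIT+ ;
INT   : DIGIT+ ;
fragment DIGIT : '0'..'9' ;

// Operators and punctuation, written as plain literals.
OUTPUT_OPERATOR      : '((' ;
INPUT_OPERATOR       : '))' ;
OPEN_PAR             : '(' ;
CLOSE_PAR            : ')' ;
LOWER_EQUAL          : '<=' ;
LOWER                : '<' ;
UPPER_EQUAL          : '>=' ;
UPPER                : '>' ;
EQUAL                : '==' ;
ASSIGN               : '=' ;
NOT_EQUAL            : '!=' ;
NOT                  : '!' ;
INCREMENT            : '++' ;
ADD_TO_PREVIOUS      : '+=' ;
ADD                  : '+' ;
DECREMENT            : '--' ;
MINUS_FROM_PREVIOUS  : '-=' ;
MINUS                : '-' ;
MULTIPLY_TO_PREVIOUS : '*=' ;
MULTIPLY             : '*' ;
DIVIDE_FROM_PREVIOUS : '/=' ;
DIVIDE               : '/' ;
MODE                 : '%' ;
OPEN_BRAKET          : '[' ;
CLOSE_BRAKET         : ']' ;
OPEN_BRACE           : '{' ;
CLOSE_BRACE          : '}' ;
COLON                : ':' ;
SEMICOLON            : ';' ;
COMMA                : ',' ;

// Remarks 4 and 5: comments without the greedy option or the manual line handling.
SINGLE_LINE_COMMENT   : '##' ~('\r' | '\n')* {skip();} ;
MULTIPLE_LINE_COMMENT : '#' (~'#')+ '#' {skip();} ;
WS                    : (' ' | '\t' | '\r' | '\n') {skip();} ;

STRING : '"' ( ESC_SEQ | ~('\\' | '"') )* '"' ;
CHAR   : '\'' ( ESC_SEQ | ~('\'' | '\\') ) '\'' ;
fragment ESC_SEQ : '\\' ('b' | 't' | 'n' | 'f' | 'r' | '\"' | '\'' | '\\') ;
```

In this sketch the `k=5` option is gone (remark 1), every `protected` rule is either a `fragment` or an ordinary token rule, and none of the `*_OR_*` helper rules remain, because each token has a rule of its own. Input such as `inside` or `output` should come out as a single `ID`, as described in remark 7.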