ANTLR: R5RS lexical structure grammar, problem with numbers
I'm implementing an IDE for Scheme in Eclipse using DLTK. So far, I am writing the grammar that recognizes the lexical structure.
I'm following the official EBNF, which can be viewed here:
http://rose-r5rs.googlecode.com/hg/doc/r5rs-grammar.html
I can't get a simple form of the number grammar to work. For example, for the decimal numbers I have:
```antlr
grammar r5rsnumbers;
options {
  language = Java;
}
program:
  NUMBER;
// NUMBERS
NUMBER : /*NUM_2 | NUM_8 |*/ NUM_10; //| NUM_16;
fragment NUM_10 : PREFIX_10 COMPLEX_10;
fragment COMPLEX_10
  : REAL_10 (
      '@' REAL_10
    | '+' (
        UREAL_10 'i'
      | 'i'
      )?
    | '-' (
        UREAL_10 'i'
      | 'i'
      )?
    )?
  | '+' (
      UREAL_10 'i'
    | 'i'
    )?
  | '-' (
      UREAL_10 'i'
    | 'i'
    )?;
fragment REAL_10 : SIGN UREAL_10;
fragment UREAL_10
  : UINTEGER_10 ('/' UINTEGER_10)?
  | DECIMAL_10;
fragment UINTEGER_10 : DIGIT_10+ '#'*;
fragment DECIMAL_10
  : UINTEGER_10 SUFFIX
  | '.' DIGIT_10+ '#'* SUFFIX
  | DIGIT_10+ '.' DIGIT_10* '#'* SUFFIX
  | DIGIT_10+ '#'+ '.' '#'* SUFFIX;
fragment PREFIX_10
  : RADIX_10 EXACTNESS
  | EXACTNESS RADIX_10;
fragment DIGIT : '0'..'9';
fragment EMPTY : '""'; // empty is the empty string
fragment SUFFIX : EMPTY | EXPONENT_MARKER SIGN DIGIT_10+;
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l';
fragment SIGN : EMPTY | '+' | '-';
fragment EXACTNESS : EMPTY | '#i' | '#e';
fragment RADIX_10 : EMPTY | '#d';
fragment DIGIT_10 : DIGIT;
```
The problem is that it doesn't recognize anything. I don't understand the warning I get for PREFIX_10, or how to solve it. If I don't use `fragment` in the rules, the file doesn't compile, because ANTLR complains that the DIGIT_10 rule matches the same input as almost all the other preceding rules.
It's the same with NUM_2, NUM_8 and NUM_16.
Plus, I'm not sure about my solution for the empty string.
How do I get around this?
Answer:
Note that your ANTLR rule `fragment EMPTY : '""';` does not match an empty string, but two double quotes.
But you don't want a lexer rule to match only an empty string either: that would cause it to go into an infinite loop, since there are an infinite number of empty strings in any string/source.
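So BNF rules with an `<empty>` alternative, such as `<sign> --> <empty> | + | -`, should not be translated into ANTLR rules like `fragment SIGN : EMPTY | '+' | '-';`. Instead, drop the `<empty>` alternative from the rule itself and make the rule optional (e.g. `SIGN?`) wherever it is used.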
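A minimal, untested sketch of that translation for the question's SIGN and SUFFIX rules (rule names come from the question's grammar; DIGIT_10 is collapsed to a plain `'0'..'9'` range here). One extra assumption of this sketch: DECIMAL_10's first alternative keeps a mandatory SUFFIX, because with an optional one it would match exactly the same input as the UINTEGER_10 alternative of UREAL_10.

```antlr
// Untested sketch: no rule matches the empty string any more;
// the "nothing here" case is expressed with '?' where a rule is used.
fragment DIGIT_10        : '0'..'9';
fragment SIGN            : '+' | '-';
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l';
fragment SUFFIX          : EXPONENT_MARKER SIGN? DIGIT_10+;

fragment UINTEGER_10 : DIGIT_10+ '#'*;

fragment DECIMAL_10
  :  UINTEGER_10 SUFFIX                   // 1e10 (a plain 1 goes through UREAL_10)
  |  '.' DIGIT_10+ '#'* SUFFIX?           // .5, .5e2
  |  DIGIT_10+ '.' DIGIT_10* '#'* SUFFIX? // 1.5, 1., 1.5e2
  |  DIGIT_10+ '#'+ '.' '#'* SUFFIX?      // 1#.#
  ;

fragment UREAL_10
  :  UINTEGER_10 ('/' UINTEGER_10)?       // 1, 1/2
  |  DECIMAL_10
  ;

fragment REAL_10 : SIGN? UREAL_10;        // the sign is optional at the use site
```

The EMPTY rule disappears entirely; every place that used to rely on it now uses `?`.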
Also note that your COMPLEX_10 rule is a bit hard to read. Indenting it differently, with each alternative on its own line, would make it easier to follow, and it could then be simplified.
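One possible simplification, as an untested sketch: the `'+'` and `'-'` branches are merged, and `UREAL_10 'i' | 'i'` becomes `UREAL_10? 'i'`. It assumes REAL_10 and UREAL_10 are defined elsewhere in the grammar, with REAL_10 written as `SIGN? UREAL_10` (no EMPTY). Unlike the question's version, where the trailing `( ... )?` makes the `i` part optional (so `1+` or a lone `+` would also match), this sketch always requires the `i` after a sign, as the R5RS BNF does.

```antlr
// Untested sketch; REAL_10 and UREAL_10 come from the rest of the grammar.
fragment COMPLEX_10
  :  REAL_10
     (  '@' REAL_10                  // polar:       1@2
     |  ('+' | '-') UREAL_10? 'i'    // rectangular: 1+2i, 1-i
     )?                              // plain real:  1, -1.5
  |  ('+' | '-') UREAL_10? 'i'       // imaginary:   +2i, -i
  ;
```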
Also be aware that many BNF notations make no distinction between lower- and uppercase literals. So instead of writing `'i'` in your ANTLR grammar, you might want to use `('i' | 'I')`.

EDIT
There are a couple of things wrong with the (fragment) rule PREFIX_10. For one, both of its alternatives match an empty string. Because alternative 1 will always match an empty string, alternative 2 would never match, which is what ANTLR was telling you.
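Now, about the BNF rules for `<prefix 10>`, `<exactness>` and `<radix 10>`: note that `<empty> {#d}` equals `{#d}`, so the `<empty>` is IMO just misplaced. None of the other radixes (`<radix 2>`, `<radix 8>`, `<radix 16>`) have an `<empty>` part. I'd translate those into ANTLR rules (untested!) in which neither EXACTNESS nor RADIX_10 can match an empty string.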
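An untested sketch of such rules, assuming `<radix 10>` is simply `#d`. EXACTNESS is optional only in the first alternative, so the two alternatives stay disjoint: the first must start with `#d`, the second with `#i` or `#e`.

```antlr
// Untested sketch, assuming <radix 10> --> #d
fragment EXACTNESS : '#i' | '#e';
fragment RADIX_10  : '#d';

fragment PREFIX_10
  :  RADIX_10 EXACTNESS?   // #d, #d#i, #d#e
  |  EXACTNESS RADIX_10    // #i#d, #e#d
  ;
```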
Note that EXACTNESS must not be optional in both alternatives of PREFIX_10, because then the lexer would not know through which alternative to match `#d`.
And in case the BNF rule for `<radix 10>` should be `<radix 10> --> <empty> | #d` (i.e. they forgot to place a `|`), then PREFIX_10 itself should still not match an empty string, but all other rules that use PREFIX_10 should make PREFIX_10 optional.

HTH
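For that second reading (which is how R5RS itself defines `<radix 10>`), a bare `#i` or `#e` also becomes a valid prefix, as in `#e1.5`. A minimal, untested sketch that covers this while keeping PREFIX_10 non-empty lets RADIX_10 be optional after EXACTNESS, and makes PREFIX_10 optional in NUM_10. COMPLEX_10 is assumed to be defined elsewhere in the grammar.

```antlr
// Untested sketch, assuming <radix 10> --> <empty> | #d
fragment EXACTNESS : '#i' | '#e';
fragment RADIX_10  : '#d';

fragment PREFIX_10
  :  RADIX_10 EXACTNESS?   // #d, #d#i, #d#e
  |  EXACTNESS RADIX_10?   // #i, #e, #i#d, #e#d
  ;

// A number without any prefix (e.g. 42) is handled here, not inside PREFIX_10.
fragment NUM_10 : PREFIX_10? COMPLEX_10;
```

The alternatives of PREFIX_10 still cannot overlap, since the first starts with `#d` and the second with `#i` or `#e`.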