ANTLR: Grammar for the R5RS lexical structure, problems with numbers

Posted on 2024-11-14 09:59:25


I'm implementing an IDE for Scheme in Eclipse using DLTK. So far, I'm writing the grammar that recognizes the lexical structure.

I'm following the official EBNF, which can be viewed here:
http://rose-r5rs.googlecode.com/hg/doc/r5rs-grammar.html

I can't get even a simple form of the number grammar working. For example, for decimal numbers I have:

grammar r5rsnumbers;

options {
  language = Java;
}


program:
NUMBER;

// NUMBERS


NUMBER : /*NUM_2 | NUM_8 |*/ NUM_10; //| NUM_16;
fragment NUM_10 : PREFIX_10 COMPLEX_10;
fragment COMPLEX_10 
: REAL_10 (
            '@' REAL_10
            | '+' (
                    UREAL_10 'i'
                    | 'i'
                    )?  
            | '-' (
                    UREAL_10 'i'
                    | 'i'
                    )?
            )?
    | '+' (
        UREAL_10 'i'
        | 'i'
        )?  
    | '-' (
        UREAL_10 'i'
        | 'i'
        )?;

fragment REAL_10 : SIGN UREAL_10;
fragment UREAL_10 
    : UINTEGER_10 ('/' UINTEGER_10)?
    | DECIMAL_10;
fragment UINTEGER_10 : DIGIT_10+ '#'*;

fragment DECIMAL_10 
    : UINTEGER_10 SUFFIX
    | '.' DIGIT_10+ '#'* SUFFIX
    | DIGIT_10+ '.' DIGIT_10* '#'* SUFFIX
    | DIGIT_10+ '#'+ '.' '#'* SUFFIX;

fragment PREFIX_10 
    : RADIX_10  EXACTNESS
    | EXACTNESS RADIX_10;

fragment DIGIT : '0'..'9';
fragment EMPTY : '""'; // empty is the empty string
fragment SUFFIX : EMPTY | EXPONENT_MARKER SIGN DIGIT_10+;
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l';
fragment SIGN : EMPTY | '+' |  '-';
fragment EXACTNESS : EMPTY | '#i' | '#e';
fragment RADIX_10 : EMPTY | '#d';
fragment DIGIT_10 : DIGIT;

The problem is that it doesn't recognize anything. I don't understand the warning I get for PREFIX_10, or how to fix it. If I don't mark the rules as fragment, the grammar doesn't compile, because ANTLR complains that the DIGIT_10 rule matches the same input as almost all of the preceding rules.

It's the same with NUM_2, NUM_8 and NUM_16.

Also, I'm not sure about my solution for the empty string.

How do I get around this?


Answer by 无敌元气妹 (2024-11-21 09:59:25):


Note that your ANTLR rule:

EMPTY : '""';

does not match the empty string; it matches two double-quote characters.

But you don't want a lexer rule that matches only the empty string: it would send the lexer into an infinite loop, since an empty string can be matched at every position of the input without consuming anything.
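A minimal sketch of both points in Java (the grammar's target language), using java.util.regex as a stand-in for the generated lexer. This is an assumption for illustration only, since ANTLR generates its own matching code; EmptyTokenDemo and its methods are hypothetical names. The literal '""' only ever matches two quote characters, while a SIGN pattern with an empty alternative "succeeds" on 42 without consuming anything, so a naive match-and-advance loop never moves forward:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: java.util.regex stands in for the ANTLR-generated lexer.
class EmptyTokenDemo {
    // The ANTLR literal '""' corresponds to two double-quote characters.
    static final Pattern LITERAL_QUOTES = Pattern.compile("\"\"");
    // SIGN : EMPTY | '+' | '-' with a truly empty alternative at the end.
    static final Pattern SIGN_WITH_EMPTY = Pattern.compile("\\+|-|");

    // Length of the token matched at the start of input, or -1 if nothing matches.
    static int matchLength(Pattern token, String input) {
        Matcher m = token.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    // Naive lexer loop: match a token, advance by its length, repeat.
    // maxSteps caps the loop; without it, a zero-length match would spin forever.
    static int naiveLex(Pattern token, String input, int maxSteps) {
        int pos = 0;
        for (int step = 0; step < maxSteps && pos < input.length(); step++) {
            int len = matchLength(token, input.substring(pos));
            if (len < 0) {
                break;      // no token at this position
            }
            pos += len;     // len == 0 leaves pos where it was
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(matchLength(LITERAL_QUOTES, "42"));      // -1
        System.out.println(matchLength(LITERAL_QUOTES, "\"\"42"));  // 2
        System.out.println(matchLength(SIGN_WITH_EMPTY, "42"));     // 0
        System.out.println(naiveLex(SIGN_WITH_EMPTY, "42", 1000));  // 0, after 1000 useless steps
    }
}
```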

So the BNF rules:

<real 10>
    ::= <sign> <ureal 10>

<sign>
    ::= <empty> | {+} | {-}

should not be translated as the following ANTLR rules:

REAL_10 
  :  SIGN UREAL_10
  ;

SIGN 
  :  EMPTY 
  |  '+' 
  |  '-'
  ;

but like this instead:

REAL_10 
  :  SIGN? UREAL_10
  ;

SIGN 
  :  '+' 
  |  '-'
  ;
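The same translation works for any regex-style matcher: the empty alternative disappears from SIGN and the optionality moves to the use site as ?. A minimal Java sketch with java.util.regex standing in for the lexer; UREAL_10 is simplified to integers and rationals (no decimals), and the class name is hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical demo class mirroring REAL_10 : SIGN? UREAL_10 ;
class OptionalSignDemo {
    // Simplified UREAL_10: digits with optional '#' padding, optionally "/denominator".
    static final String UINTEGER_10 = "[0-9]+#*";
    static final String UREAL_10 = UINTEGER_10 + "(?:/" + UINTEGER_10 + ")?";
    // SIGN : '+' | '-' ;  (no empty alternative)
    static final String SIGN = "[+-]";
    // The '?' after SIGN plays the role of <empty> in the BNF.
    static final Pattern REAL_10 = Pattern.compile(SIGN + "?" + UREAL_10);

    static boolean isReal10(String s) {
        return REAL_10.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "+42", "-1/2", "12#", "+", ""}) {
            System.out.println("\"" + s + "\" -> " + isReal10(s));
        }
    }
}
```

A lone sign and the empty string are rejected, because UREAL_10 itself is never optional.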

Also note that your rule:

fragment COMPLEX_10 
: REAL_10 (
            '@' REAL_10
            | '+' (
                    UREAL_10 'i'
                    | 'i'
                    )?  
            | '-' (
                    UREAL_10 'i'
                    | 'i'
                    )?
            )?
    | '+' (
        UREAL_10 'i'
        | 'i'
        )?  
    | '-' (
        UREAL_10 'i'
        | 'i'
        )?;

is a bit hard to read. Indenting it differently makes it easier to follow:

fragment COMPLEX_10
  :  REAL_10 ( '@' REAL_10 
             | '+' (UREAL_10 'i' | 'i')? 
             | '-' (UREAL_10 'i' | 'i')?
             )?
  |  '+' (UREAL_10 'i' | 'i')?  
  |  '-' (UREAL_10 'i' | 'i')?
  ;

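which could be simplified as follows (unlike your version, this does not accept a dangling sign such as 3+, because the i after the sign of the imaginary part is mandatory in R5RS):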

fragment COMPLEX_10
  :  REAL_10 ('@' REAL_10)?
  |  REAL_10? ('+' | '-') UREAL_10? 'i'
  ;
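To check the simplified rule against a few inputs, here is a Java sketch that transcribes it into java.util.regex (a stand-in for the ANTLR lexer, not ANTLR's own API). UREAL_10 is deliberately reduced to unsigned integers and the class name is hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical demo class mirroring the simplified COMPLEX_10 rule.
class Complex10Demo {
    // Simplified building blocks: UREAL_10 is reduced to unsigned integers here.
    static final String UREAL_10 = "[0-9]+";
    static final String REAL_10 = "[+-]?" + UREAL_10;
    // COMPLEX_10 : REAL_10 ('@' REAL_10)? | REAL_10? ('+' | '-') UREAL_10? 'i' ;
    static final Pattern COMPLEX_10 = Pattern.compile(
            "(?:" + REAL_10 + "(?:@" + REAL_10 + ")?"
            + "|(?:" + REAL_10 + ")?[+-](?:" + UREAL_10 + ")?i)");

    static boolean isComplex10(String s) {
        return COMPLEX_10.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Accepted: plain reals, polar form, rectangular form with or without a real part.
        for (String s : new String[] {"5", "-5", "1@2", "3+4i", "3-i", "+i", "-2i"}) {
            System.out.println(s + " -> " + isComplex10(s));
        }
        // Rejected: a dangling sign, a lone sign, a bare 'i'.
        for (String s : new String[] {"3+", "+", "i"}) {
            System.out.println(s + " -> " + isComplex10(s));
        }
    }
}
```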

Also be aware that many BNF notations make no distinction between lower- and uppercase literals. So rather than writing 'i' in your ANTLR grammar, you might want to use ('i' | 'I').
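In a regex-based sketch, the same advice means writing a character class for each letter. The Java example below (hypothetical names, simplified rules, java.util.regex standing in for the lexer) applies it to the imaginary marker and to EXPONENT_MARKER from the question's grammar:

```java
import java.util.regex.Pattern;

// Hypothetical demo class: case-insensitive letters written out explicitly.
class CaseInsensitiveDemo {
    // ('i' | 'I') written as a character class.
    static final String IMAG_MARKER = "[iI]";
    // EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l', plus the uppercase forms.
    static final String EXPONENT_MARKER = "[esfdlESFDL]";

    // Simplified pure imaginary number: sign, optional digits, imaginary marker.
    static final Pattern PURE_IMAGINARY = Pattern.compile("[+-][0-9]*" + IMAG_MARKER);
    // Simplified integer with an exponent suffix, e.g. 1e10 or 2D-3.
    static final Pattern WITH_EXPONENT = Pattern.compile("[0-9]+" + EXPONENT_MARKER + "[+-]?[0-9]+");

    static boolean isPureImaginary(String s) {
        return PURE_IMAGINARY.matcher(s).matches();
    }

    static boolean hasExponent(String s) {
        return WITH_EXPONENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPureImaginary("+4i") + " " + isPureImaginary("+4I")); // true true
        System.out.println(hasExponent("1e10") + " " + hasExponent("2D-3"));      // true true
        System.out.println(isPureImaginary("+4j"));                               // false
    }
}
```

java.util.regex also offers the Pattern.CASE_INSENSITIVE flag, which has the same effect for a whole pattern.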

EDIT

Sebastian wrote:
But I'm still having problems with the PREFIX_10 rule: fragment PREFIX_10 : RADIX_10? EXACTNESS? | EXACTNESS? RADIX_10?; which tells me that alternative 2 can never be matched, although it should match #i #d and #d #i separately with the two alternatives. Or am I doing something wrong here?

There are a couple of things wrong with the (fragment) rule PREFIX_10:

fragment PREFIX_10 
  :  RADIX_10? EXACTNESS? // alternative 1
  |  EXACTNESS? RADIX_10? // alternative 2
  ;

For one, both alternatives match the empty string. Because alternative 1 will always match the empty string, alternative 2 would never be matched, which is what ANTLR was telling you.
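To make the overlap visible, the Java sketch below (java.util.regex as a stand-in for ANTLR's analysis; names hypothetical) checks which of Sebastian's two alternatives can match some candidate prefixes. The empty string and every single marker are matched by both, which is the ambiguity the warning is about:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class for PREFIX_10 : RADIX_10? EXACTNESS? | EXACTNESS? RADIX_10? ;
class PrefixOverlapDemo {
    // RADIX_10 = '#d', EXACTNESS = '#i' | '#e'
    static final Pattern ALT1 = Pattern.compile("(?:#d)?(?:#[ie])?"); // RADIX_10? EXACTNESS?
    static final Pattern ALT2 = Pattern.compile("(?:#[ie])?(?:#d)?"); // EXACTNESS? RADIX_10?

    // Numbers of the alternatives that match the whole input.
    static List<Integer> matchingAlternatives(String s) {
        List<Integer> alts = new ArrayList<>();
        if (ALT1.matcher(s).matches()) alts.add(1);
        if (ALT2.matcher(s).matches()) alts.add(2);
        return alts;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "#d", "#i", "#e", "#d#i", "#i#d"}) {
            System.out.println("\"" + s + "\" -> " + matchingAlternatives(s));
        }
    }
}
```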

Now, looking at the BNF rules:

<exactness>
    ::= <empty> | {#i} | {#e}

<prefix 10>
    ::= <radix 10> <exactness>
      | <exactness> <radix 10>

<radix 10>
    ::= <empty> {#d}

(Note that <empty> {#d} equals {#d}, so IMO the <empty> is just misplaced. None of the other radix rules has an <empty> part.)

I'd translate those into the following (untested!) ANTLR rules:

fragment EXACTNESS
  :  '#i' 
  |  '#e'
  ;

fragment PREFIX_10
  :  RADIX_10 EXACTNESS?
  |  EXACTNESS RADIX_10 // **
  ;

fragment RADIX_10
  :  '#d'
  ;

** Note that it's not:

fragment PREFIX_10
  :  RADIX_10 EXACTNESS? // matches '#d'
  |  EXACTNESS? RADIX_10 // matches '#d'
  ;

because then #d would be matched by both alternatives, and the lexer cannot know which one to use.
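The difference between the two versions can be made concrete by counting how many alternatives match each legal prefix. A minimal Java sketch, again with java.util.regex standing in for the lexer and with hypothetical names: in the first version every prefix is matched by exactly one alternative, while in the rejected version #d is matched by both:

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two PREFIX_10 variants.
class Prefix10Demo {
    // PREFIX_10 : RADIX_10 EXACTNESS? | EXACTNESS RADIX_10 ;
    static final Pattern[] GOOD = {
        Pattern.compile("#d(?:#[ie])?"),   // RADIX_10 EXACTNESS?
        Pattern.compile("#[ie]#d")          // EXACTNESS RADIX_10
    };
    // Rejected: PREFIX_10 : RADIX_10 EXACTNESS? | EXACTNESS? RADIX_10 ;
    static final Pattern[] BAD = {
        Pattern.compile("#d(?:#[ie])?"),   // RADIX_10 EXACTNESS?
        Pattern.compile("(?:#[ie])?#d")     // EXACTNESS? RADIX_10
    };

    // How many alternatives of a rule match the whole input.
    static int countMatches(Pattern[] alternatives, String s) {
        int n = 0;
        for (Pattern p : alternatives) {
            if (p.matcher(s).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"#d", "#d#i", "#d#e", "#i#d", "#e#d"}) {
            System.out.println(s + ": good=" + countMatches(GOOD, s) + ", bad=" + countMatches(BAD, s));
        }
    }
}
```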

And in case the BNF rule for <radix 10> is meant to read like this (i.e. they forgot to put a |), which is how the R5RS report itself defines it:

<radix 10>
    ::= <empty> 
      | {#d}

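then the ANTLR PREFIX_10 should look like this, with RADIX_10 optional in the second alternative so that a prefix consisting only of an exactness marker (e.g. #e) is also accepted. The alternatives still cannot overlap, because the first starts with #d and the second with #i or #e:

fragment PREFIX_10
  :  RADIX_10 EXACTNESS?
  |  EXACTNESS RADIX_10?
  ;

but then all other rules that use PREFIX_10 should make PREFIX_10 optional (e.g. NUM_10 : PREFIX_10? COMPLEX_10;).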
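A Java regex sketch of this case, assuming <radix 10> may be empty, so that a bare exactness marker (as in #e42) is a legal prefix and the second alternative is taken as EXACTNESS RADIX_10?, with PREFIX_10 made optional in NUM_10. java.util.regex stands in for the lexer, COMPLEX_10 is reduced to signed integers, and the class name is hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical demo class for NUM_10 : PREFIX_10? COMPLEX_10 ;
class OptionalPrefixDemo {
    // PREFIX_10 : RADIX_10 EXACTNESS? | EXACTNESS RADIX_10? ;
    static final String PREFIX_10 = "(?:#d(?:#[ie])?|#[ie](?:#d)?)";
    // Simplified COMPLEX_10: signed or unsigned integers only.
    static final String COMPLEX_10 = "[+-]?[0-9]+";
    // The trailing '?' on PREFIX_10 covers the case where both radix and exactness are empty.
    static final Pattern NUM_10 = Pattern.compile(PREFIX_10 + "?" + COMPLEX_10);

    static boolean isNum10(String s) {
        return NUM_10.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "#d42", "#e42", "#i#d-7", "#d#e1", "#d#d1", "#x42"}) {
            System.out.println(s + " -> " + isNum10(s));
        }
    }
}
```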

HTH


About the author: 荒人说梦
