ANTLR 解决非 LL(*) 问题和句法谓词

发布于 2024-11-15 09:13:57 字数 1486 浏览 11 评论 0原文

考虑解析器中的以下规则：

expression 
    :   IDENTIFIER
    |   (...)
    |   procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;

procedure_call
    :   '(' expression expression* ')'
    ;

macro_use
    :   '(' IDENTIFIER datum* ')'
    ;

表达式规则中的procedure_call

// Note that any string that parses as an <expression> will also parse as a <datum>.
datum 
    : simple_datum 
    | compound_datum
    ;

simple_datum 
    :   BOOLEAN
    |   NUMBER
    |   CHARACTER
    |   STRING
    |   IDENTIFIER
    ;

compound_datum 
    : list 
    | vector
    ;

list 
    :   '(' (datum+ ( '.' datum)?)? ')'
    |   ABBREV_PREFIX datum
    ;

fragment ABBREV_PREFIX
    :   ('\''   | '`' | ',' | ',@')
    ;

vector 
    : '#(' datum* ')'
    ;

和 Macro_rule 替代项会生成非 LL(*) 结构错误。我可以看到问题，因为 (IDENTIFIER) 将解析为两者。但即使当我用 + 而不是 * 定义两者时，它也会生成错误，即使上面的示例不应该再解析。
我想出了句法谓词的用法，但我不知道如何使用它们来实现这里的技巧。
像

expression 
    :   IDENTIFIER 
    |   (...)
    |   (procedure_call)=>procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;

或

expression 
    :   IDENTIFIER
    |   (...)
    |   ('(' IDENTIFIER expression)=>procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;

这样的规则也不起作用，因为除了第一个规则之外，没有任何规则会匹配任何内容。有没有合适的方法来解决这个问题？

原文

consider following rules in the parser:

expression 
    :   IDENTIFIER
    |   (...)
    |   procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;

procedure_call
    :   '(' expression expression* ')'
    ;

macro_use
    :   '(' IDENTIFIER datum* ')'
    ;

and

// Note that any string that parses as an <expression> will also parse as a <datum>.
datum 
    : simple_datum 
    | compound_datum
    ;

simple_datum 
    :   BOOLEAN
    |   NUMBER
    |   CHARACTER
    |   STRING
    |   IDENTIFIER
    ;

compound_datum 
    : list 
    | vector
    ;

list 
    :   '(' (datum+ ( '.' datum)?)? ')'
    |   ABBREV_PREFIX datum
    ;

fragment ABBREV_PREFIX
    :   ('\''   | '`' | ',' | ',@')
    ;

vector 
    : '#(' datum* ')'
    ;

the procedure_call and macro_rule alternative in the expression rule generate an non-LL(*) structure error. I can see the problem, since (IDENTIFIER) will parse as both. but even when i define both with + instead of *, it generates the error, even though above example shouldn't be parsing anymore.
i came up with the usage of syntactic predicates, but i can't figure out how to use them to do the trick here.
something like

expression 
    :   IDENTIFIER 
    |   (...)
    |   (procedure_call)=>procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;

expression 
    :   IDENTIFIER
    |   (...)
    |   ('(' IDENTIFIER expression)=>procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;

doesnt work either, since none but the first rule will match anything. is there a proper way to solve that?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

大海や 2024-11-22 09:13:57

我发现了 R5RS 的 JavaCC 语法，我用它（很快！）编写了一个 ANTLR 等效项

/*
 * Copyright (C) 2011 by Bart Kiers, based on the work done by Håkan L. Younes'
 * JavaCC R5RS grammar, available at: http://mindprod.com/javacc/R5RS.jj
 * 
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 * 
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * 
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */
grammar R5RS;

parse
  :  commandOrDefinition* EOF
  ;

commandOrDefinition
  :  (syntaxDefinition)=>              syntaxDefinition
  |  (definition)=>                    definition
  |  ('(' BEGIN commandOrDefinition)=> '(' BEGIN commandOrDefinition+ ')'
  |                                    command
  ;

syntaxDefinition
  :  '(' DEFINE_SYNTAX keyword transformerSpec ')'
  ;

definition
  :  '(' DEFINE ( variable expression ')'
                | '(' variable defFormals ')' body ')'
                )
  |  '(' BEGIN definition* ')'
  ;

defFormals
  :  variable* ('.' variable)?
  ;

keyword
  :  identifier
  ;

transformerSpec
  :  '(' SYNTAX_RULES '(' identifier* ')' syntaxRule* ')'
  ;

syntaxRule
  :  '(' pattern template ')'
  ;

pattern
  :  patternIdentifier
  |  '(' (pattern+ ('.' pattern | ELLIPSIS)?)?  ')'
  |  '#(' (pattern+ ELLIPSIS? )? ')'
  |  patternDatum
  ;

patternIdentifier
  :  syntacticKeyword
  |  VARIABLE
  ;

patternDatum
  :  STRING
  |  CHARACTER
  |  bool
  |  number
  ;

template
  :  patternIdentifier
  |  '(' (templateElement+ ('.' templateElement)?)? ')'
  |  '#(' templateElement* ')'
  |  templateDatum
  ;

templateElement
  :  template ELLIPSIS?
  ;

templateDatum
  :  patternDatum
  ;

command
  :  expression
  ;

identifier
  :  syntacticKeyword
  |  variable
  ;

syntacticKeyword
  :  expressionKeyword
  |  ELSE
  |  ARROW
  |  DEFINE
  |  UNQUOTE
  |  UNQUOTE_SPLICING
  ;  

expressionKeyword
  :  QUOTE
  |  LAMBDA
  |  IF
  |  SET
  |  BEGIN
  |  COND
  |  AND
  |  OR
  |  CASE
  |  LET
  |  LETSTAR
  |  LETREC
  |  DO
  |  DELAY
  |  QUASIQUOTE
  ;  

expression
  :  (variable)=>          variable
  |  (literal)=>           literal
  |  (lambdaExpression)=>  lambdaExpression
  |  (conditional)=>       conditional
  |  (assignment)=>        assignment
  |  (derivedExpression)=> derivedExpression
  |  (procedureCall)=>     procedureCall
  |  (macroUse)=>          macroUse
  |                        macroBlock
  ;

variable
  :  VARIABLE
  |  ELLIPSIS
  ;

literal
  :  quotation
  |  selfEvaluating
  ;

quotation
  :  '\'' datum
  |  '(' QUOTE datum ')'
  ;

selfEvaluating
  :  bool
  |  number
  |  CHARACTER
  |  STRING
  ;

lambdaExpression
  :  '(' LAMBDA formals body ')'
  ;

formals
  :  '(' (variable+ ('.' variable)?)? ')'
  |  variable
  ;

conditional
  :  '(' IF test consequent alternate? ')'
  ;

test 
  :  expression
  ;

consequent  
  :  expression
  ;

alternate 
  :  expression
  ;

assignment
  :  '(' SET variable expression ')'
  ;

derivedExpression
  :  quasiquotation
  |  '(' ( COND ( '(' ELSE sequence ')'
                | condClause+ ('(' ELSE sequence ')')? 
                )
         | CASE expression ( '(' ELSE sequence ')'
                           | caseClause+ ('(' ELSE sequence ')')? 
                           )
         | AND test*
         | OR test*
         | LET variable? '(' bindingSpec* ')' body
         | LETSTAR '(' bindingSpec* ')' body
         | LETREC '(' bindingSpec* ')' body
         | BEGIN sequence
         | DO '(' iterationSpec* ')' '(' test doResult? ')' command*
         | DELAY expression
         ) 
     ')'
  ;

condClause
  :  '(' test (sequence | ARROW recipient)? ')'
  ;

recipient
  :  expression
  ;

caseClause
  :  '(' '(' datum* ')' sequence ')'
  ;

bindingSpec
  :  '(' variable expression ')'
  ;

iterationSpec
  :  '(' variable init step? ')'
  ;

init
  :  expression
  ;

step
  :  expression
  ;

doResult
  :  sequence
  ;

procedureCall
  :  '(' operator operand* ')'
  ;

operator
  :  expression
  ;

operand
  :  expression
  ;

macroUse
  :  '(' keyword datum* ')'
  ;

macroBlock
  :  '(' (LET_SYNTAX | LETREC_SYNTAX) '(' syntaxSpec* ')' body ')'
  ;

syntaxSpec
  :  '(' keyword transformerSpec ')'
  ;

body
  :  ((definition)=> definition)* sequence
  ;

//sequence
//  :  ((command)=> command)* expression
//  ;

sequence
  :  expression+
  ;

datum
  :  simpleDatum
  |  compoundDatum
  ;

simpleDatum
  :  bool
  |  number
  |  CHARACTER
  |  STRING
  |  identifier
  ;

compoundDatum
  :  list
  |  vector
  ;

list
  :  '(' (datum+ ('.' datum)?)? ')'
  |  abbreviation
  ;

abbreviation
  :  abbrevPrefix datum
  ;

abbrevPrefix
  :  '\'' | '`' | ',@' | ','
  ;

vector
  :  '#(' datum* ')'
  ;

number
  :  NUM_2
  |  NUM_8
  |  NUM_10
  |  NUM_16
  ;

bool
  :  TRUE
  |  FALSE
  ;

quasiquotation
  :  quasiquotationD[1]
  ;

quasiquotationD[int d]
  :  '`' qqTemplate[d]
  |  '(' QUASIQUOTE qqTemplate[d] ')'
  ;

qqTemplate[int d]
  :  (expression)=>  expression
  |  ('(' UNQUOTE)=> unquotation[d]
  |                  simpleDatum
  |                  vectorQQTemplate[d]
  |                  listQQTemplate[d]
  ;

vectorQQTemplate[int d]
  :  '#(' qqTemplateOrSplice[d]* ')'
  ;

listQQTemplate[int d]
  :                     '\'' qqTemplate[d]
  |  ('(' QUASIQUOTE)=> quasiquotationD[d+1]
  |                     '(' (qqTemplateOrSplice[d]+ ('.' qqTemplate[d])?)? ')'
  ;

unquotation[int d]
  :  ',' qqTemplate[d-1]
  |  '(' UNQUOTE qqTemplate[d-1] ')'
  ;

qqTemplateOrSplice[int d]
  :  ('(' UNQUOTE_SPLICING)=> splicingUnquotation[d]
  |                           qqTemplate[d]
  ;

splicingUnquotation[int d]
  :  ',@' qqTemplate[d-1]
  |  '(' UNQUOTE_SPLICING qqTemplate[d-1] ')'
  ;

// macro keywords
LET_SYNTAX       : 'let-syntax';
LETREC_SYNTAX    : 'letrec-syntax';
SYNTAX_RULES     : 'syntax-rules';
DEFINE_SYNTAX    : 'define-syntax';

// syntactic keywords
ELSE             : 'else';
ARROW            : '=>';
DEFINE           : 'define';
UNQUOTE_SPLICING : 'unquote-splicing';
UNQUOTE          : 'unquote';

// expression keywords
QUOTE            : 'quote';
LAMBDA           : 'lambda';
IF               : 'if';
SET              : 'set!';
BEGIN            : 'begin';
COND             : 'cond';
AND              : 'and';
OR               : 'or';
CASE             : 'case';
LET              : 'let';
LETSTAR          : 'let*';
LETREC           : 'letrec';
DO               : 'do';
DELAY            : 'delay';
QUASIQUOTE       : 'quasiquote';

NUM_2  : PREFIX_2 COMPLEX_2;
NUM_8  : PREFIX_8 COMPLEX_8;
NUM_10 : PREFIX_10? COMPLEX_10;
NUM_16 : PREFIX_16 COMPLEX_16;

ELLIPSIS : '...';

VARIABLE 
  :  INITIAL SUBSEQUENT* 
  |  PECULIAR_IDENTIFIER
  ;

STRING : '"' STRING_ELEMENT* '"';

CHARACTER : '#\\' (~(' ' | '\n') | CHARACTER_NAME);

TRUE  : '#' ('t' | 'T');
FALSE : '#' ('f' | 'F');

// to ignore
SPACE   : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
COMMENT : ';' ~('\r' | '\n')* {$channel=HIDDEN;};

// fragments  
fragment INITIAL : LETTER | SPECIAL_INITIAL;
fragment LETTER : 'a'..'z' | 'A'..'Z';
fragment SPECIAL_INITIAL : '!' | '
：可以使用以下类进行测试：
import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = 
        "(define sum-iter                        \n" + 
        "  (lambda(n acc i)                      \n" + 
        "    (if (> i n)                         \n" + 
        "      acc                               \n" + 
        "      (sum-iter n (+ acc i) (+ i 1)))))   ";
    R5RSLexer lexer = new R5RSLexer(new ANTLRStringStream(source));
    R5RSParser parser = new R5RSParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

并生成词法分析器 &解析器，编译所有 Java 源文件并运行主类，执行以下操作：
bart@hades:~/Programming/ANTLR/Demos/R5RS$ java -cp antlr-3.3.jar org.antlr.Tool R5RS.g 
bart@hades:~/Programming/ANTLR/Demos/R5RS$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/R5RS$ java -cp .:antlr-3.3.jar Main
bart@hades:~/Programming/ANTLR/Demos/R5RS$
控制台上没有打印任何内容，这意味着解析器（和词法分析器）没有发现所提供的源有任何错误。
请注意，我没有单元测试，仅测试了 Main 类中的单个方案源。如果您发现 ANTLR 语法中有错误，我很乐意听到这些错误，以便我可以修复语法。在适当的时候，我可能会将语法提交到官方 ANTLR Wiki。
 | '%' | '&' | '*' | '/' | ':' | '<' | '=' | '>' | '?' | '^' | '_' | '~';
fragment SUBSEQUENT : INITIAL | DIGIT | SPECIAL_SUBSEQUENT;
fragment DIGIT : '0'..'9';
fragment SPECIAL_SUBSEQUENT : '.' | '+' | '-' | '@';
fragment PECULIAR_IDENTIFIER : '+' | '-';
fragment STRING_ELEMENT : ~('"' | '\\') | '\\' ('"' | '\\');
fragment CHARACTER_NAME : 'space' | 'newline';

fragment COMPLEX_2 
  :  REAL_2 ('@' REAL_2)?
  |  REAL_2? SIGN UREAL_2? ('i' | 'I')
  ;

fragment COMPLEX_8 
  :  REAL_8 ('@' REAL_8)?
  |  REAL_8? SIGN UREAL_8? ('i' | 'I')
  ;

fragment COMPLEX_10 
  :  REAL_10 ('@' REAL_10)?
  |  REAL_10? SIGN UREAL_10? ('i' | 'I')
  ;

fragment COMPLEX_16 
  :  REAL_16 ('@' REAL_16)?
  |  REAL_16? SIGN UREAL_16? ('i' | 'I')
  ;

fragment REAL_2 : SIGN? UREAL_2;
fragment REAL_8 : SIGN? UREAL_8;
fragment REAL_10 : SIGN? UREAL_10;
fragment REAL_16 : SIGN? UREAL_16;
fragment UREAL_2 : UINTEGER_2 ('/' UINTEGER_2)?;
fragment UREAL_8 : UINTEGER_8 ('/' UINTEGER_8)?;
fragment UREAL_10 : UINTEGER_10 ('/' UINTEGER_10)? | DECIMAL_10;
fragment UREAL_16 : UINTEGER_16 ('/' UINTEGER_16)?;

fragment DECIMAL_10 
  :  UINTEGER_10 SUFFIX
  |  '.' DIGIT+ '#'* SUFFIX?
  |  DIGIT+ '.' DIGIT* '#'* SUFFIX?
  |  DIGIT+ '#'+ '.' '#'* SUFFIX?
  ;

fragment UINTEGER_2 : DIGIT_2+ '#'*;
fragment UINTEGER_8 : DIGIT_8+ '#'*;
fragment UINTEGER_10 : DIGIT+ '#'*;
fragment UINTEGER_16 : DIGIT_16+ '#'*;
fragment PREFIX_2 : RADIX_2 EXACTNESS? | EXACTNESS RADIX_2;
fragment PREFIX_8 : RADIX_8 EXACTNESS? | EXACTNESS RADIX_8;
fragment PREFIX_10 : RADIX_10 EXACTNESS? | EXACTNESS RADIX_10;
fragment PREFIX_16 : RADIX_16 EXACTNESS? | EXACTNESS RADIX_16;
fragment SUFFIX : EXPONENT_MARKER SIGN? DIGIT+;
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l' | 'E' | 'S' | 'F' | 'D' | 'L';
fragment SIGN : '+' | '-';
fragment EXACTNESS : '#' ('i' | 'e' | 'I' | 'E');
fragment RADIX_2 : '#' ('b' | 'B');
fragment RADIX_8 : '#' ('o' | 'O');
fragment RADIX_10 : '#' ('d' | 'D');
fragment RADIX_16 : '#' ('x' | 'X');
fragment DIGIT_2 : '0' | '1';
fragment DIGIT_8 : '0'..'7';
fragment DIGIT_16 : DIGIT | 'a'..'f' | 'A'..'F';

：可以使用以下类进行测试：

并生成词法分析器 &解析器，编译所有 Java 源文件并运行主类，执行以下操作：

控制台上没有打印任何内容，这意味着解析器（和词法分析器）没有发现所提供的源有任何错误。

请注意，我没有单元测试，仅测试了 Main 类中的单个方案源。如果您发现 ANTLR 语法中有错误，我很乐意听到这些错误，以便我可以修复语法。在适当的时候，我可能会将语法提交到官方 ANTLR Wiki。

I found a JavaCC grammar of R5RS which I used to (quickly!) write an ANTLR equivalent:

/*
 * Copyright (C) 2011 by Bart Kiers, based on the work done by Håkan L. Younes'
 * JavaCC R5RS grammar, available at: http://mindprod.com/javacc/R5RS.jj
 * 
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 * 
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * 
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */
grammar R5RS;

parse
  :  commandOrDefinition* EOF
  ;

commandOrDefinition
  :  (syntaxDefinition)=>              syntaxDefinition
  |  (definition)=>                    definition
  |  ('(' BEGIN commandOrDefinition)=> '(' BEGIN commandOrDefinition+ ')'
  |                                    command
  ;

syntaxDefinition
  :  '(' DEFINE_SYNTAX keyword transformerSpec ')'
  ;

definition
  :  '(' DEFINE ( variable expression ')'
                | '(' variable defFormals ')' body ')'
                )
  |  '(' BEGIN definition* ')'
  ;

defFormals
  :  variable* ('.' variable)?
  ;

keyword
  :  identifier
  ;

transformerSpec
  :  '(' SYNTAX_RULES '(' identifier* ')' syntaxRule* ')'
  ;

syntaxRule
  :  '(' pattern template ')'
  ;

pattern
  :  patternIdentifier
  |  '(' (pattern+ ('.' pattern | ELLIPSIS)?)?  ')'
  |  '#(' (pattern+ ELLIPSIS? )? ')'
  |  patternDatum
  ;

patternIdentifier
  :  syntacticKeyword
  |  VARIABLE
  ;

patternDatum
  :  STRING
  |  CHARACTER
  |  bool
  |  number
  ;

template
  :  patternIdentifier
  |  '(' (templateElement+ ('.' templateElement)?)? ')'
  |  '#(' templateElement* ')'
  |  templateDatum
  ;

templateElement
  :  template ELLIPSIS?
  ;

templateDatum
  :  patternDatum
  ;

command
  :  expression
  ;

identifier
  :  syntacticKeyword
  |  variable
  ;

syntacticKeyword
  :  expressionKeyword
  |  ELSE
  |  ARROW
  |  DEFINE
  |  UNQUOTE
  |  UNQUOTE_SPLICING
  ;  

expressionKeyword
  :  QUOTE
  |  LAMBDA
  |  IF
  |  SET
  |  BEGIN
  |  COND
  |  AND
  |  OR
  |  CASE
  |  LET
  |  LETSTAR
  |  LETREC
  |  DO
  |  DELAY
  |  QUASIQUOTE
  ;  

expression
  :  (variable)=>          variable
  |  (literal)=>           literal
  |  (lambdaExpression)=>  lambdaExpression
  |  (conditional)=>       conditional
  |  (assignment)=>        assignment
  |  (derivedExpression)=> derivedExpression
  |  (procedureCall)=>     procedureCall
  |  (macroUse)=>          macroUse
  |                        macroBlock
  ;

variable
  :  VARIABLE
  |  ELLIPSIS
  ;

literal
  :  quotation
  |  selfEvaluating
  ;

quotation
  :  '\'' datum
  |  '(' QUOTE datum ')'
  ;

selfEvaluating
  :  bool
  |  number
  |  CHARACTER
  |  STRING
  ;

lambdaExpression
  :  '(' LAMBDA formals body ')'
  ;

formals
  :  '(' (variable+ ('.' variable)?)? ')'
  |  variable
  ;

conditional
  :  '(' IF test consequent alternate? ')'
  ;

test 
  :  expression
  ;

consequent  
  :  expression
  ;

alternate 
  :  expression
  ;

assignment
  :  '(' SET variable expression ')'
  ;

derivedExpression
  :  quasiquotation
  |  '(' ( COND ( '(' ELSE sequence ')'
                | condClause+ ('(' ELSE sequence ')')? 
                )
         | CASE expression ( '(' ELSE sequence ')'
                           | caseClause+ ('(' ELSE sequence ')')? 
                           )
         | AND test*
         | OR test*
         | LET variable? '(' bindingSpec* ')' body
         | LETSTAR '(' bindingSpec* ')' body
         | LETREC '(' bindingSpec* ')' body
         | BEGIN sequence
         | DO '(' iterationSpec* ')' '(' test doResult? ')' command*
         | DELAY expression
         ) 
     ')'
  ;

condClause
  :  '(' test (sequence | ARROW recipient)? ')'
  ;

recipient
  :  expression
  ;

caseClause
  :  '(' '(' datum* ')' sequence ')'
  ;

bindingSpec
  :  '(' variable expression ')'
  ;

iterationSpec
  :  '(' variable init step? ')'
  ;

init
  :  expression
  ;

step
  :  expression
  ;

doResult
  :  sequence
  ;

procedureCall
  :  '(' operator operand* ')'
  ;

operator
  :  expression
  ;

operand
  :  expression
  ;

macroUse
  :  '(' keyword datum* ')'
  ;

macroBlock
  :  '(' (LET_SYNTAX | LETREC_SYNTAX) '(' syntaxSpec* ')' body ')'
  ;

syntaxSpec
  :  '(' keyword transformerSpec ')'
  ;

body
  :  ((definition)=> definition)* sequence
  ;

//sequence
//  :  ((command)=> command)* expression
//  ;

sequence
  :  expression+
  ;

datum
  :  simpleDatum
  |  compoundDatum
  ;

simpleDatum
  :  bool
  |  number
  |  CHARACTER
  |  STRING
  |  identifier
  ;

compoundDatum
  :  list
  |  vector
  ;

list
  :  '(' (datum+ ('.' datum)?)? ')'
  |  abbreviation
  ;

abbreviation
  :  abbrevPrefix datum
  ;

abbrevPrefix
  :  '\'' | '`' | ',@' | ','
  ;

vector
  :  '#(' datum* ')'
  ;

number
  :  NUM_2
  |  NUM_8
  |  NUM_10
  |  NUM_16
  ;

bool
  :  TRUE
  |  FALSE
  ;

quasiquotation
  :  quasiquotationD[1]
  ;

quasiquotationD[int d]
  :  '`' qqTemplate[d]
  |  '(' QUASIQUOTE qqTemplate[d] ')'
  ;

qqTemplate[int d]
  :  (expression)=>  expression
  |  ('(' UNQUOTE)=> unquotation[d]
  |                  simpleDatum
  |                  vectorQQTemplate[d]
  |                  listQQTemplate[d]
  ;

vectorQQTemplate[int d]
  :  '#(' qqTemplateOrSplice[d]* ')'
  ;

listQQTemplate[int d]
  :                     '\'' qqTemplate[d]
  |  ('(' QUASIQUOTE)=> quasiquotationD[d+1]
  |                     '(' (qqTemplateOrSplice[d]+ ('.' qqTemplate[d])?)? ')'
  ;

unquotation[int d]
  :  ',' qqTemplate[d-1]
  |  '(' UNQUOTE qqTemplate[d-1] ')'
  ;

qqTemplateOrSplice[int d]
  :  ('(' UNQUOTE_SPLICING)=> splicingUnquotation[d]
  |                           qqTemplate[d]
  ;

splicingUnquotation[int d]
  :  ',@' qqTemplate[d-1]
  |  '(' UNQUOTE_SPLICING qqTemplate[d-1] ')'
  ;

// macro keywords
LET_SYNTAX       : 'let-syntax';
LETREC_SYNTAX    : 'letrec-syntax';
SYNTAX_RULES     : 'syntax-rules';
DEFINE_SYNTAX    : 'define-syntax';

// syntactic keywords
ELSE             : 'else';
ARROW            : '=>';
DEFINE           : 'define';
UNQUOTE_SPLICING : 'unquote-splicing';
UNQUOTE          : 'unquote';

// expression keywords
QUOTE            : 'quote';
LAMBDA           : 'lambda';
IF               : 'if';
SET              : 'set!';
BEGIN            : 'begin';
COND             : 'cond';
AND              : 'and';
OR               : 'or';
CASE             : 'case';
LET              : 'let';
LETSTAR          : 'let*';
LETREC           : 'letrec';
DO               : 'do';
DELAY            : 'delay';
QUASIQUOTE       : 'quasiquote';

NUM_2  : PREFIX_2 COMPLEX_2;
NUM_8  : PREFIX_8 COMPLEX_8;
NUM_10 : PREFIX_10? COMPLEX_10;
NUM_16 : PREFIX_16 COMPLEX_16;

ELLIPSIS : '...';

VARIABLE 
  :  INITIAL SUBSEQUENT* 
  |  PECULIAR_IDENTIFIER
  ;

STRING : '"' STRING_ELEMENT* '"';

CHARACTER : '#\\' (~(' ' | '\n') | CHARACTER_NAME);

TRUE  : '#' ('t' | 'T');
FALSE : '#' ('f' | 'F');

// to ignore
SPACE   : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
COMMENT : ';' ~('\r' | '\n')* {$channel=HIDDEN;};

// fragments  
fragment INITIAL : LETTER | SPECIAL_INITIAL;
fragment LETTER : 'a'..'z' | 'A'..'Z';
fragment SPECIAL_INITIAL : '!' | '
which can be tested with the following class:
import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = 
        "(define sum-iter                        \n" + 
        "  (lambda(n acc i)                      \n" + 
        "    (if (> i n)                         \n" + 
        "      acc                               \n" + 
        "      (sum-iter n (+ acc i) (+ i 1)))))   ";
    R5RSLexer lexer = new R5RSLexer(new ANTLRStringStream(source));
    R5RSParser parser = new R5RSParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

and to generate a lexer & parser, compile all Java source files and run the main class, do:
bart@hades:~/Programming/ANTLR/Demos/R5RS$ java -cp antlr-3.3.jar org.antlr.Tool R5RS.g 
bart@hades:~/Programming/ANTLR/Demos/R5RS$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/R5RS$ java -cp .:antlr-3.3.jar Main
bart@hades:~/Programming/ANTLR/Demos/R5RS$
The fact that nothing is being printed on the console means the parser (and lexer) didn't find any errors with the provided source.
Note that I have no Unit tests and have only tested the single Scheme source inside the Main class. If you find errors in the ANTLR grammar, I'd appreciate to hear about them so I can fix the grammar. In due time, I'll probably commit the grammar to the official ANTLR Wiki.
 | '%' | '&' | '*' | '/' | ':' | '<' | '=' | '>' | '?' | '^' | '_' | '~';
fragment SUBSEQUENT : INITIAL | DIGIT | SPECIAL_SUBSEQUENT;
fragment DIGIT : '0'..'9';
fragment SPECIAL_SUBSEQUENT : '.' | '+' | '-' | '@';
fragment PECULIAR_IDENTIFIER : '+' | '-';
fragment STRING_ELEMENT : ~('"' | '\\') | '\\' ('"' | '\\');
fragment CHARACTER_NAME : 'space' | 'newline';

fragment COMPLEX_2 
  :  REAL_2 ('@' REAL_2)?
  |  REAL_2? SIGN UREAL_2? ('i' | 'I')
  ;

fragment COMPLEX_8 
  :  REAL_8 ('@' REAL_8)?
  |  REAL_8? SIGN UREAL_8? ('i' | 'I')
  ;

fragment COMPLEX_10 
  :  REAL_10 ('@' REAL_10)?
  |  REAL_10? SIGN UREAL_10? ('i' | 'I')
  ;

fragment COMPLEX_16 
  :  REAL_16 ('@' REAL_16)?
  |  REAL_16? SIGN UREAL_16? ('i' | 'I')
  ;

fragment REAL_2 : SIGN? UREAL_2;
fragment REAL_8 : SIGN? UREAL_8;
fragment REAL_10 : SIGN? UREAL_10;
fragment REAL_16 : SIGN? UREAL_16;
fragment UREAL_2 : UINTEGER_2 ('/' UINTEGER_2)?;
fragment UREAL_8 : UINTEGER_8 ('/' UINTEGER_8)?;
fragment UREAL_10 : UINTEGER_10 ('/' UINTEGER_10)? | DECIMAL_10;
fragment UREAL_16 : UINTEGER_16 ('/' UINTEGER_16)?;

fragment DECIMAL_10 
  :  UINTEGER_10 SUFFIX
  |  '.' DIGIT+ '#'* SUFFIX?
  |  DIGIT+ '.' DIGIT* '#'* SUFFIX?
  |  DIGIT+ '#'+ '.' '#'* SUFFIX?
  ;

fragment UINTEGER_2 : DIGIT_2+ '#'*;
fragment UINTEGER_8 : DIGIT_8+ '#'*;
fragment UINTEGER_10 : DIGIT+ '#'*;
fragment UINTEGER_16 : DIGIT_16+ '#'*;
fragment PREFIX_2 : RADIX_2 EXACTNESS? | EXACTNESS RADIX_2;
fragment PREFIX_8 : RADIX_8 EXACTNESS? | EXACTNESS RADIX_8;
fragment PREFIX_10 : RADIX_10 EXACTNESS? | EXACTNESS RADIX_10;
fragment PREFIX_16 : RADIX_16 EXACTNESS? | EXACTNESS RADIX_16;
fragment SUFFIX : EXPONENT_MARKER SIGN? DIGIT+;
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l' | 'E' | 'S' | 'F' | 'D' | 'L';
fragment SIGN : '+' | '-';
fragment EXACTNESS : '#' ('i' | 'e' | 'I' | 'E');
fragment RADIX_2 : '#' ('b' | 'B');
fragment RADIX_8 : '#' ('o' | 'O');
fragment RADIX_10 : '#' ('d' | 'D');
fragment RADIX_16 : '#' ('x' | 'X');
fragment DIGIT_2 : '0' | '1';
fragment DIGIT_8 : '0'..'7';
fragment DIGIT_16 : DIGIT | 'a'..'f' | 'A'..'F';

which can be tested with the following class:

and to generate a lexer & parser, compile all Java source files and run the main class, do:

The fact that nothing is being printed on the console means the parser (and lexer) didn't find any errors with the provided source.

Note that I have no Unit tests and have only tested the single Scheme source inside the Main class. If you find errors in the ANTLR grammar, I'd appreciate to hear about them so I can fix the grammar. In due time, I'll probably commit the grammar to the official ANTLR Wiki.

回复收藏 0 原文

~没有更多了~