在antlr3语法中切换词法分析器状态

发布于 2024-12-23 10:36:43 字数 1259 浏览 7 评论 0原文

我正在尝试构建 antlr 语法来解析模板语言。该语言可以嵌入到任何文本中，并且边界用开始/结束标记标记：{{ / }}。因此，有效的模板如下所示：

    foo {{ someVariable }} bar

其中 foo 和 bar 应被忽略，以及 {{ 和 }} 内的部分 标签应该被解析。我发现这个问题基本上有问题的答案，除了标签是只有一个 { 和 }。我尝试修改语法以匹配 2 个开始/结束字符，但是一旦我这样做，BUFFER 规则就会消耗所有字符，还有开始和结束括号。 LD 规则永远不会被调用。

有谁知道为什么当分隔符有 2 个字符时，antlr 词法分析器会消耗 Buffer 规则中的所有标记，但当分隔符只有一个字符时却不会消耗它们？

    grammar Test;

    options { 
      output=AST;
      ASTLabelType=CommonTree; 
    }

    @lexer::members {
      private boolean insideTag = false;
    }

    start   
      :  (tag | BUFFER )*
      ;

    tag
      : LD IDENT^ RD
      ;

    LD @after {
      // flip lexer the state
      insideTag=true;
      System.err.println("FLIPPING TAG");
    } : '{{';

    RD @after {
      // flip the state back
      insideTag=false;
    } : '}}';

    SPACE    : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
    IDENT    : (LETTER)*;
    BUFFER   : { !insideTag }?=> ~(LD | RD)+;

    fragment LETTER : ('a'..'z' | 'A'..'Z');

原文

I'm trying to construct an antlr grammar to parse a templating language. that language can be embedded in any text and the boundaries are marked with opening/closing tags: {{ / }}. So a valid template looks like this:

    foo {{ someVariable }} bar

Where foo and bar should be ignored, and the part inside the {{ and }} tags should be parsed. I've found this question which basically has an answer for the problem, except that the tags are only one { and }. I've tried to modify the grammar to match 2 opening/closing characters, but as soon as i do this, the BUFFER rule consumes ALL characters, also the opening and closing brackets. The LD rule is never being invoked.

Has anyone an idea why the antlr lexer is consuming all tokens in the Buffer rule when the delimiters have 2 characters, but does not consume the delimiters when they have only one character?

    grammar Test;

    options { 
      output=AST;
      ASTLabelType=CommonTree; 
    }

    @lexer::members {
      private boolean insideTag = false;
    }

    start   
      :  (tag | BUFFER )*
      ;

    tag
      : LD IDENT^ RD
      ;

    LD @after {
      // flip lexer the state
      insideTag=true;
      System.err.println("FLIPPING TAG");
    } : '{{';

    RD @after {
      // flip the state back
      insideTag=false;
    } : '}}';

    SPACE    : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
    IDENT    : (LETTER)*;
    BUFFER   : { !insideTag }?=> ~(LD | RD)+;

    fragment LETTER : ('a'..'z' | 'A'..'Z');

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

小…楫夜泊 2024-12-30 10:36:43

您可以匹配任何字符一次或多次，直到看到前面的 {{，方法是在括号 ( ... )+ 内包含谓词（请参阅 BUFFER代码> 演示中的规则）。

演示：

grammar Test;

options { 
  output=AST;
  ASTLabelType=CommonTree; 
}

@lexer::members {
  private boolean insideTag = false;
}

start   
  :  tag EOF
  ;

tag
  : LD IDENT^ RD
  ;

LD 
@after {insideTag=true;} 
 : '{{'
 ;

RD 
@after {insideTag=false;} 
 : '}}'
 ;

BUFFER
 : ({!insideTag && !(input.LA(1)=='{' && input.LA(2)=='{')}?=> .)+ {$channel=HIDDEN;}
 ;

SPACE 
 : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}
 ;

IDENT
 : ('a'..'z' | 'A'..'Z')+
 ;

请注意，最好将 BUFFER 规则保留为语法中的第一个词法分析器规则：这样，它将成为第一个尝试的标记。

如果您现在解析 "foo {{ someVariable }} bar"，则会创建以下 AST：

You can match any character once or more until you see {{ ahead by including a predicate inside the parenthesis ( ... )+ (see the BUFFER rule in the demo).

A demo:

grammar Test;

options { 
  output=AST;
  ASTLabelType=CommonTree; 
}

@lexer::members {
  private boolean insideTag = false;
}

start   
  :  tag EOF
  ;

tag
  : LD IDENT^ RD
  ;

LD 
@after {insideTag=true;} 
 : '{{'
 ;

RD 
@after {insideTag=false;} 
 : '}}'
 ;

BUFFER
 : ({!insideTag && !(input.LA(1)=='{' && input.LA(2)=='{')}?=> .)+ {$channel=HIDDEN;}
 ;

SPACE 
 : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}
 ;

IDENT
 : ('a'..'z' | 'A'..'Z')+
 ;

Note that it's best to keep the BUFFER rule as the first lexer rule in your grammar: that way, it will be the first token that is tried.

If you now parse "foo {{ someVariable }} bar", the following AST is created:

enter image description here

回复收藏 0 原文

歌枕肩 2024-12-30 10:36:43

这样的语法不符合您的需要吗？我不明白为什么 BUFFER 需要那么复杂。

grammar test;

options { 
  output=AST;
  ASTLabelType=CommonTree; 
}

@lexer::members { 
    private boolean inTag=false;
}

start   
  :  tag* EOF
  ;

tag
  : LD IDENT RD -> IDENT
  ;

LD 
@after { inTag=true; }
 : '{{'
 ;

RD 
@after { inTag=false; }
 : '}}'
 ;

IDENT   :   {inTag}?=> ('a'..'z'|'A'..'Z'|'_') 'a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

BUFFER
 : . {$channel=HIDDEN;}
 ;

Wouldn't a grammar like this fit your needs? I don't see why the BUFFER needs to be that complicated.

grammar test;

options { 
  output=AST;
  ASTLabelType=CommonTree; 
}

@lexer::members { 
    private boolean inTag=false;
}

start   
  :  tag* EOF
  ;

tag
  : LD IDENT RD -> IDENT
  ;

LD 
@after { inTag=true; }
 : '{{'
 ;

RD 
@after { inTag=false; }
 : '}}'
 ;

IDENT   :   {inTag}?=> ('a'..'z'|'A'..'Z'|'_') 'a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

BUFFER
 : . {$channel=HIDDEN;}
 ;

回复收藏 0 原文

~没有更多了~