Why does this grammar produce error 208?

Posted on 2024-12-11 13:15:33


I don't understand why the following grammar leads to error 208, which complains that IF can never be matched:

error(208): test.g:11:1: The following token definitions can never be matched because prior tokens match the same input: IF

ANTLRWorks 1.4.3

ANTLR 3.4

grammar test;

@lexer::members {
  private boolean rawAhead() {
  }
}

parse    :    IF*;

RAW    :    ({rawAhead()}?=> . )+;
IF      :    'if';
ID    :    ('A'..'Z'|'a'..'z')+;

Removing either the RAW rule or the ID rule resolves the error.
From my point of view, IF can still be matched whenever rawAhead() returns false.


如果没结果 2024-12-18 13:15:33

Bood wrote:
I think it actually matters: say we have one and only one 'if' outside of mmode, e.g. <#/>if<#/>. Then the if here will be matched as IF, not as RAW as it should be (same length, so the first rule wins), right?

Yeah, you're right, good point. Giving it some more thought, that is the expected behavior AFAIK. But things seem to work a bit differently: the RAW rule takes precedence over the ID and IF rules, even when it is placed at the end of the lexer grammar, as you can see:

freemarker_simple.g

grammar freemarker_simple;

@lexer::members {

  private boolean mmode = false;

  private boolean rawAhead() {
    if(mmode) return false;
    int ch1 = input.LA(1), ch2 = input.LA(2), ch3 = input.LA(3);
    return !(
        (ch1 == '<' && ch2 == '#') ||
        (ch1 == '<' && ch2 == '/' && ch3 == '#') ||
        (ch1 == '$' && ch2 == '{')
    );
  }
}

parse
  :  (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
  ;

OUTPUT_START  : '${'  {mmode=true;};
TAG_START     : '<#'  {mmode=true;};
TAG_END_START : '</' ('#' {mmode=true;} | ~'#' {$type=RAW;});

OUTPUT_END    : '}' {mmode=false;};
TAG_END       : '>' {mmode=false;};

EQUALS        : '==';
IF            : 'if';
STRING        : '"' ~'"'* '"';
ID            : ('a'..'z' | 'A'..'Z')+;
SPACE         : (' ' | '\t' | '\r' | '\n')+ {skip();};

RAW           : ({rawAhead()}?=> . )+;

Main.java

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    freemarker_simpleLexer lexer = new freemarker_simpleLexer(new ANTLRStringStream("<#/if>if<#if>foo<#if>"));
    freemarker_simpleParser parser = new freemarker_simpleParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

will print the following to the console:

TAG_START       '<#'
IF              'if'
TAG_END         '>'
RAW             'if'
TAG_START       '<#'
IF              'if'
TAG_END         '>'
RAW             'foo'
TAG_START       '<#'
IF              'if'
TAG_END         '>'

As you can see, the 'if' and 'foo' are tokenized as RAW in the input:

<#/if>if<#if>foo<#if>
      ^^     ^^^
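
To see where the rawAhead() predicate holds without generating a lexer, the check can be replayed over a plain String. The following is only a rough sketch under stated assumptions: the class RawAheadSketch and the helpers la and markRaw are hypothetical names made up for illustration, and the loop only tracks mmode and the predicate. It does not model ANTLR's generated DFA or the IF, ID, STRING and SPACE rules.

```java
// Hypothetical helper class; not part of the grammar or of the ANTLR runtime.
final class RawAheadSketch {

    // Stand-in for input.LA(i): 1-based look-ahead from pos, -1 past the end.
    static int la(String s, int pos, int i) {
        int idx = pos + i - 1;
        return idx < s.length() ? s.charAt(idx) : -1;
    }

    // Same condition as rawAhead() in the grammar, with the lexer state passed in.
    static boolean rawAhead(String s, int pos, boolean mmode) {
        if (mmode) return false;
        int ch1 = la(s, pos, 1), ch2 = la(s, pos, 2), ch3 = la(s, pos, 3);
        return !(
            (ch1 == '<' && ch2 == '#') ||
            (ch1 == '<' && ch2 == '/' && ch3 == '#') ||
            (ch1 == '$' && ch2 == '{')
        );
    }

    // Returns a line with '^' under every character at which rawAhead() holds.
    static String markRaw(String s) {
        StringBuilder marks = new StringBuilder();
        boolean mmode = false;
        int pos = 0;
        while (pos < s.length()) {
            if (rawAhead(s, pos, mmode)) {
                marks.append('^');
                pos++;
                continue;
            }
            int consumed = 1;
            if (s.startsWith("</#", pos)) {
                mmode = true;   // like TAG_END_START followed by '#'
                consumed = 3;
            } else if (s.startsWith("<#", pos) || s.startsWith("${", pos)) {
                mmode = true;   // like TAG_START and OUTPUT_START
                consumed = 2;
            } else if (s.charAt(pos) == '>' || s.charAt(pos) == '}') {
                mmode = false;  // like TAG_END and OUTPUT_END
            }
            for (int k = 0; k < consumed; k++) marks.append(' ');
            pos += consumed;
        }
        return marks.toString();
    }

    public static void main(String[] args) {
        String input = "<#/if>if<#if>foo<#if>";
        System.out.println(input);
        System.out.println(markRaw(input));
    }
}
```

For the sample input, the second printed line carries carets under the standalone 'if' and under 'foo', the same positions marked above. This only shows that the predicate is true there; it does not by itself explain why the generated lexer prefers RAW over IF and ID at those positions.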