Creating lexer rules dynamically

Posted 2024-12-29 03:21:49

Here is a simple rule:

NAME : 'name1' | 'name2' | 'name3';

Is it possible to supply the alternatives for such a rule dynamically, from an array of strings?

相思碎 2025-01-05 03:21:49

Yes, provided the dynamic tokens all match the identifier rule (Id in the demos below).

In that case, wait until Id has matched completely, then check whether the matched text is in a predefined collection. If it is in the collection (a Set in my example), change the type of the token.

A small demo (ANTLR 3):

grammar T;

@lexer::members {
  private java.util.Set<String> special;

  public TLexer(ANTLRStringStream input, java.util.Set<String> special) {
    super(input);
    this.special = special;
  }

}

parse
 : (t=. {System.out.printf("\%-10s'\%s'\n", tokenNames[$t.type], $t.text);})* EOF
 ;

Id
 : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
   {if(special.contains($text)) $type=Special;}
 ;

Int
 : '0'..'9'+
 ;

Space
 : (' ' | '\t' | '\r' | '\n') {skip();}
 ;

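// Empty fragment rule: it matches nothing and exists only so that a Special token type is defined.
fragment Special : ;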

If you now run the following demo:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = "foo bar baz Mu";
    java.util.Set<String> set = new java.util.HashSet<String>();
    set.add("Mu");
    set.add("bar");
    TLexer lexer = new TLexer(new ANTLRStringStream(source), set);
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

You will see the following being printed:

Id        'foo'
Special   'bar'
Id        'baz'
Special   'Mu'

ANTLR4

For ANTLR4, you can do something like this:

grammar T;

@lexer::members {
  private java.util.Set<String> special = new java.util.HashSet<>();

  public TLexer(CharStream input, java.util.Set<String> special) {
    this(input);
    this.special = special;
  }
}

tokens {
  Special
}

parse
 : .*? EOF
 ;

Id
 : [a-zA-Z_] [a-zA-Z_0-9]* {if(special.contains(getText())) setType(TParser.Special);}
 ;

Int
 : [0-9]+
 ;

Space
 : [ \t\r\n] -> skip
 ;

Test it with the following class:

import org.antlr.v4.runtime.*;
import java.util.HashSet;
import java.util.Set;

public class Main {

  public static void main(String[] args) {

    String source = "foo bar baz Mu";
    Set<String> set = new HashSet<>();
    set.add("Mu");
    set.add("bar");

    TLexer lexer = new TLexer(CharStreams.fromString(source), set);
    CommonTokenStream tokenStream = new CommonTokenStream(lexer);
    tokenStream.fill();

    for (Token t : tokenStream.getTokens()) {
      System.out.printf("%-10s '%s'\n", TParser.VOCABULARY.getSymbolicName(t.getType()), t.getText());
    }
  }
}

This prints:

Id         'foo'
Special    'bar'
Id         'baz'
Special    'Mu'
EOF        '<EOF>'
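
The question starts from an array of strings, while both demos take a Set. The following is a minimal sketch in plain Java, without ANTLR, of how the array could be turned into that Set and of the lookup the Id action performs. The class name SpecialNames and the helpers toSet and classify are hypothetical names used only for illustration; they are not part of ANTLR or of the demos above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only (not part of ANTLR).
class SpecialNames {

  // Copies the array into a Set, the collection type the lexer constructors above accept.
  static Set<String> toSet(String[] names) {
    return new HashSet<>(Arrays.asList(names));
  }

  // Mirrors the check in the Id rule's action: the text is matched as an
  // identifier first, and only the reported type depends on the set.
  static String classify(String text, Set<String> special) {
    return special.contains(text) ? "Special" : "Id";
  }

  public static void main(String[] args) {
    String[] names = {"name1", "name2", "name3"};
    Set<String> special = toSet(names);
    for (String word : "foo name2 bar name3".split(" ")) {
      System.out.printf("%-10s '%s'%n", classify(word, special), word);
    }
  }
}
```

The resulting set can then be passed to the lexer constructor in place of the hand-built one, for example new TLexer(CharStreams.fromString(source), special) in the ANTLR4 demo.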