ANTLR string token that does not contain a specific character sequence

Posted on 2025-01-30 10:18:23

I'm trying to define a lexer grammar that matches string tokens that don't contain a certain sequence of characters, for instance "AB".

Examples of strings I want to capture:

""

"asda A rewr A"

"asda A"

"asdas B ad"

but not:

"asdas AB fdsdf"

I tried a few things, but I always seem to miss some case.

树深时见影 2025-02-06 10:18:23

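This can be done with a little mode magic: when you're in the first string mode and you encounter an AB, you just push into the second string mode. CONTENTS_1 deliberately matches one character at a time, so the longer AB rule wins whenever AB appears; the closing quote then yields STR_1 if no AB was seen and STR_2 otherwise: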

lexer grammar MyLexer;

QUOTE      : '"'        -> more, pushMode(MODE_1);
SPACES     : [ \t\r\n]+ -> skip;

mode MODE_1;
STR_1      : '"'        -> popMode;
AB         : 'AB'       -> more, pushMode(MODE_2);
CONTENTS_1 : ~["]       -> more;

mode MODE_2;
STR_2      : '"'        -> popMode, popMode;
CONTENTS_2 : ~["]+      -> more;

The Java demo:

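import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.Token;

// The class name Main is arbitrary; MyLexer is generated by ANTLR from the grammar above.
public class Main {
  public static void main(String[] args) {
    String source = "\"\"\n" +
        "\"asda A rewr A\"\n" +
        "\"asdas AB fdsdf\"\n" +
        "\"asda A\"\n" +
        "\"asdas B ad\"\n";

    // Tokenize the whole input up front
    Lexer lexer = new MyLexer(CharStreams.fromString(source));
    CommonTokenStream stream = new CommonTokenStream(lexer);
    stream.fill();

    System.out.println(source);

    // Print each token's symbolic type name next to its text
    for (Token t : stream.getTokens()) {
      System.out.printf("%-20s `%s`%n",
          MyLexer.VOCABULARY.getSymbolicName(t.getType()),
          t.getText().replace("\n", "\\n"));
    }
  }
}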

will print the input followed by one line per token (the final EOF token line is not shown here):

""
"asda A rewr A"
"asdas AB fdsdf"
"asda A"
"asdas B ad"

STR_1                `""`
STR_1                `"asda A rewr A"`
STR_2                `"asdas AB fdsdf"`
STR_1                `"asda A"`
STR_1                `"asdas B ad"`
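
To make it easier to see why the token types come out this way, here is a rough plain-Java sketch of the state machine the two modes describe. It is only an illustration and not how ANTLR works internally: the class name ModeSketch and the method tokenize are made up for this example, and unlike a generated lexer it throws on malformed input instead of reporting an error and recovering.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written imitation of the two-mode lexer above. ModeSketch and tokenize
// are hypothetical names, not part of ANTLR or of the generated MyLexer.
final class ModeSketch {

    // Returns one "TYPE `text`" entry per string literal found in the input.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;                       // SPACES -> skip
                continue;
            }
            if (c != '"') {
                throw new IllegalArgumentException("unexpected character at index " + i);
            }
            int start = i++;               // QUOTE -> more, pushMode(MODE_1)
            boolean sawAB = false;         // false = MODE_1, true = MODE_2
            while (i < input.length() && input.charAt(i) != '"') {
                if (!sawAB && input.startsWith("AB", i)) {
                    sawAB = true;          // AB -> more, pushMode(MODE_2)
                    i += 2;
                } else {
                    i++;                   // CONTENTS_1 / CONTENTS_2 -> more
                }
            }
            if (i == input.length()) {
                throw new IllegalArgumentException("unterminated string starting at index " + start);
            }
            i++;                           // closing quote: STR_1 or STR_2
            String type = sawAB ? "STR_2" : "STR_1";
            tokens.add(type + " `" + input.substring(start, i) + "`");
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "\"\"\n\"asda A rewr A\"\n\"asdas AB fdsdf\"\n\"asda A\"\n\"asdas B ad\"\n";
        tokenize(source).forEach(System.out::println);
    }
}
```

The boolean plays the role of the mode stack: once AB has been seen, the closing quote can only produce STR_2. Presumably the code consuming the tokens would then accept STR_1 as the wanted string literal and treat STR_2 as the rejected case.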