ANTLR string token that does not contain a specific character sequence

Posted on 2025-01-30 10:18:23

I'm trying to define a lexer grammar that matches string tokens that don't contain a certain sequence of characters, for instance "AB".

Examples of strings I want to capture:

""

"asda A rewr A"

"asda A"

"asdas B ad"

but not:

"asdas AB fdsdf"

I tried a few things, but I always seem to miss some case.

Comments (1)

树深时见影 2025-02-06 10:18:23

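This could be done with a little mode magic: when you're in the first string mode and you encounter AB, you just push the second string mode. A string that never reaches the second mode ends as STR_1 (no AB), and one that does ends as STR_2 (contains AB):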

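lexer grammar MyLexer;

// An opening quote starts a string; 'more' keeps it as part of the token being built.
QUOTE      : '"'        -> more, pushMode(MODE_1);
SPACES     : [ \t\r\n]+ -> skip;

// MODE_1: inside a string in which no AB has been seen so far.
mode MODE_1;
STR_1      : '"'        -> popMode;
AB         : 'AB'       -> more, pushMode(MODE_2);
// Matches one character at a time, so the longer AB rule wins wherever AB occurs.
CONTENTS_1 : ~["]       -> more;

// MODE_2: inside a string that contains AB; consume everything up to the closing quote.
mode MODE_2;
STR_2      : '"'        -> popMode, popMode;
CONTENTS_2 : ~["]+      -> more;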
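
To see why the two modes separate the cases, the grammar's mode handling can be mimicked in plain Java without the ANTLR runtime. This is only a rough sketch under stated assumptions, not how ANTLR is implemented: the class ModeSketch and the method classify are hypothetical names, the input is assumed to be exactly one quoted string, and whitespace skipping is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper that imitates the mode stack of the grammar above.
final class ModeSketch {
    enum Mode { DEFAULT_MODE, MODE_1, MODE_2 }

    /** Returns "STR_1" or "STR_2" for one quoted string, or null for anything else. */
    static String classify(String input) {
        Deque<Mode> stack = new ArrayDeque<>();
        Mode mode = Mode.DEFAULT_MODE;
        String type = null;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            switch (mode) {
                case DEFAULT_MODE:
                    // QUOTE: only an opening quote is accepted, and only once.
                    if (c != '"' || type != null) return null;
                    stack.push(mode);          // pushMode(MODE_1)
                    mode = Mode.MODE_1;
                    i++;
                    break;
                case MODE_1:
                    if (c == '"') {            // STR_1: popMode
                        mode = stack.pop();
                        type = "STR_1";
                        i++;
                    } else if (input.startsWith("AB", i)) {
                        stack.push(mode);      // AB: pushMode(MODE_2)
                        mode = Mode.MODE_2;
                        i += 2;
                    } else {
                        i++;                   // CONTENTS_1: one character
                    }
                    break;
                case MODE_2:
                    if (c == '"') {            // STR_2: popMode, popMode
                        stack.pop();           // discards MODE_1
                        mode = stack.pop();    // back to DEFAULT_MODE
                        type = "STR_2";
                    }
                    i++;                       // closing quote or CONTENTS_2
                    break;
            }
        }
        // An unterminated string leaves the sketch in MODE_1 or MODE_2.
        return mode == Mode.DEFAULT_MODE ? type : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"\"", "\"asda A rewr A\"", "\"asdas AB fdsdf\"", "\"asda A\"", "\"asdas B ad\""
        };
        for (String s : samples) {
            System.out.println(classify(s) + " " + s);
        }
    }
}
```

Run on the five example strings, the sketch reports STR_2 only for the one that contains AB and STR_1 for the rest, which is the split the grammar relies on: a parser can accept STR_1 and reject or report STR_2.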

The Java demo:

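import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.Token;

// Main is an arbitrary class name; MyLexer is the class ANTLR generates from the grammar above.
public class Main {
  public static void main(String[] args) {
    // One quoted string per line; only the third one contains AB.
    String source = "\"\"\n" +
        "\"asda A rewr A\"\n" +
        "\"asdas AB fdsdf\"\n" +
        "\"asda A\"\n" +
        "\"asdas B ad\"\n";

    Lexer lexer = new MyLexer(CharStreams.fromString(source));
    CommonTokenStream stream = new CommonTokenStream(lexer);
    stream.fill(); // tokenize the whole input

    System.out.println(source);

    for (Token t : stream.getTokens()) {
      if (t.getType() == Token.EOF) continue; // leave the trailing EOF token out of the listing
      System.out.printf("%-20s `%s`%n",
          MyLexer.VOCABULARY.getSymbolicName(t.getType()),
          t.getText().replace("\n", "\\n"));
    }
  }
}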

This will print the following:

""
"asda A rewr A"
"asdas AB fdsdf"
"asda A"
"asdas B ad"

STR_1                `""`
STR_1                `"asda A rewr A"`
STR_2                `"asdas AB fdsdf"`
STR_1                `"asda A"`
STR_1                `"asdas B ad"`