ANTLR4: How to match extra spaces at the beginning of a line?

Posted on 2025-01-13 11:18:56

I tried to match the extra space at the beginning of a line, but it didn't work. How should I modify the lexer rules so that the line is matched?

TestParser.g4:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
    : choice+ EOF
    ;

choice:
    QUESTION OPTION+;

TestLexer.g4:

lexer grammar TestLexer;

@lexer::members {
    private boolean aheadIsNotAnOption(IntStream _input) {
        int nextChar = _input.LA(1);
        return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
    }
}

QUESTION:                      {getCharPositionInLine() == 0}? DIGIT DOT CONTENT -> pushMode(OPTION_MODE);
OTHER:                         . -> skip;

mode OPTION_MODE;
OPTION:                        OPTION_HEADER DOT CONTENT;
NOT_OPTION_LINE:               NEWLINE SPACE* {aheadIsNotAnOption(_input)}? -> popMode, skip;
OPTION_OTHER:                  OTHER -> skip;

fragment DIGIT:                [0-9]+;
fragment OPTION_HEADER:        [A-D];
fragment CONTENT:              [a-zA-Z0-9 ,.'?/()!]+? {_input.LA(1) == '\n'}?;
fragment DOT:                  '.';
fragment NEWLINE:              '\n';
fragment SPACE:                ' ';

Text:

1.title
A.aaa
B.bbb
 C.ccc
2.title
A.aaa

Java code:

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.tree.ParseTree;

public class TestParseTest {

    public static void main(String[] args) {
        CharStream charStream = CharStreams.fromString("1.title\n" +
                "A.aaa\n" +
                "B.bbb\n" +
                " C.ccc\n" +
                "2.title\n" +
                "A.aaa\n");
        Lexer lexer = new TestLexer(charStream);

        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        ParseTree parseTree = parser.root();

        System.out.println(parseTree.toStringTree(parser));
    }

}

The output is as follows:

(root (choice 1.title A.aaa B.bbb) (choice 2.title A.aaa) <EOF>)

The idea is that when a non-option line is encountered in OPTION_MODE, the mode is popped. However, when a line starts with an extra space, it is not matched as expected.

It seems that the \n before C.ccc matches NOT_OPTION_LINE, which causes the mode to be popped. I want C.ccc to be matched as an OPTION. Thanks.
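
To make that suspicion concrete, here is a plain-Java simulation of which prefixes NOT_OPTION_LINE could accept. It is only a sketch and not ANTLR's actual matching algorithm. It assumes the predicate is checked against the character that follows whatever NEWLINE SPACE* has consumed so far. The class and method names (NotOptionLineSketch, longestAcceptedMatch) are made up for illustration.

```java
// Plain-Java simulation (NOT ANTLR's real algorithm) of the rule
//   NOT_OPTION_LINE : NEWLINE SPACE* {aheadIsNotAnOption(_input)}? ;
// Assumption: the predicate is tested after every possible number of spaces.
final class NotOptionLineSketch {

    // Mirrors the predicate in the grammar: true when the next char is not A-D.
    static boolean aheadIsNotAnOption(String input, int pos) {
        if (pos >= input.length()) {
            return true; // end of input: no option header can follow
        }
        char next = input.charAt(pos);
        return next < 'A' || next > 'D';
    }

    // Longest length of '\n' followed by spaces, starting at `start`, for which
    // the predicate holds at the end of the match; -1 if no prefix is accepted.
    static int longestAcceptedMatch(String input, int start) {
        if (start >= input.length() || input.charAt(start) != '\n') {
            return -1;
        }
        int best = -1;
        int pos = start + 1;
        while (true) {
            if (aheadIsNotAnOption(input, pos)) {
                best = pos - start; // this prefix is a valid match
            }
            if (pos < input.length() && input.charAt(pos) == ' ') {
                pos++; // SPACE* may consume one more space
            } else {
                break;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "\n" alone is accepted because the next char is ' ', which is not A-D.
        System.out.println(longestAcceptedMatch("\n C.ccc\n", 0));  // 1
        // No accepted prefix: the char after "\n" is 'C'.
        System.out.println(longestAcceptedMatch("\nC.ccc\n", 0));   // -1
        // A question line: "\n" is accepted, so the mode would be popped.
        System.out.println(longestAcceptedMatch("\n2.title\n", 0)); // 1
    }
}
```

Under that assumption, the bare "\n" in front of " C.ccc" already satisfies the predicate, because the character after it is a space and not A-D. The rule would therefore match and pop the mode before the space is consumed, which is consistent with C.ccc being missing from the output above.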

蓝眸 2025-01-20 11:18:56

I think you're making it a bit too complex. As I see it, lines either start as a question ([ \t]* [0-9]+) or as an option ([ \t]* [A-Z]). In all other cases, just ignore the line (. -> skip). That boils down to the following grammar:

lexer grammar TestLexer;

QuestionStart
 : {getCharPositionInLine() == 0}? [ \t]* [0-9]+ '.' -> pushMode(ContentMode)
 ;

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' -> pushMode(ContentMode)
 ;

Ignored
 : . -> skip
 ;

mode ContentMode;

  Content
   : ~[\r\n]+
   ;

  QuestionEnd
   : [\r\n]+ -> skip, popMode
   ;
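
For intuition, the line classification these lexer rules perform can be mimicked in plain Java with two regular expressions. This is only a rough sketch outside of ANTLR. The names LineKindSketch, Kind and classify are hypothetical, and the patterns are hand-translated from QuestionStart and OptionStart.

```java
import java.util.regex.Pattern;

// Regex-based sketch of the three line kinds the lexer grammar distinguishes.
final class LineKindSketch {

    enum Kind { QUESTION, OPTION, IGNORED }

    // Hand-translated from QuestionStart: [ \t]* [0-9]+ '.'
    private static final Pattern QUESTION_START = Pattern.compile("[ \\t]*[0-9]+\\.");
    // Hand-translated from OptionStart: [ \t]* [A-Z] '.'
    private static final Pattern OPTION_START = Pattern.compile("[ \\t]*[A-Z]\\.");

    static Kind classify(String line) {
        // lookingAt() anchors at the start of the line, playing the role of
        // the {getCharPositionInLine() == 0}? predicate.
        if (QUESTION_START.matcher(line).lookingAt()) {
            return Kind.QUESTION;
        }
        if (OPTION_START.matcher(line).lookingAt()) {
            return Kind.OPTION;
        }
        return Kind.IGNORED;
    }

    public static void main(String[] args) {
        String source = "1.title\n" +
            "A.aaa\n" +
            " C.ccc\n" +
            "  ...ignored ...\n";
        for (String line : source.split("\n")) {
            System.out.println(classify(line) + " <- \"" + line + "\"");
        }
    }
}
```

Because the leading [ \t]* is part of both patterns, an indented line such as " C.ccc" is still classified as an option. That is the same reason the grammar above accepts it.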

A parser grammar could then look like this:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
 : question+ EOF
 ;

question
 : QuestionStart Content option+
 ;

option
 : OptionStart Content+
 ;

And the Java code:

String source = "1.title\n" +
    "A.aaa\n" +
    "B.bbb\n" +
    " C.ccc\n" +
    "  ...ignored ...\n" +
    "2.title\n" +
    "A.aaa\n";

Lexer lexer = new TestLexer(CharStreams.fromString(source));

CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
ParseTree parseTree = parser.root();

System.out.println(parseTree.toStringTree(parser));

will then print:

(root (question 1. title (option A. aaa) (option B. bbb) (option  C. ccc)) (question 2. title (option A. aaa)) <EOF>)

EDIT

Given that you already have target-specific code in your grammar, you could just trim the whitespace from an option like this (untested!):

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.'
   {setText(getText().trim());}
   -> pushMode(ContentMode)
 ;
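
As a quick check of what the trim() call in that action would do to the token text, here is a small stand-alone Java sketch. The helper name normalizeOptionStart is hypothetical. It only exercises String.trim() and does not involve the generated lexer.

```java
// Stand-alone check of the effect String.trim() has on OptionStart-like text.
final class TrimSketch {

    // Hypothetical helper: what {setText(getText().trim());} would store as text.
    static String normalizeOptionStart(String tokenText) {
        // String.trim() removes leading and trailing chars <= U+0020,
        // which covers both the spaces and the tabs allowed by [ \t]*.
        return tokenText.trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalizeOptionStart(" C.") + "]");    // [C.]
        System.out.println("[" + normalizeOptionStart("\t  D.") + "]"); // [D.]
        System.out.println("[" + normalizeOptionStart("A.") + "]");     // [A.]
    }
}
```

If the action behaves as intended, the option token for the indented line would carry the text C. instead of the text with the leading space.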
