ANTLR4: How to match extra spaces at the beginning of a line?

Posted on 2025-01-13 11:18:56

I tried to match the extra space at the beginning of the line, but it didn't work. How to modify the lexer rule to match?

TestParser.g4:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
    : choice+ EOF
    ;

choice:
    QUESTION OPTION+;

TestLexer.g4:

lexer grammar TestLexer;

@lexer::members {
    private boolean aheadIsNotAnOption(IntStream _input) {
        int nextChar = _input.LA(1);
        return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
    }
}

QUESTION:                      {getCharPositionInLine() == 0}? DIGIT DOT CONTENT -> pushMode(OPTION_MODE);
OTHER:                         . -> skip;

mode OPTION_MODE;
OPTION:                        OPTION_HEADER DOT CONTENT;
NOT_OPTION_LINE:               NEWLINE SPACE* {aheadIsNotAnOption(_input)}? -> popMode, skip;
OPTION_OTHER:                  OTHER -> skip;

fragment DIGIT:                [0-9]+;
fragment OPTION_HEADER:        [A-D];
fragment CONTENT:              [a-zA-Z0-9 ,.'?/()!]+? {_input.LA(1) == '\n'}?;
fragment DOT:                  '.';
fragment NEWLINE:              '\n';
fragment SPACE:                ' ';

Text:

1.title
A.aaa
B.bbb
 C.ccc
2.title
A.aaa

Java code:

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.tree.ParseTree;

import java.io.IOException;
import java.net.URISyntaxException;

public class TestParseTest {

    public static void main(String[] args) throws URISyntaxException, IOException {
        CharStream charStream = CharStreams.fromString("1.title\n" +
                "A.aaa\n" +
                "B.bbb\n" +
                " C.ccc\n" +
                "2.title\n" +
                "A.aaa\n");
        Lexer lexer = new TestLexer(charStream);

        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        ParseTree parseTree = parser.root();

        System.out.println(parseTree.toStringTree(parser));
    }

}

The output is as follows:

(root (choice 1.title A.aaa B.bbb) (choice 2.title A.aaa) <EOF>)

The idea is that when a non-option line is encountered in OPTION_MODE, the mode is popped. However, when a line starts with an extra space, it is not matched as expected.

It seems that the \n before " C.ccc" is matched by NOT_OPTION_LINE, which causes the mode to be popped? I want " C.ccc" to be matched as an OPTION. Thanks.
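
A rough way to check that suspicion without ANTLR: because SPACE* may match zero spaces, the rule can stop right after the newline, where the lookahead character is the space rather than 'C'. The plain-Java sketch below replays the predicate at both possible stopping points. The class name `PredicateTrace` and the two-argument `aheadIsNotAnOption(String, int)` are hypothetical stand-ins for the lexer's `_input.LA(1)`; this only illustrates the predicate, not how ANTLR actually simulates the rule.

```java
public class PredicateTrace {
    // Hypothetical stand-in for the grammar's aheadIsNotAnOption:
    // input.charAt(pos) plays the role of _input.LA(1).
    static boolean aheadIsNotAnOption(String input, int pos) {
        if (pos >= input.length()) {
            return true; // end of input is not an option letter
        }
        char next = input.charAt(pos);
        return next != 'A' && next != 'B' && next != 'C' && next != 'D';
    }

    public static void main(String[] args) {
        // What remains of the input after "B.bbb" has been matched.
        String rest = "\n C.ccc\n";
        // NEWLINE SPACE* can stop after the newline plus 0 or 1 spaces here.
        for (int spaces = 0; spaces <= 1; spaces++) {
            int pos = 1 + spaces; // index just past NEWLINE and the consumed spaces
            System.out.println(spaces + " space(s) consumed -> predicate "
                    + aheadIsNotAnOption(rest, pos));
        }
    }
}
```

With zero spaces consumed the predicate is true (the next character is ' '), so a bare "\n" can be accepted as NOT_OPTION_LINE; with one space consumed it is false (the next character is 'C'). That is consistent with the mode being popped before " C.ccc" is reached.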

Comments (1)

蓝眸 2025-01-20 11:18:56

I think you're making it a bit too complex. As I see it, a line either starts as a question ([ \t]* [0-9]+) or as an option ([ \t]* [A-Z]). In all other cases, just ignore the line (. -> skip). That boils down to the following grammar:

lexer grammar TestLexer;

QuestionStart
 : {getCharPositionInLine() == 0}? [ \t]* [0-9]+ '.' -> pushMode(ContentMode)
 ;

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' -> pushMode(ContentMode)
 ;

Ignored
 : . -> skip
 ;

mode ContentMode;

  Content
   : ~[\r\n]+
   ;

  QuestionEnd
   : [\r\n]+ -> skip, popMode
   ;

A parser grammar could then look like this:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
 : question+ EOF
 ;

question
 : QuestionStart Content option+
 ;

option
 : OptionStart Content+
 ;

And the Java code:

String source = "1.title\n" +
    "A.aaa\n" +
    "B.bbb\n" +
    " C.ccc\n" +
    "  ...ignored ...\n" +
    "2.title\n" +
    "A.aaa\n";

Lexer lexer = new TestLexer(CharStreams.fromString(source));

CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
ParseTree parseTree = parser.root();

System.out.println(parseTree.toStringTree(parser));

will then print:

(root (question 1. title (option A. aaa) (option B. bbb) (option  C. ccc)) (question 2. title (option A. aaa)) <EOF>)
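
To sanity-check which input lines the lexer rules above would treat as a question, an option, or ignored text, the same classification can be approximated with regular expressions in plain Java. This is only a sketch: the `LineKinds` class, the `classify` method, and the two patterns are assumptions made for illustration (regex analogues of QuestionStart and OptionStart), not anything generated by ANTLR, and they do not model lexer modes.

```java
import java.util.regex.Pattern;

public class LineKinds {
    // Assumed regex analogues of QuestionStart and OptionStart:
    // optional leading spaces/tabs, then digits or one capital letter, then a dot.
    private static final Pattern QUESTION = Pattern.compile("[ \\t]*[0-9]+\\..*");
    private static final Pattern OPTION = Pattern.compile("[ \\t]*[A-Z]\\..*");

    static String classify(String line) {
        if (QUESTION.matcher(line).matches()) {
            return "question";
        }
        if (OPTION.matcher(line).matches()) {
            return "option";
        }
        return "ignored"; // corresponds to the catch-all Ignored rule
    }

    public static void main(String[] args) {
        String source = "1.title\n" +
                "A.aaa\n" +
                "B.bbb\n" +
                " C.ccc\n" +
                "  ...ignored ...\n" +
                "2.title\n" +
                "A.aaa\n";
        for (String line : source.split("\n")) {
            System.out.println(classify(line) + ": " + line);
        }
    }
}
```

Here " C.ccc" is classified as an option despite the leading space, and "  ...ignored ..." falls through to "ignored", which matches the parse tree printed above.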

EDIT

Given that you already have target-specific code in your grammar, you could just trim the leading spaces from an option's text like this (untested!):

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.'
   {setText(getText().trim());}
   -> pushMode(ContentMode)
 ;