ANTLR4: How do I match extra spaces at the beginning of a line?
I tried to match an extra space at the beginning of a line, but it didn't work. How should I modify the lexer rules so that it matches?
TestParser.g4:
    parser grammar TestParser;

    options { tokenVocab=TestLexer; }

    root
        : choice+ EOF
        ;

    choice
        : QUESTION OPTION+
        ;
TestLexer.g4:
    lexer grammar TestLexer;

    @lexer::members {
        private boolean aheadIsNotAnOption(IntStream _input) {
            int nextChar = _input.LA(1);
            return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
        }
    }

    QUESTION: {getCharPositionInLine() == 0}? DIGIT DOT CONTENT -> pushMode(OPTION_MODE);
    OTHER: . -> skip;

    mode OPTION_MODE;
    OPTION: OPTION_HEADER DOT CONTENT;
    NOT_OPTION_LINE: NEWLINE SPACE* {aheadIsNotAnOption(_input)}? -> popMode, skip;
    OPTION_OTHER: OTHER -> skip;

    fragment DIGIT: [0-9]+;
    fragment OPTION_HEADER: [A-D];
    fragment CONTENT: [a-zA-Z0-9 ,.'?/()!]+? {_input.LA(1) == '\n'}?;
    fragment DOT: '.';
    fragment NEWLINE: '\n';
    fragment SPACE: ' ';
Input text (note the extra space before C.ccc):
    1.title
    A.aaa
    B.bbb
     C.ccc
    2.title
    A.aaa
Java code:
    import org.antlr.v4.runtime.CharStream;
    import org.antlr.v4.runtime.CharStreams;
    import org.antlr.v4.runtime.CommonTokenStream;
    import org.antlr.v4.runtime.Lexer;
    import org.antlr.v4.runtime.tree.ParseTree;
    import java.io.IOException;
    import java.net.URISyntaxException;

    public class TestParseTest {
        public static void main(String[] args) throws URISyntaxException, IOException {
            CharStream charStream = CharStreams.fromString("1.title\n" +
                    "A.aaa\n" +
                    "B.bbb\n" +
                    " C.ccc\n" +
                    "2.title\n" +
                    "A.aaa\n");
            Lexer lexer = new TestLexer(charStream);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            TestParser parser = new TestParser(tokens);
            ParseTree parseTree = parser.root();
            System.out.println(parseTree.toStringTree(parser));
        }
    }
The output is as follows:
(root (choice 1.title A.aaa B.bbb) (choice 2.title A.aaa) <EOF>)
The idea is that when a non-option line is encountered in OPTION_MODE, the mode is popped. But when a line starts with an extra space, it is not matched as expected.
It seems that the \n before C.ccc matches NOT_OPTION_LINE, which causes the mode to be popped? I want C.ccc to be matched as an OPTION. Thanks.
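The suspicion above can be checked with a small plain-Java sketch. It is only a simplified model of how NEWLINE SPACE* followed by the predicate can match, not ANTLR's actual ATN simulation, and the class and method names (NotOptionLineTrace, matchNotOptionLine, la) are made up for illustration. It tries every possible number of spaces after the newline and keeps the longest candidate whose predicate passes.

```java
public class NotOptionLineTrace {
    // Same check as aheadIsNotAnOption in the question, but taking the lookahead char directly.
    static boolean aheadIsNotAnOption(int nextChar) {
        return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
    }

    // Character at index i, or -1 past the end (mirrors IntStream.EOF).
    static int la(String s, int i) {
        return i < s.length() ? s.charAt(i) : -1;
    }

    // Simplified model of: NEWLINE SPACE* {aheadIsNotAnOption(_input)}?
    // Returns the longest match length whose predicate passes, or -1 if none does.
    static int matchNotOptionLine(String input, int start) {
        if (la(input, start) != '\n') {
            return -1;
        }
        int best = -1;
        int pos = start + 1; // position right after the newline
        while (true) {
            // Candidate: stop consuming spaces here and evaluate the predicate.
            if (aheadIsNotAnOption(la(input, pos))) {
                best = pos - start;
            }
            if (la(input, pos) != ' ') {
                break; // SPACE* cannot consume any further
            }
            pos++;
        }
        return best;
    }

    public static void main(String[] args) {
        String withSpace = "B.bbb\n C.ccc\n";
        String withoutSpace = "B.bbb\nC.ccc\n";
        System.out.println(matchNotOptionLine(withSpace, 5));    // prints 1
        System.out.println(matchNotOptionLine(withoutSpace, 5)); // prints -1
    }
}
```

In this model, the candidate with zero spaces sees the space itself as the next character, so the predicate passes and the rule matches just the \n. OPTION_OTHER matches the same single character, and since ANTLR resolves equal-length matches in favor of the rule listed first, NOT_OPTION_LINE would win and pop the mode, which is consistent with the output shown.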
Comments (1)
I think you're making it a bit too complex. As I see it, a line either starts as a question ([ \t]* [0-9]+) or as an option ([ \t]* [A-Z]). In all other cases, just ignore the line (. -> skip). That boils down to the following grammar:
A parser grammar could then look like this:
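The grammar listings themselves are missing from this copy of the answer. What follows is a minimal sketch that is consistent with the description, not the answer's original code. It assumes the Java target (for getCharPositionInLine()), that a question or option runs to the end of its line (~[\r\n]+), and that the file names from the question are reused.

```antlr
// TestLexer.g4 (hypothetical sketch)
lexer grammar TestLexer;

// A question: at the start of a line, optional blanks, digits, a dot, rest of the line.
QUESTION : {getCharPositionInLine() == 0}? [ \t]* [0-9]+ '.' ~[\r\n]+ ;

// An option: at the start of a line, optional blanks, one capital letter, a dot, rest of the line.
OPTION   : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' ~[\r\n]+ ;

// Everything else, including line breaks, is skipped one character at a time.
OTHER    : . -> skip ;

// TestParser.g4 (same shape as the parser grammar in the question)
// parser grammar TestParser;
// options { tokenVocab=TestLexer; }
// root   : choice+ EOF ;
// choice : QUESTION OPTION+ ;
```

With rules like these there is no lexer mode to push or pop, so a leading space cannot knock the lexer out of an option block. The leading blanks simply become part of the QUESTION or OPTION token text.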
And the Java code:
will then print:
EDIT
Given that you already have target-specific code in your grammar, you could just trim the spaces from an option like this (untested!):
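The listing for this edit is also missing here. One way it could look, as a sketch under the same assumptions as before (Java target, an option running to the end of the line), is a lexer action that rewrites the token text with Lexer.setText:

```antlr
// Hypothetical sketch: same OPTION shape as described above, plus a Java action.
OPTION
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' ~[\r\n]+
   { setText(getText().trim()); } // drop leading (and trailing) whitespace from the token text
 ;
```

Note that String.trim() also removes trailing whitespace, so an option that ends in spaces would lose those as well.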