Switching lexer state in antlr3 grammar
I'm trying to construct an ANTLR grammar to parse a templating language. The language can be embedded in any text, and its boundaries are marked with opening/closing tags: {{ / }}. So a valid template looks like this:
foo {{ someVariable }} bar
where foo and bar should be ignored, and the part inside the {{ and }} tags should be parsed. I've found this question, which basically has an answer for the problem, except that the tags there are only a single { and }. I've tried to modify the grammar to match two opening/closing characters, but as soon as I do, the BUFFER rule consumes ALL characters, including the opening and closing brackets. The LD rule is never invoked.
Does anyone have an idea why the ANTLR lexer consumes all input in the BUFFER rule when the delimiters have two characters, but does not consume the delimiters when they have only one character?
grammar Test;

options {
  output=AST;
  ASTLabelType=CommonTree;
}

@lexer::members {
  private boolean insideTag = false;
}

start
  : (tag | BUFFER )*
  ;

tag
  : LD IDENT^ RD
  ;

LD @after {
    // flip lexer the state
    insideTag=true;
    System.err.println("FLIPPING TAG");
  } : '{{';

RD @after {
    // flip the state back
    insideTag=false;
  } : '}}';

SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

IDENT : (LETTER)*;

BUFFER : { !insideTag }?=> ~(LD | RD)+;

fragment LETTER : ('a'..'z' | 'A'..'Z');
Answers (2)
You can match any character one or more times until you see {{ ahead by including a predicate inside the parentheses ( ... )+ (see the BUFFER rule in the demo). A demo:
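To illustrate the lookahead idea outside ANTLR, here is a minimal sketch in plain Java. It is not the demo grammar from this answer: the class name TemplateLexerSketch, the tokenize method, and the TYPE(text) token format are all made up for illustration. It only shows the two ingredients discussed here, a state flag flipped by {{ and }}, and a buffer loop that checks whether {{ is ahead before consuming each character. As far as I know, in ANTLR 3 lexer rules the ~ operator negates single characters (or character sets), which is the likely reason ~(LD | RD) stops behaving as intended once LD and RD are two characters long, and why an explicit lookahead check is needed instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written tokenizer mirroring the grammar's token types.
final class TemplateLexerSketch {

    // Same character range as the LETTER fragment in the grammar.
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean insideTag = false; // plays the role of the lexer's insideTag flag
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("{{", i)) {
                tokens.add("LD({{)");
                insideTag = true;           // flip the state on the opening delimiter
                i += 2;
            } else if (insideTag && input.startsWith("}}", i)) {
                tokens.add("RD(}})");
                insideTag = false;          // flip the state back
                i += 2;
            } else if (!insideTag) {
                // BUFFER: consume one or more characters, stopping as soon as
                // the two-character sequence "{{" is ahead. A single '{' is
                // ordinary buffer text.
                int start = i;
                while (i < input.length() && !input.startsWith("{{", i)) {
                    i++;
                }
                tokens.add("BUFFER(" + input.substring(start, i) + ")");
            } else if (Character.isWhitespace(input.charAt(i))) {
                i++;                        // whitespace inside a tag is skipped
            } else if (isLetter(input.charAt(i))) {
                int start = i;
                while (i < input.length() && isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add("IDENT(" + input.substring(start, i) + ")");
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + input.charAt(i) + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("foo {{ someVariable }} bar")) {
            System.out.println(token);
        }
    }
}
```

For the input "foo {{ someVariable }} bar" this sketch yields five tokens: a BUFFER for "foo ", then LD, IDENT, RD, and a final BUFFER for " bar".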
Note that it's best to keep the BUFFER rule as the first lexer rule in your grammar: that way, it will be the first token that is tried.
If you now parse "foo {{ someVariable }} bar", the following AST is created:
Wouldn't a grammar like this fit your needs? I don't see why the BUFFER needs to be that complicated.