Removing the required surrounding quotes in the lexer/parser
In several projects I have run into a similar effect in my grammars.
I need to parse something like Key="Value"
So I created a grammar (the simplest I could make to show the effect):
grammar test;
KEY : [a-zA-Z0-9]+ ;
VALUE : DOUBLEQUOTE [ _a-zA-Z0-9.-]+ DOUBLEQUOTE ;
DOUBLEQUOTE : '"' ;
EQUALS : '=' ;
entry : key=KEY EQUALS value=VALUE;
I can now parse thing="One Two Three"
and in my code I receive key=thing and value="One Two Three".
In all of my projects I end up with an extra step to strip those " characters from the value, usually something like this (I use Java):
String value = ctx.value.getText();
value = value.substring(1, value.length()-1);
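The substring call above assumes the token text always starts and ends with a quote. As a minimal sketch of the same post-processing step with that assumption made explicit (the class name QuoteStripper and the method stripQuotes are hypothetical and not part of the original post):

```java
public class QuoteStripper {

    // Hypothetical helper: removes exactly one pair of surrounding double
    // quotes when both are present, and returns the input unchanged otherwise.
    public static String stripQuotes(String text) {
        if (text != null
                && text.length() >= 2
                && text.charAt(0) == '"'
                && text.charAt(text.length() - 1) == '"') {
            return text.substring(1, text.length() - 1);
        }
        return text; // not quoted (or too short): leave untouched
    }

    public static void main(String[] args) {
        // Same input as the VALUE token text in the question.
        System.out.println(stripQuotes("\"One Two Three\""));
    }
}
```

In a listener or visitor this would be called as stripQuotes(ctx.value.getText()), which still leaves the extra step in the application code rather than in the grammar.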
In my real grammars I find it very hard to move the check of the surrounding " into the parser.
Is there a clean way to drop the " up front by doing something in the lexer/parser?
Essentially I want ctx.value.getText() to return One Two Three instead of "One Two Three".
Update:
I have been playing with the excellent answer provided by Bart Kiers and found this variation, which does exactly what I was looking for.
By putting the DOUBLEQUOTE tokens on a hidden channel, they are still consumed by the lexer but hidden from the parser.
TestLexer.g4
lexer grammar TestLexer;
KEY : [a-zA-Z0-9]+;
DOUBLEQUOTE : '"' -> channel(HIDDEN), pushMode(STRING_MODE);
EQUALS : '=';
mode STRING_MODE;
STRING_DOUBLEQUOTE
: '"' -> channel(HIDDEN), type(DOUBLEQUOTE), popMode
;
STRING
: [ _a-zA-Z0-9.-]+
;
and
TestParser.g4
parser grammar TestParser;
options { tokenVocab=TestLexer; }
entry : key=KEY EQUALS value=STRING ;
Comments (1)
Try this:
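The grammar that followed this line is not present in this copy. Going by the remarks about embedded Java code and embedded actions, it was presumably a lexer action that rewrites the token text. A sketch of that idea for the Java target, based on the grammar from the question (an assumption, not necessarily the answer's exact code):

```antlr
grammar test;

entry : key=KEY EQUALS value=VALUE ;

KEY         : [a-zA-Z0-9]+ ;
VALUE       : DOUBLEQUOTE [ _a-zA-Z0-9.-]+ DOUBLEQUOTE
              // Java-target action: replace the token text with itself minus
              // the first and last character (the two quotes).
              { setText(getText().substring(1, getText().length() - 1)); }
            ;
DOUBLEQUOTE : '"' ;
EQUALS      : '=' ;
```

With an action like this, ctx.value.getText() should return the text without the quotes, because the VALUE token itself no longer carries them.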
Needless to say: this ties your grammar to Java, and (depending on how much embedded Java code you have) your grammar will be hard to port to another target language.
EDIT
Once a token is created, there is no built-in way to separate it (other than doing so in embedded actions, as I demonstrated). What you're looking for can be done, but that means rewriting your grammar so that a string literal is not constructed as a single token. This can be done by using lexical modes so that the string can be constructed in the parser.
A quick demo:
TestLexer.g4
TestParser.g4
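The two grammar files of the demo are not present in this copy. The following is a sketch of how the lexical-mode approach described above could look, with the quotes kept as ordinary tokens and the string assembled by a parser rule. The names STRING_ATOM, string and content are assumptions chosen for illustration.

A possible TestLexer.g4:

```antlr
lexer grammar TestLexer;

KEY         : [a-zA-Z0-9]+ ;
DOUBLEQUOTE : '"' -> pushMode(STRING_MODE) ;   // opening quote: enter string mode
EQUALS      : '=' ;

mode STRING_MODE;

STRING_DOUBLEQUOTE
 : '"' -> type(DOUBLEQUOTE), popMode           // closing quote: same token type, leave string mode
 ;

STRING_ATOM
 : [ _a-zA-Z0-9.-]+
 ;
```

A possible TestParser.g4:

```antlr
parser grammar TestParser;

options { tokenVocab=TestLexer; }

entry  : key=KEY EQUALS value=string ;

// The quotes are separate tokens here, so the text between them
// is available on its own through the content label.
string : DOUBLEQUOTE content=STRING_ATOM DOUBLEQUOTE ;
```

Under these assumptions, ctx.value.content.getText() in the Java code should give the text between the quotes, while ctx.value.getText() would still include them. The variation in the question's update avoids that by sending the quote tokens to the hidden channel.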
If you now run the Java code:
this will be printed: