Adding quoted string support to Antlr3 grammar
I'm trying to implement a grammar for parsing queries. A single query consists of items, where each item can be either a `name` or a `name-ref`.

A `name` is either `mystring` (only letters, no spaces) or `"my long string"` (letters and spaces, always quoted). A `name-ref` is very similar to `name`; the only difference is that it starts with `ref:` (`ref:mystring`, `ref:"my long string"`). A query should contain at least 1 item (`name` or `name-ref`).

Here's what I have:
```
NAME: ('a'..'z')+;
REF_TAG: 'ref:';
SP: ' '+;
name: NAME;
name_ref: REF_TAG name;
item: name | name_ref;
query: item (SP item)*;
```
This grammar demonstrates essentially what I need; the only missing feature is support for long quoted strings (it works fine for names without spaces). I tried extending it like this:
```
SHORT_NAME: ('a'..'z')+;
LONG_NAME: SHORT_NAME (SP SHORT_NAME)*;
REF_TAG: 'ref:';
SP: ' '+;
Q: '"';
short_name: SHORT_NAME;
long_name: LONG_NAME;
name_ref: REF_TAG (short_name | (Q long_name Q));
item: (short_name | (Q long_name Q)) | name_ref;
query: item (SP item)*;
```
But that doesn't work. Any ideas what the problem is? This is probably important: `my first query` should be treated as 3 items (3 `name`s), while `"my first query"` is 1 item (1 `long_name`).
ANTLR's lexer matches greedily: that is why input like `my first query` is tokenized as a single `LONG_NAME` instead of 3 `SHORT_NAME`s with spaces in between. Simply remove the `LONG_NAME` rule from the lexer and define it in the parser rule `long_name` instead.
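To make the greedy behavior concrete, here is a simplified Python model of the question's lexer. It assumes "greedy" means: at each position take the longest match, and on a tie prefer the rule listed first. The regexes are stand-ins for the ANTLR rules, and `RULES`, `WITHOUT_LONG_NAME` and `tokenize` are hypothetical names; this is not how ANTLR's generated lexer is actually implemented.

```python
import re

# Simplified stand-ins for the question's lexer rules, written as regexes.
# Model of greedy tokenizing: at each position take the longest match;
# on a tie, the rule listed first wins. This is not ANTLR's generated lexer.
RULES = [
    ("SHORT_NAME", r"[a-z]+"),
    ("LONG_NAME", r"[a-z]+(?: +[a-z]+)*"),  # SHORT_NAME (SP SHORT_NAME)*
    ("REF_TAG", r"ref:"),
    ("SP", r" +"),
    ("Q", r'"'),
]

# The same rules with LONG_NAME removed from the lexer, as suggested above.
WITHOUT_LONG_NAME = [rule for rule in RULES if rule[0] != "LONG_NAME"]


def tokenize(text, rules):
    """Return (token_type, text) pairs using longest-match tokenization."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if match and (best is None or len(match.group()) > len(best[1])):
                best = (name, match.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos:]!r}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


print(tokenize("my first query", RULES))
# [('LONG_NAME', 'my first query')]
print(tokenize("my first query", WITHOUT_LONG_NAME))
# [('SHORT_NAME', 'my'), ('SP', ' '), ('SHORT_NAME', 'first'), ('SP', ' '), ('SHORT_NAME', 'query')]
```

Under this model the unquoted input comes out as one `LONG_NAME`, which none of the question's `item` alternatives accept unless it sits between `Q` tokens. Without the lexer rule, the parser rule `long_name` is left to assemble the multi-word name from `SHORT_NAME` and `SP` tokens.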
The following grammar:

[grammar not preserved]

will parse the input:

[input not preserved]

as follows:

[parse tree not preserved]
However, you could also tokenize a quoted name in the lexer and strip the quotes from it with a bit of custom code. Removing spaces from the lexer could also be an option. Something like this:

[grammar not preserved]

which would parse the same input as follows:

[parse tree not preserved]

Note that the actual token `LONG_NAME` will be stripped of its start- and end-quote.
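A rough Python sketch of this second approach, using a hand-written regex tokenizer in place of ANTLR. The token names mirror the question's, but `TOKEN_RE`, `tokenize` and `parse_query` are hypothetical: a quoted name becomes one `LONG_NAME` token with its quotes removed, and spaces are consumed by the tokenizer but never handed to the parser.

```python
import re

# Hypothetical token patterns for this sketch. Alternatives are tried in order,
# so 'ref:' is checked before a plain name.
TOKEN_RE = re.compile(
    r'(?P<REF_TAG>ref:)'
    r'|(?P<LONG_NAME>"[a-z ]+")'  # quoted name: letters and spaces
    r'|(?P<SHORT_NAME>[a-z]+)'    # bare name: letters only
    r'|(?P<SP> +)'
)


def tokenize(text):
    """Split text into (type, value) tokens; spaces are dropped, quotes stripped."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {text[pos]!r}")
        kind, value = match.lastgroup, match.group()
        if kind == "LONG_NAME":
            value = value[1:-1]  # strip the start- and end-quote
        if kind != "SP":  # whitespace never reaches the parser
            tokens.append((kind, value))
        pos = match.end()
    return tokens


def parse_query(text):
    """query: item+ ; item: name | REF_TAG name ; name: SHORT_NAME | LONG_NAME"""
    tokens = tokenize(text)
    items = []
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        is_ref = kind == "REF_TAG"
        if is_ref:
            i += 1
            if i == len(tokens):
                raise ValueError("'ref:' must be followed by a name")
            kind, value = tokens[i]
        if kind not in ("SHORT_NAME", "LONG_NAME"):
            raise ValueError(f"expected a name, got {kind}")
        items.append(("name_ref" if is_ref else "name", value))
        i += 1
    if not items:
        raise ValueError("a query must contain at least one item")
    return items


print(parse_query('my first query "my first query" ref:mystring ref:"my long string"'))
# [('name', 'my'), ('name', 'first'), ('name', 'query'), ('name', 'my first query'),
#  ('name_ref', 'mystring'), ('name_ref', 'my long string')]
```

One trade-off of dropping spaces in the lexer: the parser can no longer check them, so this sketch also accepts `ref: mystring` with a space after the tag.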
Here's a grammar that should work for your requirements:

[grammar not preserved]

If you put this at the top:

[code not preserved]

you should see the lexer break everything apart nicely (I hope ;-)). It should give:

[output not preserved]

I'm not 100% sure what the problem with your grammar is, but I suspect it relates to your definition of a `LONG_NAME` without the quotes. Perhaps you can see what the distinction is?
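One way to see the distinction is to compare the two possible definitions as regular expressions, a simplified stand-in for the lexer rules. The quoted pattern is an assumption about what a `LONG_NAME` with the quotes inside the token could look like, not the answer's actual grammar.

```python
import re

# The question's LONG_NAME, as a regex: words separated by spaces, no quotes.
UNQUOTED_LONG_NAME = re.compile(r"[a-z]+(?: +[a-z]+)*")
# Assumed alternative: the quotes are part of the token itself.
QUOTED_LONG_NAME = re.compile(r'"[a-z]+(?: +[a-z]+)*"')

for text in ["my first query", '"my first query"']:
    print(text, bool(UNQUOTED_LONG_NAME.fullmatch(text)), bool(QUOTED_LONG_NAME.fullmatch(text)))
# my first query True False
# "my first query" False True
```

With the unquoted definition, bare multi-word input is claimed by `LONG_NAME` as well, so the lexer cannot tell `my first query` apart from the inside of `"my first query"`. With the quotes inside the token, only the quoted form can produce a `LONG_NAME`.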