ANTLR: Matching unescaped characters?

Posted 2024-10-06 22:13:10 · 259 words · 2 views · 0 comments

I've got a rule like,

charGroup
    : '[' .+ ']';

But I'm guessing that'll match something like [abc\]. Assuming I want it to match only unescaped ]s, how do I do that? In a regular expression I'd use a negative look-behind.

Edit: I'd also like it to be ungreedy/lazy if possible, so that it matches only [a] in [a][b].

夜夜流光相皎洁 2024-10-13 22:13:10

You probably wanted to do something like:

charGroup
  :  '[' ('\\' . | ~('\\' | ']'))+ ']'
  ;

where ~('\\' | ']') matches a single character other than \ and ]. Note that you can only negate single characters! There's no such thing as ~('ab'). Another common mistake is negating inside a parser rule: there, ~ negates tokens, not characters, so a rule like the one above has to be a lexer rule (a name starting with an uppercase letter, e.g. CharGroup) to work on characters. An example might be in order:

foo : ~(A | D);

A : 'a';
B : 'b';
C : 'c';
D : ~A;

Now parser rule foo matches either token B or token C (so only the characters 'b' and 'c') while lexer rule D matches any character other than 'a'.
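Since the question mentions regular expressions, here is a rough regex analog of the charGroup rule, as a sketch in Python rather than ANTLR. The names CHAR_GROUP, first_char_group and all_char_groups are made up for illustration. It only shows that the "escaped pair, or any single character except \ and ]" alternation handles escapes without a look-behind, and that it stops at the first unescaped ] without needing a lazy quantifier.

```python
import re

# Regex analog of:  '[' ('\\' . | ~('\\' | ']'))+ ']'
#   \\.      -> a backslash followed by any one character (an escaped pair)
#   [^\\\]]  -> any single character other than \ and ]
# Hypothetical name; this is an illustration, not ANTLR's own machinery.
CHAR_GROUP = re.compile(r"\[(?:\\.|[^\\\]])+\]")


def first_char_group(text):
    """Return the first bracketed group in text, or None if there is none."""
    match = CHAR_GROUP.search(text)
    return match.group(0) if match else None


def all_char_groups(text):
    """Return every bracketed group in text, left to right."""
    return CHAR_GROUP.findall(text)
```

With this pattern, first_char_group("[a][b]") gives "[a]", because the body can never consume an unescaped ]. An input like [abc\] does not match at all, since its only ] is escaped and so there is no closing bracket.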
