
Negating inside lexer and parser rules

Posted on 2024-12-18 00:33:57

How do I use the negation meta-character ~ in ANTLR lexer and parser rules?


红墙和绿瓦 2024-12-25 00:33:57

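Negation can occur inside both lexer and parser rules.

Inside lexer rules you can negate characters, and inside parser rules you can negate tokens (lexer rules). In both cases only a single unit can be negated: a single character in a lexer rule, a single token in a parser rule.

A couple of examples:

lexer rules

To match one or more characters except lowercase ASCII letters, you can do: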

NO_LOWERCASE : ~('a'..'z')+ ;

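(The negation meta-character, ~, has higher precedence than +, so the rule above is equivalent to (~('a'..'z'))+.)

Note that 'a'..'z' matches a single character (and can therefore be negated), but the following rule is invalid: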

ANY_EXCEPT_AB : ~('ab') ;

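Because 'ab' (obviously) matches 2 characters, it cannot be negated. To match a token that consists of 2 characters but is not 'ab', you'd have to do the following: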

ANY_EXCEPT_AB 
  :  'a' ~'b' // any two chars starting with 'a' followed by any other than 'b'
  |  ~'a' .   // other than 'a' followed by any char
  ;
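As a rough cross-check of the two-alternative rule, the sketch below uses Python's re module as a stand-in for the ANTLR lexer. This is an assumption made purely for illustration: ANTLR itself is not involved, and the names ANY_EXCEPT_AB, NO_LOWERCASE, two_char_strings and rejected_pairs are hypothetical. It checks that, over a small alphabet, the pattern accepts every two-character string except 'ab'.

```python
import itertools
import re

# Regex analogue of:  'a' ~'b'  |  ~'a' .
# re.DOTALL makes '.' match newlines too, like the lexer wildcard.
ANY_EXCEPT_AB = re.compile(r"a[^b]|[^a].", re.DOTALL)

# Regex analogue of:  ~('a'..'z')+
NO_LOWERCASE = re.compile(r"[^a-z]+")


def two_char_strings(alphabet):
    """Yield every two-character string over the given alphabet."""
    for first, second in itertools.product(alphabet, repeat=2):
        yield first + second


def rejected_pairs(alphabet="abcAB \n"):
    """Return the two-character strings the pattern does not fully match."""
    return [s for s in two_char_strings(alphabet)
            if ANY_EXCEPT_AB.fullmatch(s) is None]
```

Calling rejected_pairs() enumerates all 49 pairs over the sample alphabet and collects the ones that fail to match, which makes it easy to see that the only excluded string is the two-character sequence the rule was written to avoid.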

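parser rules

Inside parser rules, ~ negates a single token, or a set of alternative tokens. For example, say you have the following tokens defined: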

A : 'A';
B : 'B';
C : 'C';
D : 'D';
E : 'E';

If you now want to match any token except A, you do:

p : ~A ;

And if you want to match any token except B and D, you can do:

p : ~(B | D) ;

However, if you want to match any two tokens other than A followed by B, you cannot do:

p : ~(A B) ;

Just as with lexer rules, you cannot negate a sequence of more than one token. To accomplish the above, you need to do:

p
  :  A ~B
  |  ~A .
  ;
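The parser-level behaviour can be modelled the same way. The sketch below is a plain-Python approximation, not ANTLR output: it treats ~(...) as a set complement over the five token types from the example and checks which token pairs the two-alternative rule rejects. The helper names negate and matches_p are hypothetical.

```python
from itertools import product

TOKENS = ("A", "B", "C", "D", "E")  # token types from the example


def negate(*excluded):
    """Model ~(X | Y | ...): every token type except those listed."""
    return set(TOKENS) - set(excluded)


def matches_p(tokens):
    """Model  A ~B | ~A .  on a sequence of token types."""
    if len(tokens) != 2:
        return False
    first, second = tokens
    alt1 = first == "A" and second in negate("B")      # A ~B
    alt2 = first in negate("A") and second in TOKENS   # ~A .
    return alt1 or alt2


# Every pair of token types that the modelled rule refuses.
rejected = [pair for pair in product(TOKENS, repeat=2) if not matches_p(pair)]
```

Under this model, negate("B", "D") corresponds to ~(B | D), and rejected holds only the pair A followed by B, which is the one sequence the rule is meant to exclude.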

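Note that the . (DOT) in a parser rule does not match any character, as it does inside lexer rules. Inside parser rules, it matches any token (A, B, C, D or E, in this case).

Note that you cannot negate parser rules. The following is illegal: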

p : ~a ;
a : A  ;
