A question about ANTLR4 error recovery

Posted 2025-01-18 04:59:19

I've found a strange issue regarding error recovery in ANTLR4. If I take the grammar example from the ANTLR book

grammar simple;

prog:   classDef+ ; // match one or more class definitions

classDef
    :   'class' ID '{' member+ '}' // a class has one or more members
    ;

member
    :   'int' ID ';'                       // field definition
    |   'int' f=ID '(' ID ')' '{' stat '}' // method definition
    ;

stat:   expr ';'
    |   ID '=' expr ';'
    ;

expr:   INT 
    |   ID '(' INT ')'
    ;

INT :   [0-9]+ ;
ID  :   [a-zA-Z]+ ;
WS  :   [ \t\r\n]+ -> skip ;

and use the input

class T {
    y;
    int x;
}

it will see the first member as an error (as it expects 'int' before 'y').

classDef
 | "class"
 | ID "T"
 | "{"
 |- member
 |   | ID "y" -> error
 |   | ";" -> error
 |- member
 |   | "int"
 |   | ID "x"
 |   | ";"

In this case ANTLR4 recovers from the error in the first member subrule and parses the second member correctly.

But if the member subrule in classDef is changed from the mandatory member+ to the optional member*

classDef
    :   'class' ID '{' member* '}' // a class has zero or more members
    ;

then the parsed tree will look like

classDef
 | "class" -> error
 | ID "T" -> error
 | "{" -> error
 | ID "y" -> error
 | ";" -> error
 | "int" -> error
 | ID "x" -> error
 | ";" -> error
 | "}" -> error

It seems that the error recovery cannot solve the issue inside the member subrule anymore.

Obviously using member+ is the way forward as it provides the correct error recovery result. But how do I allow empty class bodies? Am I missing something in the grammar?

The DefaultErrorStrategy class is quite complex, with token deletions and insertions, and the book explains the theory behind it very well. What I'm missing is how to implement custom error recovery for specific rules.

In my case I would add something like "if { is already consumed, try to find int or }" to optimize the error recovery for this rule.

Is this possible with ANTLR4 error recovery in a reasonable way at all? Or do I have to write the parser by hand to really gain control over error recovery for these use cases?
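
For comparison, the rule described above ("once { has been consumed, look for int or }") would look roughly like this in a hand-written parser. This is only an illustrative sketch, not ANTLR code: ClassBodySketch, resync, and the list-of-strings token representation are made up for the example, and only the field form of member ('int' ID ';') is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch (not the ANTLR API): parses '{' member* '}' where a
// member is the field form 'int' ID ';'. All names here are hypothetical.
class ClassBodySketch {
    final List<String> fields = new ArrayList<>(); // names of recognized fields
    int errors = 0;                                // number of bad members skipped

    private final List<String> tokens;
    private int pos = 0;

    ClassBodySketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    // Recovery rule from the question: skip ahead to the next 'int' or '}'.
    private void resync() {
        while (!peek().equals("int") && !peek().equals("}") && !peek().equals("<EOF>")) {
            pos++;
        }
    }

    // True if the next three tokens look like 'int' ID ';'.
    private boolean atField() {
        return peek().equals("int")
                && pos + 2 < tokens.size()
                && tokens.get(pos + 1).matches("[a-zA-Z]+")
                && tokens.get(pos + 2).equals(";");
    }

    // Expects the token list to start at '{'. An empty body is accepted.
    void parseBody() {
        pos++; // consume '{'
        while (!peek().equals("}") && !peek().equals("<EOF>")) {
            if (atField()) {
                fields.add(tokens.get(pos + 1));
                pos += 3; // consume 'int' ID ';'
            } else {
                errors++;
                if (peek().equals("int")) {
                    pos++; // step over a broken 'int ...' so the loop makes progress
                }
                resync();
            }
        }
    }
}
```

With the tokens `{ y ; int x ; }` this sketch reports one error and still records the field x, and with `{ }` it reports no error at all, which is the behavior asked for with member*.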

狂之美人 2025-01-25 04:59:19

It is worth noting that, for the given input, the parser never enters the member subrule. The classDef rule fails before it tries to match a member.

Before trying to parse the subrule, the parser calls the sync method on DefaultErrorStrategy. This sync recognizes that there is a problem and tries to recover by deleting a single token to see if that fixes things.

In this case it doesn't, so an exception is thrown and then tokens are consumed until a 'class' token is found. This makes sense because that is what can follow a classDef and it is the classDef rule, not the member rule that is failing at this point.
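
To see why the whole class body is swallowed, here is a simplified model of "consume tokens until one from the resync set shows up". It is not ANTLR's implementation (the real recovery set is computed from the rule invocation stack); ConsumeUntilSketch and the string tokens are assumptions made for the illustration.

```java
import java.util.List;
import java.util.Set;

// Simplified model of resynchronization by skipping tokens.
// Not ANTLR's code; class and method names are hypothetical.
class ConsumeUntilSketch {
    // Returns the index of the first token at or after 'from' that is in
    // 'resyncSet', or tokens.size() if the input runs out first.
    static int consumeUntil(List<String> tokens, int from, Set<String> resyncSet) {
        int i = from;
        while (i < tokens.size() && !resyncSet.contains(tokens.get(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // What can follow a classDef inside prog: another 'class'.
        Set<String> afterClassDef = Set.of("class");

        // Input from the question; the error is detected at "y" (index 3).
        List<String> one = List.of("class", "T", "{", "y", ";", "int", "x", ";", "}");
        System.out.println(consumeUntil(one, 3, afterClassDef)); // prints 9 (end of input)

        // With a second class in the input, skipping stops at its 'class' token.
        List<String> two = List.of("class", "T", "{", "y", ";", "}", "class", "U", "{", "}");
        System.out.println(consumeUntil(two, 3, afterClassDef)); // prints 6
    }
}
```

In the first input nothing after "y" is a 'class' token, so every remaining token is skipped, including the valid `int x;` member.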

It doesn't look simple to do correctly, but if you install a custom subclass of DefaultErrorStrategy and override the sync() method, you can get any recovery strategy you like.

Something like the following could be a starting point:

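import org.antlr.v4.runtime.DefaultErrorStrategy;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;

// The class name is illustrative; install with parser.setErrorHandler(new ClassDefSyncStrategy()).
public class ClassDefSyncStrategy extends DefaultErrorStrategy {
  @Override
  public void sync(Parser recognizer) throws RecognitionException {
    // Inside classDef, skip the sync so the parser goes on to try the member rule.
    if (recognizer.getContext() instanceof simpleParser.ClassDefContext) {
      return;
    }
    super.sync(recognizer);
  }
}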

As a result, the sync no longer fails and the member rule is executed. Parsing the first member fails, and the default recovery then moves on to the next member in the class.
