具有错误恢复功能的 GLR 解析器：输入出现错误时有太多选择

发布于 2024-10-06 16:34:29 字数 633 浏览 5 评论 0原文

序言

我编写了具有错误恢复功能的 GLR 解析器。当遇到错误时，它会分为以下几种选择：

将预期的元素插入输入中（可能是用户刚刚错过了它）并照常继续。
将错误的元素替换为预期的元素（可能是用户刚刚输入错误）并照常继续。
跳过错误的元素，如果下一个元素也错误，则转到#2。

但是，如果输入有很多错误（例如，用户错误地将 JPEG 文件提供给解析器），则替代方案的数量会呈指数级增长。

示例

这样的解析器对应于以下语法：

Program -> Identifier WS Identifier WS '=' WS Identifier
Identifier -> ('a'..'z' | 'A'..'Z' | '0'..'9')*
WS -> ' '*

应用于以下文本：

x = "abc\"def"; y = "ghi\"jkl";

在中等现代的台式计算机上失败并显示“内存不足”。

问题

如果输入错误，如何减少备选数量？

原文

Preamble

I have written GLR-parser with error recovery. When it encounters an error it splits into following alternatives:

Insert the expected element into input (may be the user just missed it) and proceed as usual.
Replace the erroneous element with expected one (may be the user just made a mistype) and proceed as usual.
Skip erroneous element and if next element is also erroneous then go to #2.

But if input has a lot of errors (for example, user has by mistake given JPEG file to the parser) a number of alternatives grows exponentially.

Example

Such a parser corresponding to the following grammar:

Program -> Identifier WS Identifier WS '=' WS Identifier
Identifier -> ('a'..'z' | 'A'..'Z' | '0'..'9')*
WS -> ' '*

applied to the following text:

x = "abc\"def"; y = "ghi\"jkl";

fails with "out of memory" on moderately modern desktop computer.

Question

How to reduce number of alternatives in case of errors in input?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

浅笑轻吟梦一曲 2024-10-13 16:34:29

在字符级别进行 GLR（因此解析）错误纠正是可能的，但会加剧您的问题。

我们使用的 GLR 错误恢复过程对令牌进行操作，因此它并没有那么糟糕。

但当输入有大量错误时，就很难恢复。更复杂的错误恢复方案基本上使用解析器来识别输入中语言的有效子字符串，然后尝试将子字符串修补在一起以获得结果。这是相当雄心勃勃的。

我已经构建了具有错误恢复功能的 GLR 解析器。我并没有那么雄心勃勃。一般来说
当活动解析器的数量超过“大量”（例如，10,000）或遇到的语法错误数量超过阈值（例如，10或20）时，解析器通常会中止。如果解析器在最后一秒没有推进输入流，您可能会考虑中止解析器，这是一个间接标志，它有太多实时解析器。

回复收藏 0 原文

~没有更多了~