How can I further improve error messages in a parser based on Scala parser combinators?

Posted on 2024-09-03 02:34:41

I've coded a parser based on Scala parser combinators:

class SxmlParser extends RegexParsers with ImplicitConversions with PackratParsers {
    [...]
    lazy val document: PackratParser[AstNodeDocument] =
        ((procinst | element | comment | cdata | whitespace | text)*) ^^ {
            AstNodeDocument(_)
        }
    [...]
}
object SxmlParser {
    def parse(text: String): AstNodeDocument = {
        var ast = AstNodeDocument()
        val parser = new SxmlParser()
        val result = parser.parseAll(parser.document, new CharArrayReader(text.toArray))
        result match {
            case parser.Success(x, _) => ast = x
            case parser.NoSuccess(err, next) => {
                tool.die("failed to parse SXML input " +
                    "(line " + next.pos.line + ", column " + next.pos.column + "):\n" +
                    err + "\n" +
                    next.pos.longString)
            }
        }
        ast
    }
}

Usually the resulting parsing error messages are rather nice. But sometimes the message is reduced to just

sxml: ERROR: failed to parse SXML input (line 32, column 1):
`"' expected but `' found
^

This happens if a quote character is not closed and the parser reaches the end of the text (EOT). What I would like to see here is (1) which production the parser was in when it expected the '"' (I have multiple ones) and (2) where in the input this production started parsing (which indicates where the opening quote is in the input). Does anybody know how I can improve the error messages and include more information about the actual internal parsing state when the error happens (perhaps something like a production rule stack trace, or whatever can reasonably be given here to better identify the error location)? BTW, the above "line 32, column 1" is actually the EOT position and hence of no use here, of course.

Comments (2)

何以畏孤独 2024-09-10 02:34:41

I don't know yet how to deal with (1), but I was also looking for (2) when I found this webpage:

https://wiki.scala-lang.org/plugins/viewsource/viewpagesrc.action?pageId=917624

I'm just copying the information:

A useful enhancement is to record the input position (line number and column number) of the significant tokens. To do this, you must do three things:

  • Make each output type extend scala.util.parsing.input.Positional
  • invoke the Parsers.positioned() combinator
  • Use a text source that records line and column positions

and

Finally, ensure that the source tracks positions. For streams, you can simply use scala.util.parsing.input.StreamReader; for Strings, use scala.util.parsing.input.CharArrayReader.

I'm currently playing with it, so I'll try to add a simple example later.
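
The three steps quoted above could be sketched roughly as follows. This is only an illustration, not the original poster's grammar: the `Attr` node, the `attr` production, and the `describe` helper are hypothetical names made up for the example, and the string syntax deliberately ignores escapes.

```scala
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.{CharArrayReader, Positional}

// Hypothetical AST node; mixing in Positional adds a settable `pos` field.
case class Attr(name: String, value: String) extends Positional

object PositionedDemo extends RegexParsers {
  // Hypothetical production: name="value", with no escape handling.
  // positioned() stores the position where the production started in the node's `pos`.
  lazy val attr: Parser[Attr] = positioned(
    ("""[A-Za-z]+""".r <~ "=") ~ ("\"" ~> """[^"]*""".r <~ "\"") ^^ {
      case name ~ value => Attr(name, value)
    }
  )

  // Parse one attribute and say where it started, or report the failure.
  def describe(text: String): String =
    parseAll(attr, new CharArrayReader(text.toCharArray)) match {
      case Success(node, _) =>
        s"${node.name} starts at line ${node.pos.line}, column ${node.pos.column}"
      case ns: NoSuccess =>
        s"error at line ${ns.next.pos.line}, column ${ns.next.pos.column}: ${ns.msg}"
    }
}
```

Note that `positioned` only fills in `pos` when the production succeeds, so on its own it helps with locating nodes after parsing rather than with the failure case from the question.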

天赋异禀 2024-09-10 02:34:41

In such cases you may use err, failure and ~! with production rules designed specifically to match the error.
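
One possible way to apply this to the unclosed-quote case is sketched below, under several assumptions: the `quoted` production, the `here` helper, and the `check` function are hypothetical, whitespace skipping is turned off to keep the string body intact, and only `err` is used (`failure` would be the backtracking counterpart). The idea is to remember the position of the opening quote and, if the closing quote is missing, raise a non-backtracking error that names that position instead of the EOT position.

```scala
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.Position

object QuoteErrorDemo extends RegexParsers {
  // Keep whitespace significant so the text inside the quotes is preserved.
  override def skipWhitespace = false

  // Hypothetical helper: consumes nothing and yields the current input position.
  val here: Parser[Position] = Parser[Position] { in => Success(in.pos, in) }

  // Hypothetical production: a double-quoted string without escapes.
  // After the body, either the closing quote matches, or err() reports
  // where the string was opened.
  lazy val quoted: Parser[String] =
    (here ~ ("\"" ~> """[^"]*""".r)) >> { case start ~ body =>
      ("\"" ^^^ body) |
        err(s"unterminated string literal opened at line ${start.line}, column ${start.column}")
    }

  // Returns the parsed body, or the error message produced by the grammar.
  def check(text: String): String =
    parseAll(quoted, text) match {
      case Success(body, _) => s"ok: $body"
      case ns: NoSuccess    => ns.msg
    }
}
```

Because `err` produces a non-backtracking error, enclosing alternatives will not replace the message with a more generic one. Including the production name in the message text would be a simple way to approach point (1) of the question as well.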
