How can I further improve the error messages in a parser based on Scala parser combinators?
I've coded a parser based on Scala parser combinators:
class SxmlParser extends RegexParsers with ImplicitConversions with PackratParsers {
  [...]
  lazy val document: PackratParser[AstNodeDocument] =
    ((procinst | element | comment | cdata | whitespace | text)*) ^^ {
      AstNodeDocument(_)
    }
  [...]
}
object SxmlParser {
  def parse(text: String): AstNodeDocument = {
    var ast = AstNodeDocument()
    val parser = new SxmlParser()
    val result = parser.parseAll(parser.document, new CharArrayReader(text.toArray))
    result match {
      case parser.Success(x, _) => ast = x
      case parser.NoSuccess(err, next) => {
        tool.die("failed to parse SXML input " +
          "(line " + next.pos.line + ", column " + next.pos.column + "):\n" +
          err + "\n" +
          next.pos.longString)
      }
    }
    ast
  }
}
Usually the resulting parse error messages are rather nice. But sometimes the message is just:
sxml: ERROR: failed to parse SXML input (line 32, column 1):
`"' expected but `' found
^
This happens if a quote character is not closed and the parser reaches the EOT. What I would like to see here is (1) which production the parser was in when it expected the '"' (I have multiple ones) and (2) where in the input this production started parsing (which indicates where the opening quote is in the input). Does anybody know how I can improve the error messages and include more information about the actual internal parsing state when the error happens (perhaps something like a production rule stack trace, or whatever can reasonably be given here to better identify the error location)? BTW, the above "line 32, column 1" is actually the EOT position and hence of no use here, of course.
Comments (2)
I don't know yet how to deal with (1), but I was also looking for (2) when I found this webpage:
https://wiki.scala-lang.org/plugins/viewsource/viewpagesrc.action?pageId=917624
I'm just copying the information:
and
I'm currently playing with it so I'll try to add a simple example later
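For (2), one possible approach that does not depend on the linked page is a small wrapper combinator that remembers where a production started and appends that position to any failure raised inside it. The sketch below is only an illustration: QuoteParser, inProduction, check and the tiny grammar are hypothetical names, not part of the question's SxmlParser, and it assumes the scala-parser-combinators library is on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical mini-grammar: lowercase words, single spaces and "quoted strings".
object QuoteParser extends RegexParsers {
  // Disable implicit whitespace skipping so in.pos is exactly where a production starts.
  override def skipWhitespace = false

  // Hypothetical helper: run p and, if it does not succeed, append the
  // production's label and start position to the message.
  def inProduction[T](label: String)(p: => Parser[T]): Parser[T] = Parser { in =>
    def where = s"in $label starting at line ${in.pos.line}, column ${in.pos.column}"
    p(in) match {
      case Failure(msg, next) => Failure(s"$msg [$where]", next)
      case Error(msg, next)   => Error(s"$msg [$where]", next)
      case success            => success
    }
  }

  val quote: Parser[String] = literal("\"")
  val body: Parser[String]  = """[^"]*""".r
  val word: Parser[String]  = """[a-z]+""".r
  val space: Parser[String] = literal(" ")

  // Once the opening quote has matched, ~! commits: a missing closing quote
  // becomes a fatal Error instead of a Failure that is backtracked over.
  val quoted: Parser[String] = inProduction("quoted string") {
    (quote ~! (body <~ quote)) ^^ { case _ ~ text => text }
  }

  val document: Parser[List[String]] = rep(quoted | word | space)

  // Returns the parsed items, or the (annotated) error message.
  def check(input: String): Either[String, List[String]] =
    parseAll(document, input) match {
      case Success(items, _) => Right(items)
      case ns: NoSuccess     => Left(ns.msg)
    }
}
```

With the input say "hello (no closing quote), the message returned by check still reports the missing quote at the end of input, but it now also ends with the label and start position of the enclosing production. Wrapping several nested productions this way appends one bracketed entry per level, which gives a rough production stack trace.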
In such cases you may use err, failure and ~! with production rules designed specifically to match the error.
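As a minimal sketch of that idea, the grammar below adds an alternative whose only job is to match the error case: if the closing quote is missing, err reports a message that names the production. AttrParser, closingQuote and parseAttr are hypothetical names for illustration and are not taken from the question's SxmlParser; the sketch assumes the scala-parser-combinators library is available.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical mini-grammar for an attribute such as id="abc".
object AttrParser extends RegexParsers {
  // Keep positions exact by not skipping whitespace implicitly.
  override def skipWhitespace = false

  val name: Parser[String]   = """[A-Za-z_][A-Za-z0-9_]*""".r
  val equals: Parser[String] = literal("=")
  val quote: Parser[String]  = literal("\"")
  val value: Parser[String]  = """[^"]*""".r

  // Error production: either the closing quote is present, or we stop with a
  // fatal, production-specific message. Using failure(...) here instead of
  // err(...) would keep the error non-fatal, so other alternatives could still be tried.
  val closingQuote: Parser[String] =
    quote | err("unterminated attribute value: closing double quote expected")

  // ~! commits after the opening quote, so the parser does not backtrack
  // into unrelated alternatives and hide the real problem.
  val attribute: Parser[(String, String)] =
    (name <~ equals) ~ (quote ~! value ~! closingQuote) ^^ {
      case n ~ (_ ~ v ~ _) => (n, v)
    }

  // Returns the parsed (name, value) pair, or a message with the error position.
  def parseAttr(input: String): Either[String, (String, String)] =
    parseAll(attribute, input) match {
      case Success(result, _) => Right(result)
      case ns: NoSuccess =>
        Left(s"${ns.msg} (line ${ns.next.pos.line}, column ${ns.next.pos.column})")
    }
}
```

The reported position is still the point where parsing stopped (the end of input for an unclosed quote), so this mainly addresses which production failed rather than where it began.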