Scala parser combinators: tricks for a recursive BNF?
I'm trying to match this syntax:
pgm ::= exprs
exprs ::= expr [; exprs]
expr ::= ID | expr . [0-9]+
My Scala packrat parser combinator looks like this:
import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._
object Dotter extends StandardTokenParsers with PackratParsers {
  lexical.delimiters ++= List(".", ";")
  def pgm = repsep(expr, ";")
  def expr: Parser[Any] = ident | expr ~ "." ~ num
  def num = numericLit
  def parse(input: String) =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n => println(n); println("bla"); None
    }
  def main(args: Array[String]): Unit = {
    val prg = "x.1.2.3;" +
      "y.4.1.1;" +
      "z;" +
      "n.1.10.30"
    parse(prg)
  }
}
But this doesn't work. Either it "matches greedily" and tells me:
[1.2] failure: end of input expected
x.1.2.3;y.4.1.1;z;n.1.10.30
or, if I change the | to a |||, I get a stack overflow:
Exception in thread "main" java.lang.StackOverflowError
at java.lang.Character.isLetter(Unknown Source)
at java.lang.Character.isLetter(Unknown Source)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
...
I kind of understand why I get the errors; what can I do to parse a syntax like the above? It doesn't seem that esoteric to me.
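Why the | version stops after x: in these combinators p | q is ordered choice, so q is tried only when p fails. Once ident matches x, the left-recursive alternative is never attempted, repsep finds "." where it wants ";", and phrase reports the leftover input. With p ||| q (longest match) both alternatives are always evaluated, and evaluating expr ~ "." ~ num calls expr again at the same position without consuming a token, so the recursion never terminates. The stripped-down sketch below shows the first effect. It assumes the scala-parser-combinators library (a separate module since Scala 2.11); the names OrderedChoiceDemo and run are hypothetical.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Hypothetical demo object; assumes the scala-parser-combinators library.
object OrderedChoiceDemo extends StandardTokenParsers {
  lexical.delimiters ++= List(".", ";")

  // Same shape as the first attempt. Because | is ordered choice, the
  // right-hand alternative runs only if ident fails. For input that starts
  // with an identifier it is never tried; for input that does not, it would
  // call expr again at the same position and eventually overflow the stack.
  def expr: Parser[Any] = ident | expr ~ "." ~ numericLit

  // Runs expr once (without phrase) so we can see how much input it consumed.
  def run(input: String): ParseResult[Any] = expr(new lexical.Scanner(input))
}
```

Here OrderedChoiceDemo.run("x.1.2.3") succeeds with just "x" and leaves the tokens . 1 . 2 . 3 unconsumed, which is the leftover input that phrase(pgm) rejects.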
EDIT:
Based on the paper referenced in http://scala-programming-language.1934581.n4.nabble.com/Packrat-parser-guidance-td1956908.html, I found out that my program didn't actually use the new packrat parser. That is, I had to change Parser[Any] to PackratParser[Any] and use lazy val instead of def.
I rewrote the above to this:
import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._
object Dotter extends StandardTokenParsers with PackratParsers {
  lexical.delimiters ++= List(".", ";")
  lazy val pgm: PackratParser[Any] = repsep(expr, ";")
  lazy val expr: PackratParser[Any] = expr ~ "." ~ num | ident
  lazy val num = numericLit
  def parse(input: String) =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n => println(n); println("bla"); None
    }
  def main(args: Array[String]): Unit = {
    val prg = "x.1.2.3 ;" +
      "y.4.1.1;" +
      "z;" +
      "n.1.10.30"
    parse(prg)
  }
}
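As a check on the edited approach, here is a minimal sketch of the same left-recursive packrat rule that also builds a small AST, which makes the shape of the result visible. It assumes the scala-parser-combinators library; the Expr, Id and Dot classes and the PackratDotter and parseProgram names are hypothetical. The left-recursive alternative is listed first, as in the edited code: as I understand the packrat left-recursion support, the rule starts from a seed match (here the bare identifier) and is grown by re-running its body, so with ident listed first the body would simply match the identifier again and the result would never grow past it.

```scala
import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Hypothetical AST for the dotted expressions.
sealed trait Expr
final case class Id(name: String) extends Expr
final case class Dot(target: Expr, index: Int) extends Expr

// Hypothetical parser object; assumes the scala-parser-combinators library.
object PackratDotter extends StandardTokenParsers with PackratParsers {
  lexical.delimiters ++= List(".", ";")

  // Left-recursive rule, declared as a lazy val of type PackratParser so the
  // memoizing packrat machinery handles the recursion. The recursive
  // alternative comes first so the initial ident match can be extended.
  lazy val expr: PackratParser[Expr] =
    expr ~ ("." ~> numericLit) ^^ { case e ~ n => Dot(e, n.toInt) } |
      ident ^^ (name => Id(name))

  // Zero or more exprs separated by ";", as in the question's pgm.
  lazy val pgm: PackratParser[List[Expr]] = repsep(expr, ";")

  // Returns Some(asts) only if the whole input is consumed.
  def parseProgram(input: String): Option[List[Expr]] =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

For example, PackratDotter.parseProgram("x.1.2.3;z") gives Some(List(Dot(Dot(Dot(Id(x),1),2),3), Id(z))): the dots associate to the left, exactly as the left-recursive production describes.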
Answers (2):
The problem is (at least partially) that you're not actually using Packrat parsers. See the documentation for Scala's PackratParsers trait, which says
I don't know enough about Scala 2.8's parser combinators to fix this entirely, but with the following modifications, I was able to get it to parse as far as the semicolon, which is an improvement over what you've accomplished.
The production
expr ::= ID | expr . [0-9]+
is left recursive. It expands to
expr ::= ID
expr ::= expr . [0-9]+
where the left recursion occurs on the 2nd line. This is what causes the parser to overflow the stack.
You should rewrite your grammar avoiding left recursive productions.
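A sketch of such a rewrite: expr ::= ID | expr . [0-9]+ describes an identifier followed by zero or more .number suffixes, so it can be written as expr ::= ID ( . [0-9]+ )* with no left recursion. In combinators the repetition becomes rep, and a foldLeft restores the left-nested structure the recursive rule implied. Because nothing is left recursive, plain def and Parser suffice and PackratParsers is not needed. This assumes the scala-parser-combinators library; IterativeDotter, parseProgram and the Expr, Id and Dot classes are hypothetical names.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Hypothetical AST for the dotted expressions.
sealed trait Expr
final case class Id(name: String) extends Expr
final case class Dot(target: Expr, index: Int) extends Expr

// Hypothetical parser object; assumes the scala-parser-combinators library.
object IterativeDotter extends StandardTokenParsers {
  lexical.delimiters ++= List(".", ";")

  // expr ::= ID ( "." [0-9]+ )*   (left recursion replaced by repetition)
  def expr: Parser[Expr] =
    ident ~ rep("." ~> numericLit) ^^ { case name ~ indices =>
      // Fold the suffixes onto the identifier from the left, so
      // x.1.2 becomes Dot(Dot(Id(x), 1), 2).
      indices.foldLeft(Id(name): Expr)((acc, n) => Dot(acc, n.toInt))
    }

  // Zero or more exprs separated by ";", as in the question's pgm.
  def pgm: Parser[List[Expr]] = repsep(expr, ";")

  // Returns Some(asts) only if the whole input is consumed.
  def parseProgram(input: String): Option[List[Expr]] =
    phrase(pgm)(new lexical.Scanner(input)) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

The trade-off is that the grammar reads slightly less like the original BNF, but parsing becomes a simple loop with no memoization, and left associativity is recovered explicitly in the fold rather than by the recursion.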