Scala 解析器组合器递归 bnf 的技巧?

发布于 2024-09-11 06:07:11 字数 2712 浏览 11 评论 0原文

我试图匹配这个语法:

pgm ::= exprs
exprs ::= expr [; exprs]
expr ::= ID | expr . [0-9]+

我的 scala packrat 解析器组合器看起来像这样:

import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    def pgm = repsep(expr,";")
    def expr :Parser[Any]= ident | expr~"."~num
    def num = numericLit

       def parse(input: String) =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def main(args: Array[String]) {
      val prg = "x.1.2.3;" +
            "y.4.1.1;" +
            "z;" +
            "n.1.10.30"


            parse(prg);
    }
}

但这不起作用。要么它“匹配贪婪”并告诉我:

[1.2] failure: end of input expected 
x.1.2.3;y.4.1.1;z;n.1.10.30

或者如果我将 | 更改为 ||| 我会得到一个 stackoverflow:

Exception in thread "main" java.lang.StackOverflowError
at java.lang.Character.isLetter(Unknown Source)
at java.lang.Character.isLetter(Unknown Source)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
...

我大概明白为什么会出现错误;我该怎么做才能解析上面这样的语法?对我来说这似乎并不深奥

编辑: 基于 http://scala 中引用的论文-programming-language.1934581.n4.nabble.com/Packrat-parser-guidance-td1956908.html 我发现我的程序实际上并没有使用新的 Packrat 解析器。

IE。将 Parser[Any] 更改为 PackratParser[Any] 并使用 lazy val 而不是 def

我将上面的内容重写为这:

import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    lazy val pgm : PackratParser[Any] = repsep(expr,";")
    lazy val expr :PackratParser[Any]= expr~"."~num | ident
    lazy val num = numericLit

    def parse(input: String) =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def main(args: Array[String]) {
      val prg = "x.1.2.3 ;" +
            "y.4.1.1;" +
            "z;" +
            "n.1.10.30"


            parse(prg);
    }
}

Im trying to match this syntax:

pgm ::= exprs
exprs ::= expr [; exprs]
expr ::= ID | expr . [0-9]+

My scala packrat parser combinator looks like this:

import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    def pgm = repsep(expr,";")
    def expr :Parser[Any]= ident | expr~"."~num
    def num = numericLit

       def parse(input: String) =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def main(args: Array[String]) {
      val prg = "x.1.2.3;" +
            "y.4.1.1;" +
            "z;" +
            "n.1.10.30"


            parse(prg);
    }
}

But this doesnt work. Either it "matches greedy" and tells me:

[1.2] failure: end of input expected 
x.1.2.3;y.4.1.1;z;n.1.10.30

or if I change the | to a ||| I get a stackoverflow:

Exception in thread "main" java.lang.StackOverflowError
at java.lang.Character.isLetter(Unknown Source)
at java.lang.Character.isLetter(Unknown Source)
at scala.util.parsing.combinator.lexical.Lexical$anonfun$letter$1.apply(Lexical.scala:32)
at scala.util.parsing.combinator.lexical.Lexical$anonfun$letter$1.apply(Lexical.scala:32)
...

I kindoff understand why I get the errors; what can I do to parse a syntax like the above? It doesnt seem that esoteric to me

EDIT:
Based on the paper referenced in http://scala-programming-language.1934581.n4.nabble.com/Packrat-parser-guidance-td1956908.html
I found out that my program didnt actually use the new packrat parser.

Ie. change Parser[Any] to PackratParser[Any] and use lazy val instead of def

I rewrote the above to this:

import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    lazy val pgm : PackratParser[Any] = repsep(expr,";")
    lazy val expr :PackratParser[Any]= expr~"."~num | ident
    lazy val num = numericLit

    def parse(input: String) =
    phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def main(args: Array[String]) {
      val prg = "x.1.2.3 ;" +
            "y.4.1.1;" +
            "z;" +
            "n.1.10.30"


            parse(prg);
    }
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

小…红帽 2024-09-18 06:07:11

问题是(至少部分地)你实际上并没有使用 Packrat 解析器。请参阅 Scala 的文档 PackratParsers 特征,上面写着

使用 PackratParsers 非常相似
使用解析器:

  • 任何扩展解析器的类/特征(直接或通过
    子类)可以混合在 PackratParsers 中。
    示例:对象 MyGrammar 扩展
    标准令牌解析器
    Packrat解析器
  • 之前声明为 def 的每个语法产生式没有
    形式参数变成惰性值,
    它的类型从
    Parser[Elem] 到 PackratParser[Elem]。
    例如,def 生产:
    Parser[Int] = {...} 变为惰性 val
    生产:PackratParser[Int] = {...}
  • 重要提示:使用 PackratParsers 并不是一个全有或全无的决定。
    它们可以与普通的自由混合
    单一语法中的解析器。

我对 Scala 2.8 的解析器组合器了解不够,无法完全解决这个问题,但通过以下修改,我能够让它解析到分号,这比您已完成的工作有所改进。

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    lazy val pgm:PackratParser[Any] = repsep(expr,";")
    lazy val expr:PackratParser[Any]= ident ||| (expr~"."~numericLit)

    def parse(input: String) = phrase(expr)(lex(input)) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def lex(input:String) = new PackratReader(new lexical.Scanner(input))
}

The problem is (at least partially) that you're not actually using Packrat parsers. See the documentation for Scala's PackratParsers trait, which says

Using PackratParsers is very similar
to using Parsers:

  • any class/trait that extends Parsers (directly or through a
    subclass) can mix in PackratParsers.
    Example: object MyGrammar extends
    StandardTokenParsers with
    PackratParsers
  • each grammar production previously declared as a def without
    formal parameters becomes a lazy val,
    and its type is changed from
    Parser[Elem] to PackratParser[Elem].
    So, for example, def production:
    Parser[Int] = {...} becomes lazy val
    production: PackratParser[Int] = {...}
  • Important: using PackratParsers is not an all or nothing decision.
    They can be free mixed with regular
    Parsers in a single grammar.

I don't know enough about Scala 2.8's parser combinators to fix this entirely, but with the following modifications, I was able to get it to parse as far as the semicolon, which is an improvement over what you've accomplished.

object Dotter extends StandardTokenParsers with PackratParsers {
    lexical.delimiters ++= List(".",";")
    lazy val pgm:PackratParser[Any] = repsep(expr,";")
    lazy val expr:PackratParser[Any]= ident ||| (expr~"."~numericLit)

    def parse(input: String) = phrase(expr)(lex(input)) match {
      case Success(result, _) => println("Success!"); Some(result)
      case n @ _ => println(n);println("bla"); None
    }  

    def lex(input:String) = new PackratReader(new lexical.Scanner(input))
}
小傻瓜 2024-09-18 06:07:11

产生式

expr ::= ID | expr . [0-9]+

是递归的。它扩展到

expr ::= ID
expr ::= expr . [0-9]+

第二行发生左递归的位置。这就是导致解析器溢出堆栈的原因。

您应该重写语法,避免左递归产生式。

expr ::= ID {. [0-9]+}

The production

expr ::= ID | expr . [0-9]+

is left recursive. It expands to

expr ::= ID
expr ::= expr . [0-9]+

where the left recursion occurs on the 2nd line. This is what causes the parser to overflow the stack.

You should rewrite your grammar avoiding left recursive productions.

expr ::= ID {. [0-9]+}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文