Parser combinator not terminating - how to log what is going on?

Posted on 2024-08-23 13:58:12

I am experimenting with parser combinators and I often run into what seems like infinite recursions. Here is the first one I ran into:

import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader

class CombinatorParserTest extends Parsers {

  type Elem = Char

  def notComma = elem("not comma", _ != ',')

  def notEndLine = elem("not end line", x => x != '\r' && x != '\n')

  def text = rep(notComma | notEndLine)

}

object CombinatorParserTest {

  def main(args:Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    // does not get here
    println(r)
  }

}

How can I print what is going on? And why does this not finish?

Comments (3)

失与倦 2024-08-30 13:58:12

Logging the attempts to parse notComma and notEndLine shows that it is the end-of-file (shown as a CTRL-Z in the log(...)("mesg") output) that is being repeatedly parsed. Here's how I modified your parser for this purpose:

def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))

I'm not entirely sure what's going on (I tried many variations on your grammar), but I think it's something like this: The EOF is not really a character artificially introduced into the input stream, but rather a sort of perpetual condition at the end of the input. Thus this never-consumed EOF pseudo-character is repeatedly parsed as "either not a comma or not an end-of-line."
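
The "perpetual condition" can be checked directly on the reader. What follows is a minimal sketch rather than part of the original investigation: the object name EofDemo is made up for illustration, and it assumes the scala-parser-combinators library is on the classpath.

```scala
import scala.util.parsing.input.CharSequenceReader

// Hypothetical demo object; only CharSequenceReader comes from the library.
object EofDemo {
  // A reader positioned just past the single ',' character.
  val end = new CharSequenceReader(",").rest

  // The reader reports that it is at the end of the input...
  val isAtEnd: Boolean = end.atEnd
  // ...yet `first` still hands back a character: the EofCh marker.
  val firstIsEof: Boolean = end.first == CharSequenceReader.EofCh
  // Asking for `rest` at the end returns the very same reader, so nothing is consumed.
  val restIsSame: Boolean = end.rest eq end

  def main(args: Array[String]): Unit = {
    println(s"atEnd=$isAtEnd firstIsEof=$firstIsEof restIsSame=$restIsSame")
  }
}
```

Because `rest` at the end of the input returns the same reader and `first` keeps returning EofCh, a predicate that accepts EofCh can succeed without advancing. That would explain rep never terminating in library versions where elem does not itself check for the end of the input.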

冷了相思 2024-08-30 13:58:12

OK, I think I've figured this out. CharSequenceReader returns '\032' as a marker for the end of the input. So if I modify my parsers like this, it works:

import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader

class CombinatorParserTest extends Parsers {

  type Elem = Char

  import CharSequenceReader.EofCh

  def notComma = elem("not comma", x => x != ',' && x!=EofCh)

  def notEndLine = elem("not end line", x => x != '\r' && x != '\n' && x!=EofCh)

  //def text = rep(notComma | notEndLine)
  def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))

}

object CombinatorParserTest {

  def main(args:Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    println(r)
  }

}

See source code for CharSequenceReader here. If the scaladoc mentioned it, it would have saved me a lot of time.
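
One thing to watch in the grammar above: notComma | notEndLine accepts every ordinary character, because any character is either not a comma or not a line ending. If the intent was "read text up to the next comma or line end" (an assumption on my part, the question does not say so), a single predicate is one way to sketch it. The names TextParser, textChar and parseText are hypothetical.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader
import scala.util.parsing.input.CharSequenceReader.EofCh

// Hypothetical parser object, assuming the goal is to stop at ',' or a line end.
object TextParser extends Parsers {
  type Elem = Char

  // Accept one character that is neither a comma, a line ending,
  // nor the end-of-input marker returned by CharSequenceReader.
  def textChar: Parser[Char] =
    elem("text character", c => c != ',' && c != '\r' && c != '\n' && c != EofCh)

  // Zero or more text characters; stops at the first rejected character.
  def text: Parser[List[Char]] = rep(textChar)

  // Convenience wrapper so the parser can be called with a plain String.
  def parseText(s: String): ParseResult[List[Char]] =
    text(new CharSequenceReader(s))

  def main(args: Array[String]): Unit = {
    println(parseText("ab,cd"))
  }
}
```

With this version, parsing "ab,cd" yields the characters a and b and leaves the reader positioned at the comma, and parsing "," yields an empty list instead of looping.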

你丑哭了我 2024-08-30 13:58:12

I find the logging function extremely awkward to type. Why do I have to write log(parser)("string")? Why not something as simple as parser.log("string")? Anyway, to get around that, I wrote this instead:

trait Logging { self: Parsers =>

    // Used to turn logging on or off
    val debug: Boolean

    // Much easier than having to wrap a parser with a log function and type a message
    // i.e. log(someParser)("Message") vs someParser.log("Message")
    implicit class Logged[+A](parser: Parser[A]) {
        def log(msg: String): Parser[A] =
            if (debug) self.log(parser)(msg) else parser
    }
}

Now in your parser, you can mix-in this trait like so:

import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader


object CombinatorParserTest extends App with Parsers with Logging {

    type Elem = Char

    override val debug: Boolean = true

    def notComma: Parser[Char] = elem("not comma", _ != ',')
    def notEndLine: Parser[Char] = elem("not end line", x => x != '\r' && x != '\n')
    def text: Parser[List[Char]] = rep(notComma.log("notComma") | notEndLine.log("notEndLine"))

    val r = text(new CharSequenceReader(","))

    println(r)
}

You can also override the debug field to turn off the logging if so desired.

Running this also shows the second parser correctly parsed the comma:

trying notComma at scala.util.parsing.input.CharSequenceReader@506e6d5e
notComma --> [1.1] failure: not comma expected

,
^
trying notEndLine at scala.util.parsing.input.CharSequenceReader@506e6d5e
notEndLine --> [1.2] parsed: ,
trying notComma at scala.util.parsing.input.CharSequenceReader@15975490
notComma --> [1.2] failure: end of input

,
 ^
trying notEndLine at scala.util.parsing.input.CharSequenceReader@15975490
notEndLine --> [1.2] failure: end of input

,
 ^
The result is List(,)

Process finished with exit code 0