Parser combinator not terminating - how to log what is going on?
I am experimenting with parser combinators and I often run into what seems like infinite recursions. Here is the first one I ran into:

    import util.parsing.combinator.Parsers
    import util.parsing.input.CharSequenceReader

    class CombinatorParserTest extends Parsers {
      type Elem = Char
      def notComma = elem("not comma", _ != ',')
      def notEndLine = elem("not end line", x => x != '\r' && x != '\n')
      def text = rep(notComma | notEndLine)
    }

    object CombinatorParserTest {
      def main(args: Array[String]): Unit = {
        val p = new CombinatorParserTest()
        val r = p.text(new CharSequenceReader(","))
        // does not get here
        println(r)
      }
    }

How can I print what is going on? And why does this not finish?
Logging the attempts to parse `notComma` and `notEndLine` shows that it is the end-of-file (shown as a CTRL-Z in the `log(...)("mesg")` output) that is being repeatedly parsed. Here's how I modified your parser for this purpose:

I'm not entirely sure what's going on (I tried many variations on your grammar), but I think it's something like this: the EOF is not really a character artificially introduced into the input stream, but rather a sort of perpetual condition at the end of the input. Thus this never-consumed EOF pseudo-character is repeatedly parsed as "either not a comma or not an end-of-line."
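
A minimal sketch of the kind of logging described above, not the answerer's original code. It wraps each element parser from the question in the library's `log(p)(name)` combinator. The names `LoggedTextParser`, `step`, and `LoggedTextParserDemo` are hypothetical. To keep the trace finite, the sketch applies one iteration of the body of `rep` by hand instead of calling `rep` itself.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader

// Hypothetical name: the question's grammar with each element parser wrapped in log.
object LoggedTextParser extends Parsers {
  type Elem = Char

  // log(p)(name) prints one line before trying p and one line with p's result.
  def notComma: Parser[Char] =
    log(elem("not comma", _ != ','))("notComma")

  def notEndLine: Parser[Char] =
    log(elem("not end line", x => x != '\r' && x != '\n'))("notEndLine")

  // One iteration of the body of rep(notComma | notEndLine).
  def step: Parser[Char] = notComma | notEndLine
}

object LoggedTextParserDemo {
  def main(args: Array[String]): Unit = {
    // First step: notComma rejects ',', then notEndLine accepts it.
    val afterComma = LoggedTextParser.step(new CharSequenceReader(","))
    println(afterComma)

    // Second step, now at the end of the input, to see how EOF is treated.
    println(LoggedTextParser.step(afterComma.next))
  }
}
```

Whether the second step succeeds depends on how the library version in use treats the end of input. If it succeeds without advancing the reader, that is the situation in which `rep` would never stop.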
Ok, I think I've figured this out. `CharSequenceReader` returns `'\032'` as a marker for the end of the input. So if I modify my input like this, it works:

See the source code for `CharSequenceReader`. If the scaladoc mentioned it, it would have saved me a lot of time.
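
One way to act on the end-of-input marker is sketched below. This is an assumption about the fix, not necessarily the change the answer made: both predicates reject `CharSequenceReader.EofCh`, so neither element parser can match at the end of the input. `GuardedTextParser` and `GuardedTextParserDemo` are hypothetical names.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader
import scala.util.parsing.input.CharSequenceReader.EofCh

// Hypothetical name: the question's grammar with EofCh excluded from both predicates.
object GuardedTextParser extends Parsers {
  type Elem = Char

  def notComma: Parser[Char] =
    elem("not comma", c => c != ',' && c != EofCh)

  def notEndLine: Parser[Char] =
    elem("not end line", c => c != '\r' && c != '\n' && c != EofCh)

  def text: Parser[List[Char]] = rep(notComma | notEndLine)
}

object GuardedTextParserDemo {
  def main(args: Array[String]): Unit = {
    // Past the last character, the reader reports atEnd and hands back EofCh.
    val end = new CharSequenceReader(",").rest
    println(end.atEnd)
    println(end.first == EofCh)

    // With the guards in place, rep stops once the input is exhausted.
    println(GuardedTextParser.text(new CharSequenceReader(",")))
  }
}
```

The guard lives in the predicates so that the grammar itself stays unchanged. Checking `atEnd` on the reader before each step would be an alternative.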
I find the logging function extremely awkward to type. Why do I have to write `log(parser)("string")`? Why not have something as simple as `parser.log("string")`? Anyway, to overcome that, I made this instead:

Now in your parser, you can mix in this trait like so:

You can also override the `debug` field to turn off the logging if so desired.

Running this also shows the second parser correctly parsed the comma:
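
A rough sketch of what such a trait could look like, written from the description above and not taken from the answer. `LoggingParsers`, `ParserLogOps`, and `TracedTextParser` are hypothetical names, and `debug` is modeled here as an overridable method. Only `Parsers.log` and `CharSequenceReader.EofCh` come from the library. The element parsers also reject `EofCh`, an added assumption so that the example stops at the end of the input.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader
import scala.util.parsing.input.CharSequenceReader.EofCh

// Hypothetical trait that adds parser.log("name") syntax on top of Parsers.log.
trait LoggingParsers extends Parsers {
  // Override to return false to silence the trace.
  def debug: Boolean = true

  implicit class ParserLogOps[T](p: Parser[T]) {
    // The call is qualified because a bare `log` here would refer to this method.
    def log(name: String): Parser[T] =
      if (debug) LoggingParsers.this.log(p)(name) else p
  }
}

// Hypothetical parser that mixes the trait in.
object TracedTextParser extends LoggingParsers {
  type Elem = Char

  def notComma: Parser[Char] =
    elem("not comma", c => c != ',' && c != EofCh).log("notComma")

  def notEndLine: Parser[Char] =
    elem("not end line", c => c != '\r' && c != '\n' && c != EofCh).log("notEndLine")

  def text: Parser[List[Char]] = rep(notComma | notEndLine)
}

object TracedTextParserDemo {
  def main(args: Array[String]): Unit = {
    // Prints the trace for each attempt, followed by the overall result.
    println(TracedTextParser.text(new CharSequenceReader(",")))
  }
}
```

Adding `override def debug: Boolean = false` to `TracedTextParser` would return the parsers unwrapped, so nothing is printed during parsing.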