Scheme parsing with Scala parser combinators
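I'm writing a small Scheme interpreter in Scala and I'm running into problems parsing lists in Scheme. My code parses lists that contain multiple numbers, identifiers, and booleans, but it chokes if I try to parse a list containing multiple strings or lists. What am I missing?
Here's my parser: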
import scala.util.parsing.combinator.RegexParsers

class SchemeParsers extends RegexParsers {
  // Scheme boolean #t and #f translate to Scala's true and false
  def bool : Parser[Boolean] =
    ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}
  // A Scheme identifier allows alphanumeric chars, some symbols, and
  // can't start with a digit
  def id : Parser[String] =
    """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r ^^ {case s => s}
  // This interpreter only accepts numbers as integers
  def num : Parser[Int] = """-?\d+""".r ^^ {case s => s.toInt}
  // A string can have any character except ", and is wrapped in "
  def str : Parser[String] = '"' ~> """[^""]*""".r <~ '"' ^^ {case s => s}
  // A Scheme list is a series of expressions wrapped in ()
  def list : Parser[List[Any]] =
    '(' ~> rep(expr) <~ ')' ^^ {(s: List[Any]) => s}
  // A Scheme expression contains any of the other constructions
  def expr : Parser[Any] = id | str | num | bool | list ^^ {case s => s}
}
As @Gabe correctly pointed out, you left some whitespace unhandled:
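The following small sketch shows the unhandled whitespace in isolation. The `WhitespaceDemo` object and its member names are hypothetical, and the sketch assumes the scala-parser-combinators module is on the classpath. It defines the same delimiter twice: once as a `Char`, which becomes a parser through the implicit `accept`, and once as a `String`, which goes through the implicit `literal` of `RegexParsers`. In the question's parser, the `Char` versions of `"`, `(` and `)` fail when a space precedes them. That is what happens at a second string, a nested list, or a closing paren.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo: the same delimiter parsed as a Char and as a String.
object WhitespaceDemo extends RegexParsers {
  // A Char literal becomes a parser through the implicit accept(Elem) of Parsers.
  // It compares the next input character directly and never skips whitespace.
  def charParen: Parser[Char] = '('

  // A String literal becomes a parser through the implicit literal(String) of
  // RegexParsers, which skips leading whitespace (per whiteSpace) before matching.
  def stringParen: Parser[String] = "("

  def main(args: Array[String]): Unit = {
    println(parseAll(charParen, "(").successful)     // true
    println(parseAll(charParen, "  (").successful)   // false: the spaces are never consumed
    println(parseAll(stringParen, "  (").successful) // true: the spaces are skipped first
  }
}
```

Writing the delimiters as `"("`, `")"` and `"\""` instead lets `RegexParsers` consume that whitespace for you.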
The only problem with the code is your use of characters instead of strings. Below, I removed the redundant `^^ { case s => s }` and replaced all characters with strings. I'll discuss this issue further below.

All `Parsers` have an implicit `accept` for their `Elem` type. So, if the basic element is a `Char`, as in `RegexParsers`, there is an implicit accept action for characters. That is what happens here for the symbols `(`, `)` and `"`, which are characters in your code. This `accept` parser compares the next input character directly and does not skip whitespace.

What `RegexParsers` does automatically is skip whitespace at the beginning of any `String` or `Regex` parser. Whitespace is defined as `protected val whiteSpace = """\s+""".r`, so you could override that. `RegexParsers` also takes care of moving the position cursor past the whitespace in case of error messages.

One consequence of this that you seem not to have realized is that `" a string beginning with a space"` will have its leading space removed from the parsed output. The regex between the quotes skips whitespace before it starts matching. That is very unlikely to be what you want. :-)

Also, since `\s` includes newlines, a newline will be accepted before any identifier, which may or may not be what you want.

You can disable whitespace skipping for the parser as a whole by overriding `skipWhitespace`. On the other hand, the default `skipWhitespace` tests the length of `whiteSpace`, so you could potentially turn skipping on and off just by changing the value of `whiteSpace` during parsing.
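Applying the changes described above gives something like the following sketch. It is not the answer's original listing. It assumes scala-parser-combinators is available, and it uses a hypothetical object name, `FixedSchemeParsers`. One thing goes beyond the stated fix: it matches a whole string literal with a single regex. That way, whitespace is skipped only before the opening quote, and spaces inside the quotes survive.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical name: a sketch of the question's parser using String literals.
object FixedSchemeParsers extends RegexParsers {
  // Scheme booleans #t and #f become Scala's true and false
  def bool: Parser[Boolean] = ("#t" | "#f") ^^ (_ == "#t")

  // Alphanumeric chars and some symbols; can't start with a digit
  def id: Parser[String] = """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r

  // Only integers are accepted as numbers
  def num: Parser[Int] = """-?\d+""".r ^^ (_.toInt)

  // The whole literal, quotes included, is one regex: whitespace is skipped
  // before the opening quote only, so spaces inside the quotes are kept.
  // The quotes are then stripped from the result.
  def str: Parser[String] = "\"[^\"]*\"".r ^^ (s => s.substring(1, s.length - 1))

  // "(" and ")" are Strings, so whitespace before each is skipped automatically
  def list: Parser[List[Any]] = "(" ~> rep(expr) <~ ")"

  // Any of the constructions above; no identity ^^ needed
  def expr: Parser[Any] = id | str | num | bool | list

  def main(args: Array[String]): Unit = {
    println(parseAll(expr, "(\"a\" \"b\")").get) // List(a, b)
    println(parseAll(expr, "((1 2) (3))").get)   // List(List(1, 2), List(3))
    println(parseAll(expr, "(\" a\" #t)").get)   // List( a, true)
  }
}
```

The regex-based `str` keeps the question's restriction that a string cannot contain `"`. Escape sequences are not handled.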
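The two switches mentioned above are ordinary members of `RegexParsers`: `whiteSpace` is a `protected val` holding a `Regex`, and `skipWhitespace` is a `def`. The following minimal sketch uses hypothetical object names and a toy `word` parser. One parser keeps the default, one treats only spaces and tabs as skippable so that a leading newline is rejected, and one turns skipping off entirely.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

// Baseline: the default whiteSpace is """\s+""".r, which includes newlines.
object DefaultParsers extends RegexParsers {
  def word: Parser[String] = """[a-z]+""".r
}

// Hypothetical: only spaces and tabs are skipped, so a newline before a
// token is no longer accepted silently.
object SameLineParsers extends RegexParsers {
  override protected val whiteSpace: Regex = """[ \t]+""".r
  def word: Parser[String] = """[a-z]+""".r
}

// Hypothetical: whitespace skipping switched off for the whole parser.
object NoSkipParsers extends RegexParsers {
  override def skipWhitespace: Boolean = false
  def word: Parser[String] = """[a-z]+""".r
}

object WhitespaceKnobs {
  def main(args: Array[String]): Unit = {
    println(DefaultParsers.parseAll(DefaultParsers.word, "\nabc").successful)    // true
    println(SameLineParsers.parseAll(SameLineParsers.word, " \tabc").successful) // true
    println(SameLineParsers.parseAll(SameLineParsers.word, "\nabc").successful)  // false
    println(NoSkipParsers.parseAll(NoSkipParsers.word, " abc").successful)       // false
  }
}
```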