使用 Scala 解析器组合器的解析方案

发布于 2024-10-06 20:19:28 字数 1119 浏览 0 评论 0原文

我正在 Scala 中编写一个小型方案解释器,并且在解析方案中的列表时遇到问题。我的代码解析包含多个数字、标识符和布尔值的列表,但如果我尝试解析包含多个字符串或列表的列表,它就会阻塞。我缺少什么?

这是我的解析器:

class SchemeParsers extends RegexParsers {

// Scheme boolean #t and #f translate to Scala's true and false
def bool : Parser[Boolean] = 
    ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}

// A Scheme identifier allows alphanumeric chars, some symbols, and 
// can't start with a digit
def id : Parser[String] = 
    """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r ^^ {case s => s}

// This interpreter only accepts numbers as integers
def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}

// A string can have any character except ", and is wrapped in "
def str : Parser[String] = '"' ~> """[^""]*""".r <~ '"' ^^ {case s => s}

// A Scheme list is a series of expressions wrapped in ()
def list : Parser[List[Any]] = 
    '(' ~> rep(expr) <~ ')' ^^ {s: List[Any] => s}

// A Scheme expression contains any of the other constructions
def expr : Parser[Any] = id | str | num | bool | list ^^ {case s => s}
} 

I'm writing a small scheme interpreter in Scala and I'm running into problems parsing lists in Scheme. My code parses lists that contain multiple numbers, identifiers, and booleans, but it chokes if I try to parse a list containing multiple strings or lists. What am I missing?

Here's my parser:

class SchemeParsers extends RegexParsers {

// Scheme boolean #t and #f translate to Scala's true and false
def bool : Parser[Boolean] = 
    ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}

// A Scheme identifier allows alphanumeric chars, some symbols, and 
// can't start with a digit
def id : Parser[String] = 
    """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r ^^ {case s => s}

// This interpreter only accepts numbers as integers
def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}

// A string can have any character except ", and is wrapped in "
def str : Parser[String] = '"' ~> """[^""]*""".r <~ '"' ^^ {case s => s}

// A Scheme list is a series of expressions wrapped in ()
def list : Parser[List[Any]] = 
    '(' ~> rep(expr) <~ ')' ^^ {s: List[Any] => s}

// A Scheme expression contains any of the other constructions
def expr : Parser[Any] = id | str | num | bool | list ^^ {case s => s}
} 

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

ら栖息 2024-10-13 20:19:28

正如@Gabe 正确指出的那样,您留下了一些未处理的空白:

scala> object SchemeParsers extends RegexParsers {
     |
     | private def space  = regex("[ \\n]*".r)
     |
     | // Scheme boolean #t and #f translate to Scala's true and false
     | private def bool : Parser[Boolean] =
     |     ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}
     |
     | // A Scheme identifier allows alphanumeric chars, some symbols, and
     | // can't start with a digit
     | private def id : Parser[String] =
     |     """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r
     |
     | // This interpreter only accepts numbers as integers
     | private def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}
     |
     | // A string can have any character except ", and is wrapped in "
     | private def str : Parser[String] = '"' ~> """[^""]*""".r <~ '"' <~ space ^^ {case s => s}
     |
     | // A Scheme list is a series of expressions wrapped in ()
     | private def list : Parser[List[Any]] =
     |     '(' ~> space  ~> rep(expr) <~ ')' <~ space ^^ {s: List[Any] => s}
     |
     | // A Scheme expression contains any of the other constructions
     | private def expr : Parser[Any] = id | str | num | bool | list ^^ {case s => s}
     |
     | def parseExpr(str: String) = parse(expr, str)
     | }
defined module SchemeParsers   

scala> SchemeParsers.parseExpr("""(("a" "b") ("a" "b"))""")
res12: SchemeParsers.ParseResult[Any] = [1.22] parsed: List(List(a, b), List(a, b))

scala> SchemeParsers.parseExpr("""("a" "b" "c")""")
res13: SchemeParsers.ParseResult[Any] = [1.14] parsed: List(a, b, c)

scala> SchemeParsers.parseExpr("""((1) (1 2) (1 2 3))""")
res14: SchemeParsers.ParseResult[Any] = [1.20] parsed: List(List(1), List(1, 2), List(1, 2, 3))

As it was correctly pointed out by @Gabe, you left some white-spaces unhandled:

scala> object SchemeParsers extends RegexParsers {
     |
     | private def space  = regex("[ \\n]*".r)
     |
     | // Scheme boolean #t and #f translate to Scala's true and false
     | private def bool : Parser[Boolean] =
     |     ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}
     |
     | // A Scheme identifier allows alphanumeric chars, some symbols, and
     | // can't start with a digit
     | private def id : Parser[String] =
     |     """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r
     |
     | // This interpreter only accepts numbers as integers
     | private def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}
     |
     | // A string can have any character except ", and is wrapped in "
     | private def str : Parser[String] = '"' ~> """[^""]*""".r <~ '"' <~ space ^^ {case s => s}
     |
     | // A Scheme list is a series of expressions wrapped in ()
     | private def list : Parser[List[Any]] =
     |     '(' ~> space  ~> rep(expr) <~ ')' <~ space ^^ {s: List[Any] => s}
     |
     | // A Scheme expression contains any of the other constructions
     | private def expr : Parser[Any] = id | str | num | bool | list ^^ {case s => s}
     |
     | def parseExpr(str: String) = parse(expr, str)
     | }
defined module SchemeParsers   

scala> SchemeParsers.parseExpr("""(("a" "b") ("a" "b"))""")
res12: SchemeParsers.ParseResult[Any] = [1.22] parsed: List(List(a, b), List(a, b))

scala> SchemeParsers.parseExpr("""("a" "b" "c")""")
res13: SchemeParsers.ParseResult[Any] = [1.14] parsed: List(a, b, c)

scala> SchemeParsers.parseExpr("""((1) (1 2) (1 2 3))""")
res14: SchemeParsers.ParseResult[Any] = [1.20] parsed: List(List(1), List(1, 2), List(1, 2, 3))
通知家属抬走 2024-10-13 20:19:28

该代码的唯一问题是您使用字符而不是字符串。下面,我删除了多余的 ^^ { case s =>; s } 并将所有字符替换为字符串。下面我将进一步讨论这个问题。

class SchemeParsers extends RegexParsers {

// Scheme boolean #t and #f translate to Scala's true and false
def bool : Parser[Boolean] = 
    ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}

// A Scheme identifier allows alphanumeric chars, some symbols, and 
// can't start with a digit
def id : Parser[String] = 
    """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r ^^ {case s => s}

// This interpreter only accepts numbers as integers
def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}

// A string can have any character except ", and is wrapped in "
def str : Parser[String] = "\"" ~> """[^""]*""".r <~ "\""

// A Scheme list is a series of expressions wrapped in ()
def list : Parser[List[Any]] = 
    "(" ~> rep(expr) <~ ")" ^^ {s: List[Any] => s}

// A Scheme expression contains any of the other constructions
def expr : Parser[Any] = id | str | num | bool | list
}

所有解析器对其Elem类型都有隐式accept。因此,如果基本元素是 Char,例如在 RegexParsers 中,那么它们有一个隐式接受操作,这就是符号 ()",它们是代码中的字符。

RegexParsers 自动执行的操作是跳过空格(定义为 < code>protected val whiteSpace = """\s+""".r,因此您可以在任何 StringRegex 的开头自动覆盖它)如果出现错误消息,它还负责将定位光标移过空格,

您似乎没有意识到 “以空格开头的字符串” 。它的前缀空格从解析的输出中删除,这不太可能是您想要的:-)

另外,由于 \s 包含新行,因此在任何标识符之前都可以接受新行。或者可能不是您想要的,

您可以通过重写 skipWhiteSpace 来禁用整个正则表达式中的空格跳过。另一方面,默认的 skipWhiteSpace 测试 whiteSpace 的长度,因此您可以通过操纵 whiteSpace 的值来打开和关闭它code> 贯穿整个解析过程。

The only problem with the code is your usage of characters instead of strings. Below, I removed the redundant ^^ { case s => s } and replaced all characters with strings. I'll further discuss this issue below.

class SchemeParsers extends RegexParsers {

// Scheme boolean #t and #f translate to Scala's true and false
def bool : Parser[Boolean] = 
    ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}

// A Scheme identifier allows alphanumeric chars, some symbols, and 
// can't start with a digit
def id : Parser[String] = 
    """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r ^^ {case s => s}

// This interpreter only accepts numbers as integers
def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}

// A string can have any character except ", and is wrapped in "
def str : Parser[String] = "\"" ~> """[^""]*""".r <~ "\""

// A Scheme list is a series of expressions wrapped in ()
def list : Parser[List[Any]] = 
    "(" ~> rep(expr) <~ ")" ^^ {s: List[Any] => s}

// A Scheme expression contains any of the other constructions
def expr : Parser[Any] = id | str | num | bool | list
}

All Parsers have an implicit accept for their Elem types. So, if the basic element is a Char, such as in RegexParsers, then there's an implicit accept action for them, which is what happens here for the symbols (, ) and ", which are characters in your code.

What RegexParsers do automatically is to skip white spaces (defined as protected val whiteSpace = """\s+""".r, so you could override that) automatically at the beginning of any String or Regex. It also takes care of moving the positioning cursor past the white space in case of error messages.

One consequence of this that you seem not to have realized is that " a string beginning with a space" will have its prefix space removed from the parsed output, which is very unlikely to be something you want. :-)

Also, since \s includes new lines, a new line will be acceptable before any identifier, which may or may not be what you want.

You may disable space skipping in your regex as a whole by overrideing skipWhiteSpace. On the other hand, the default skipWhiteSpace tests for whiteSpace's length, so you could potentially turn it on and off just by manipulating the value of whiteSpace throughout the parsing process.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文