这是解析器组合器库的合理基础吗?
我最近一直在使用 FParsec,我发现通用解析器的缺乏是我的一个主要停止点。我对这个小库的目标是简单性以及对通用输入的支持。你能想到有什么可以改善这一点的补充吗?或者有什么特别糟糕的吗?
open LazyList type State<'a, 'b> (input:LazyList<'a>, data:'b) = member this.Input = input member this.Data = data type Result<'a, 'b, 'c> = | Success of 'c * State<'a, 'b> | Failure of string * State<'a, 'b> type Parser<'a,'b, 'c> = State<'a, 'b> -> Result<'a, 'b, 'c> let (>>=) left right state = match left state with | Success (result, state) -> (right result) state | Failure (message, _) -> Result<'a, 'b, 'd>.Failure (message, state) let (<|>) left right state = match left state with | Success (_, _) as result -> result | Failure (_, _) -> right state let (|>>) parser transform state = match parser state with | Success (result, state) -> Success (transform result, state) | Failure (message, _) -> Failure (message, state) let (<?>) parser errorMessage state = match parser state with | Success (_, _) as result -> result | Failure (_, _) -> Failure (errorMessage, state) type ParseMonad() = member this.Bind (f, g) = f >>= g member this.Return x s = Success(x, s) member this.Zero () s = Failure("", s) member this.Delay (f:unit -> Parser<_,_,_>) = f() let parse = ParseMonad()
回溯
令人惊讶的是,它并没有花费太多代码来实现您所描述的内容。虽然有点草率,但似乎效果很好。
let (>>=) left right state =
seq {
for res in left state do
match res with
| Success(v, s) ->
let v =
right v s
|> List.tryFind (
fun res ->
match res with
| Success (_, _) -> true
| _ -> false
)
match v with
| Some v -> yield v
| None -> ()
} |> Seq.toList
let (<|>) left right state =
left state @ right state
回溯第 2 部分
将代码切换为使用惰性列表和尾部调用优化递归。
let (>>=) left right state =
let rec readRight lst =
match lst with
| Cons (x, xs) ->
match x with
| Success (r, s) as q -> LazyList.ofList [q]
| Failure (m, s) -> readRight xs
| Nil -> LazyList.empty<Result<'a, 'b, 'd>>
let rec readLeft lst =
match lst with
| Cons (x, xs) ->
match x with
| Success (r, s) ->
match readRight (right r s) with
| Cons (x, xs) ->
match x with
| Success (r, s) as q -> LazyList.ofList [q]
| Failure (m, s) -> readRight xs
| Nil -> readLeft xs
| Failure (m, s) -> readLeft xs
| Nil -> LazyList.empty<Result<'a, 'b, 'd>>
readLeft (left state)
let (<|>) (left:Parser<'a, 'b, 'c>) (right:Parser<'a, 'b, 'c>) state =
LazyList.delayed (fun () -> left state)
|> LazyList.append
<| LazyList.delayed (fun () -> right state)
I've been working with FParsec lately and I found that the lack of generic parsers is a major stopping point for me. My goal for this little library is simplicity as well as support for generic input. Can you think of any additions that would improve this or is anything particularly bad?
open LazyList type State<'a, 'b> (input:LazyList<'a>, data:'b) = member this.Input = input member this.Data = data type Result<'a, 'b, 'c> = | Success of 'c * State<'a, 'b> | Failure of string * State<'a, 'b> type Parser<'a,'b, 'c> = State<'a, 'b> -> Result<'a, 'b, 'c> let (>>=) left right state = match left state with | Success (result, state) -> (right result) state | Failure (message, _) -> Result<'a, 'b, 'd>.Failure (message, state) let (<|>) left right state = match left state with | Success (_, _) as result -> result | Failure (_, _) -> right state let (|>>) parser transform state = match parser state with | Success (result, state) -> Success (transform result, state) | Failure (message, _) -> Failure (message, state) let (<?>) parser errorMessage state = match parser state with | Success (_, _) as result -> result | Failure (_, _) -> Failure (errorMessage, state) type ParseMonad() = member this.Bind (f, g) = f >>= g member this.Return x s = Success(x, s) member this.Zero () s = Failure("", s) member this.Delay (f:unit -> Parser<_,_,_>) = f() let parse = ParseMonad()
Backtracking
Surprisingly it didn't take too much code to implement what you describe. It is a bit sloppy but seems to work quite well.
let (>>=) left right state =
seq {
for res in left state do
match res with
| Success(v, s) ->
let v =
right v s
|> List.tryFind (
fun res ->
match res with
| Success (_, _) -> true
| _ -> false
)
match v with
| Some v -> yield v
| None -> ()
} |> Seq.toList
let (<|>) left right state =
left state @ right state
Backtracking Part 2
Switched around the code to use lazy lists and tail-call optimized recursion.
let (>>=) left right state =
let rec readRight lst =
match lst with
| Cons (x, xs) ->
match x with
| Success (r, s) as q -> LazyList.ofList [q]
| Failure (m, s) -> readRight xs
| Nil -> LazyList.empty<Result<'a, 'b, 'd>>
let rec readLeft lst =
match lst with
| Cons (x, xs) ->
match x with
| Success (r, s) ->
match readRight (right r s) with
| Cons (x, xs) ->
match x with
| Success (r, s) as q -> LazyList.ofList [q]
| Failure (m, s) -> readRight xs
| Nil -> readLeft xs
| Failure (m, s) -> readLeft xs
| Nil -> LazyList.empty<Result<'a, 'b, 'd>>
readLeft (left state)
let (<|>) (left:Parser<'a, 'b, 'c>) (right:Parser<'a, 'b, 'c>) state =
LazyList.delayed (fun () -> left state)
|> LazyList.append
<| LazyList.delayed (fun () -> right state)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我认为您需要做出的一个重要的设计决策是是否要在解析器中支持回溯(我不太记得解析理论,但这可能指定了解析器可以处理的语言类型)。
回溯。在您的实现中,解析器可能会失败(
Failure
情况),也可能只产生一个结果(Success
情况)。另一种选择是生成零个或多个结果(例如,将结果表示为seq<'c>
)。抱歉,如果这是您已经考虑过的事情:-),但无论如何...区别在于您的解析器始终遵循第一个可能的选项。例如,如果您编写如下内容:
使用您的实现,对于输入“abcd”,这将失败。它将选择
<|>
运算符的第一个分支,然后该分支将处理前两个字符,并且序列中的下一个解析器将失败。基于序列的实现将能够回溯并遵循<|>
中的第二条路径并解析输入。Combine。我想到的另一个想法是,您还可以将
Combine
成员添加到解析器计算构建器中。这有点微妙(因为您需要了解计算表达式是如何转换的),但有时它可能很有用。如果添加:然后您可以很好地编写递归解析器:
I think that one important design decision that you'll need to make is whether you want to support backtracking in your parsers or not (I don't remember much about parsing theory, but this probably specifies the types of languages that your parser can handle).
Backtracking. In your implementation, a parser can either fail (the
Failure
case) or produce exactly one result (theSuccess
case). An alternative option is to generate zero or more results (for example, represent results asseq<'c>
). Sorry if this is something you already considered :-), but anyway...The difference is that your parser always follows the first possible option. For example, if you write something like the following:
Using your implementation, this will fail for input "abcd". It will choose the first branch of the
<|>
operator, which will then process first two characters and the next parser in the sequence will fail. An implementation based on sequences would be able to backtrack and follow the second path in<|>
and parse the input.Combine. Another idea that occurs to me is that you could also add
Combine
member to your parser computation builder. This is a bit subtle (because you need to understand how computation expressions are translated), but it can be sometimes useful. If you add:You can then write recursive parsers nicely: