FParsec 中的尾递归

发布于 2025-01-05 00:32:57 字数 2309 浏览 6 评论 0原文

我遇到了具有两个递归分支的解析器的问题。为了更容易地演示这个问题,我使用了 Luca Bolognese 写的文章为例:

<前><代码><表达式>; ::= <名称>; | <功能> | <应用> <名称> ::= 非空白字符序列 <功能> ::= \ <名称>; 。 <正文> <正文> ::= <表达式>; <应用> ::= ( <函数表达式> <参数表达式> ) <函数表达式> ::= <表达式>; <自变量表达> ::= <表达式>;

文章中的解析器非常简洁:

let ws = " \t\n" 
let specialChars = ".)(\\\n" 

let pWs = spaces 
let pName = manyChars (noneOf (ws + specialChars)) |>> EName 

let pExpr, pExprRef = createParserForwardedToRef<Expression, Unit>() 

let curry2 f a b = f(a,b) 
let pFunction = pchar '\\' >>. pipe2 pName (pchar '.' >>. pExpr) (curry2 Function) 

let pApplication = pchar '(' >>. pipe2 pExpr (pWs >>. pExpr) (curry2 Application)
                            .>> pWs .>> pchar ')'

do pExprRef := pFunction <|> pApplication <|> pName 

let pExpressions = sepBy pExpr spaces1 

let fparseString text = 
    match run pExpressions text with 
    | Success(result, _, _)   -> result 
    | Failure(errorMsg, _, _) -> failwith (sprintf "Failure: %s" errorMsg) 

我对 pApplication 感兴趣,因为它们由两个 pExpr 组成,而这两个 pExpr 又可能是 pApplication也。解析器在以下基准测试中耗尽了堆栈空间:

let generateString level =
    let rec loop i =
        seq {
                if i < level then
                    yield "("
                    yield! loop level
                    yield " "
                    yield! loop (i+1)
                    yield ")"
                else 
                    yield "(x x)"
        }
    loop 0 |> String.concat ""

let N = 5000
let s = generateString N;; 
let _ = fparseString s;;

如何将解析器重写为尾递归?

当我尝试为类似 Lisp 的语言并用真实的基准测试它。我有 TermVarBinding ,它们是相互递归的类型,还有一个 let 解析器,它表现出与上面的 pApplication 相同的问题。我的解析器位于 github 以防分析错误。 - 递归问题。

I have encounter a problem with parsers having two branches of recursion. To demonstrate the problem easier, I use a simple grammar of a lambda calculus from the article written by Luca Bolognese as the example:

<expression> ::= <name> | <function> | <application>  
<name> ::= non­blank character sequence  
<function> ::= \ <name> . <body>  
<body> ::= <expression>  
<application> ::= ( <function expression> <argument expression> )  
<function expression> ::= <expression>  
<argument expression> ::= <expression>

The parser in the article is quite concise:

let ws = " \t\n" 
let specialChars = ".)(\\\n" 

let pWs = spaces 
let pName = manyChars (noneOf (ws + specialChars)) |>> EName 

let pExpr, pExprRef = createParserForwardedToRef<Expression, Unit>() 

let curry2 f a b = f(a,b) 
let pFunction = pchar '\\' >>. pipe2 pName (pchar '.' >>. pExpr) (curry2 Function) 

let pApplication = pchar '(' >>. pipe2 pExpr (pWs >>. pExpr) (curry2 Application)
                            .>> pWs .>> pchar ')'

do pExprRef := pFunction <|> pApplication <|> pName 

let pExpressions = sepBy pExpr spaces1 

let fparseString text = 
    match run pExpressions text with 
    | Success(result, _, _)   -> result 
    | Failure(errorMsg, _, _) -> failwith (sprintf "Failure: %s" errorMsg) 

I'm interested in pApplication since they consist of two pExprs which in turn could be pApplications too. The parser runs out of stack space in the following benchmark:

let generateString level =
    let rec loop i =
        seq {
                if i < level then
                    yield "("
                    yield! loop level
                    yield " "
                    yield! loop (i+1)
                    yield ")"
                else 
                    yield "(x x)"
        }
    loop 0 |> String.concat ""

let N = 5000
let s = generateString N;; 
let _ = fparseString s;;

How can I rewrite the parser to be tail-recursive?

I recognized the problem when trying to write a parser for a Lisp-like language and test it with real benchmarks. I have Term and VarBinding which are mutually recursive types and a let parser which exhibits the same issue as pApplication above. My parser is on github in case the analysis is wrong regarding the not tail-recursive problem.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

梦太阳 2025-01-12 00:32:57

FParsec 的内置解析器组合器通常不允许尾递归解析器实现,主要是因为错误处理的实现方式。

这意味着 FParsec 解析器的递归深度受到堆栈大小的限制——就像大多数递归下降解析器的情况一样。当然,这不会影响 manysepBychainl 等序列的解析,因为这些 FParsec 组合器都有以下实现:恒定的堆栈空间使用情况。

.NET 中的默认堆栈大小通常足以使用 FParsec 以明确指定的格式解析人类生成的输入(假设您使用内置组合器解析序列)。然而,具有深度嵌套结构的机器生成的输入(例如 5000 级深度 s 表达式)很容易导致堆栈溢出。

如果您事先知道输入中的嵌套深度被限制在合理的值,您可以简单地 增加堆栈大小到适当的值。希望您的基准数据也是如此。

否则,您可能必须为语法的递归元素实现特殊用途的解析器函数。您需要使用 -级别 FParsec API ,您显然必须实现此函数,使其使用堆而不是堆栈来跟踪的嵌套上下文和中间解析结果。

The built-in parser combinators of FParsec generally don't allow for a tail-recursive parser implementation, mainly because of the way the error handling is implemented.

This means that the recursion depth of an FParsec parser is limited by the stack size -- as is the case for most recursive descent parsers. Of course, this doesn't affect the parsing of sequences with many, sepBy, chainl etc., because these FParsec combinators all have implementations with constant stack space usage.

The default stack size in .NET is usually more than enough for parsing human-generated input in well-specified formats with FParsec (assuming you parse sequences with the built-in combinators). However, machine-generated input with a deeply nested structure (like your 5000 level deep s-expressions) can easily lead to a stack overflow.

If you know in advance that the nesting depth in your input is limited to a reasonable value, you could simply increase the stack size to an appropriate value. Hopefully this is true for your benchmark data.

Otherwise you may have to implement a special purpose parser function for the recursive element(s) of your grammar. You would need to implement this function using the low-level API of FParsec and you would obviously have to implement this function such that it uses the heap instead of the stack for keeping track of the nesting context and intermediate parsing results.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文