避免堆栈溢出(使用 F# 无限序列序列)

发布于 2024-07-20 23:53:12 字数 2887 浏览 10 评论 0原文

我有一个为 f# 中的 morris seq 编写的“学习代码”,它遭受堆栈溢出的困扰,我不知道如何避免。 “morris”返回“看和说”序列的无限序列(即,{{1}、{1,1}、{2,1}、{1,2,1,1}、{1,1,1 ,2,2,1}, {3,1,2,2,1,1},...})。

    let printList l =
        Seq.iter (fun n -> printf "%i" n) l
        printfn ""

    let rec morris s = 
        let next str = seq {
            let cnt = ref 1  // Stack overflow is below when enumerating
            for cur in [|0|] |> Seq.append str |> Seq.windowed 2 do
                if cur.[0] <> cur.[1] then
                    yield!( [!cnt ; cur.[0]] )
                    cnt := 0
                incr cnt
        }
        seq {
        yield s
        yield! morris (next s) // tail recursion, no stack overflow
        }

    // "main"
    // Print the nth iteration
    let _ =  [1] |> morris |> Seq.nth 3125 |> printList 

您可以使用 Seq.nth 来挑选第 n 次迭代,但在遇到堆栈溢出之前只能进行到此为止。 我拥有的一点递归是尾递归,它本质上构建了一组链接的枚举器。 问题不在这里。 这是在第 4000 个序列上调用“enum”的时候。 请注意,这是 F# 1.9.6.16 的情况,之前的版本最高可达 14000 以上)。 这是因为链接序列的解析方式。 序列是惰性的,因此“递归”是惰性的。 也就是说,seq n 调用 seq n-1,而 seq n-1 又调用 seq n-2,依此类推来获取第一项(第一个 # 是最坏的情况)。

我明白 [|0|] |> Seq.append str |> Seq.windowed 2 使我的问题变得更糟,如果消除它,我可以将生成的 # 增加三倍。 实际上,该代码运行良好。 morris 的第 3125 次迭代长度将超过 10^359 个字符。

我真正想要解决的问题是如何保留惰性评估,并且对我可以选择的迭代的堆栈大小没有限制。 我正在寻找合适的 F# 习惯用法来根据内存大小进行限制。

2010 年 10 月更新

在更好地学习 F#、一点点 Haskell、思考和学习之后, 研究这个问题一年多了,我终于可以回答我自己的问题了。 但与困难问题一样,问题始于错误的问题。 问题不在于序列的序列——这实际上是因为递归定义的序列。 我的函数式编程技能现在好一点了,所以更容易看到下面的版本发生了什么,它仍然得到一个 stackoverflow,

let next str = 
    Seq.append str [0]
    |> Seq.pairwise
    |> Seq.scan (fun (n,_) (c,v) ->
            if (c = v) then (n+1,Seq.empty)
            else (1,Seq.ofList [n;c]) ) (1,Seq.empty)
    |> Seq.collect snd

let morris = Seq.unfold(fun sq -> Some(sq,next sq))

它基本上创建了一个非常长的 Seq 处理函数调用链来生成序列。 F# 自带的 Seq 模块是不使用堆栈就无法跟踪链的。 它对追加和递归定义的序列使用了一种优化,但该优化仅在递归实现追加时才有效。

所以这会起作用

let rec ints n = seq { yield n; yield! ints (n+1) }
printf "%A" (ints 0 |> Seq.nth 100000);;

并且这个会得到 stackoverflow。

let rec ints n = seq { yield n; yield! (ints (n+1)|> Seq.map id) }
printf "%A" (ints 0 |> Seq.nth 100000);;

为了证明 F# 库是问题所在,我编写了自己的 Seq 模块,该模块使用连续性实现了追加、成对、扫描和收集,现在我可以毫无问题地开始生成和打印 50,000 个 seq(它永远不会完成,因为它已经结束了) 10^5697 位长)。

一些附加说明:

  • 延续是我正在寻找的习惯用法,但在这种情况下,它们必须进入 F# 库,而不是我的代码。 我从 Tomas Petricek 的真实世界函数式编程一书中了解了 F# 中的延续。
  • 我接受的懒惰列表答案包含了另一个习语; 懒惰评价。 在我重写的库中,我还必须利用惰性类型来避免堆栈溢出。
  • 惰性列表版本有点幸运(可能是设计使然,但这超出了我目前的确定能力)-它在构造和迭代时使用的活动模式匹配导致列表在所需的递归变得太深之前计算值,所以它很懒,但又没有懒到需要延续来避免堆栈溢出。 例如,当第二个序列需要第一个序列中的一个数字时,它已经被计算出来了。 换句话说,LL版本对于序列生成并不是严格的JIT惰性,只是列表管理。

I have this "learning code" I wrote for the morris seq in f# that suffers from stack overflow that I don't know how to avoid. "morris" returns an infinite sequence of "see and say" sequences (i.e., {{1}, {1,1}, {2,1}, {1,2,1,1}, {1,1,1,2,2,1}, {3,1,2,2,1,1},...}).

    let printList l =
        Seq.iter (fun n -> printf "%i" n) l
        printfn ""

    let rec morris s = 
        let next str = seq {
            let cnt = ref 1  // Stack overflow is below when enumerating
            for cur in [|0|] |> Seq.append str |> Seq.windowed 2 do
                if cur.[0] <> cur.[1] then
                    yield!( [!cnt ; cur.[0]] )
                    cnt := 0
                incr cnt
        }
        seq {
        yield s
        yield! morris (next s) // tail recursion, no stack overflow
        }

    // "main"
    // Print the nth iteration
    let _ =  [1] |> morris |> Seq.nth 3125 |> printList 

You can pick off the nth iteration using Seq.nth but you can only get so far before you hit a stack overflow. The one bit of recursion I have is tail recursion and it in essence builds a linked set of enumerators. That's not where the problem is. It's when "enum" is called on the say the 4000th sequence. Note that's with F# 1.9.6.16, the previous version topped out above 14000). It's because the way the linked sequences are resolved. The sequences are lazy and so the "recursion" is lazy. That is, seq n calls seq n-1 which calls seq n-2 and so forth to get the first item (the very first # is the worst case).

I understand that [|0|] |> Seq.append str |> Seq.windowed 2, is making my problem worse and I could triple the # I could generate if I eliminated that. Practically speaking the code works well enough. The 3125th iteration of morris would be over 10^359 characters in length.

The problem I'm really trying to solve is how to retain the lazy eval and have a no limit based on stack size for the iteration I can pick off. I'm looking for the proper F# idiom to make the limit based on memory size.

Update Oct '10

After learning F# a bit better, a tiny bit of Haskell, thinking & investigating this problem for over year, I finally can answer my own question. But as always with difficult problems, the problem starts with it being the wrong question. The problem isn't sequences of sequences - it's really because of a recursively defined sequence. My functional programming skills are a little better now and so it's easier to see what's going on with the version below, which still gets a stackoverflow

let next str = 
    Seq.append str [0]
    |> Seq.pairwise
    |> Seq.scan (fun (n,_) (c,v) ->
            if (c = v) then (n+1,Seq.empty)
            else (1,Seq.ofList [n;c]) ) (1,Seq.empty)
    |> Seq.collect snd

let morris = Seq.unfold(fun sq -> Some(sq,next sq))

That basicially creates a really long chain of Seq processing function calls to generate the sequnces. The Seq module that comes with F# is what can't follow the chain without using the stack. There's an optimization it uses for append and recursively defined sequences, but that optimization only works if the recursion is implementing an append.

So this will work

let rec ints n = seq { yield n; yield! ints (n+1) }
printf "%A" (ints 0 |> Seq.nth 100000);;

And this one will get a stackoverflow.

let rec ints n = seq { yield n; yield! (ints (n+1)|> Seq.map id) }
printf "%A" (ints 0 |> Seq.nth 100000);;

To prove the F# libary was the issue, I wrote my own Seq module that implemented append, pairwise, scan and collect using continutions and now I can begin generating and printing out the 50,000 seq without a problem (it'll never finish since it's over 10^5697 digits long).

Some additional notes:

  • Continuations were the idiom I was looking for, but in this case, they had to go into the F# library, not my code. I learned about continuations in F# from Tomas Petricek's Real-World Functional Programming book.
  • The lazy list answer that I accepted held the other idiom; lazy evaluation. In my rewritten library, I also had to leverage the lazy type to avoid stackoverflow.
  • The lazy list version sorta of works by luck (maybe by design but that's beyond my current ability to determine) - the active-pattern matching it uses while it's constructing and iterating causes the lists to calculate values before the required recursion gets too deep, so it's lazy, but not so lazy it needs continuations to avoid stackoverflow. For example, by the time the 2nd sequence needs a digit from the 1st sequence, it's already been calculated. In other words, the LL version is not strictly JIT lazy for sequence generation, only list management.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

蓦然回首 2024-07-27 23:53:12

您绝对应该查看

http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.PowerPack/Microsoft.FSharp.Collections.LazyList.html

但我会尝试稍后发布更全面的答案。

更新

好的,解决方案如下。 它将 Morris 序列表示为 int 的 LazyLists 的 LazyList,因为我认为您希望它在“两个方向”上都是惰性的。

F# LazyList(在 FSharp.PowerPack.dll 中)具有三个有用的属性:

  • 它是惰性的(在第一次需要时才会计算第 n 个元素)
  • 它不会重新计算(在同一元素上重新计算第 n 个元素)对象实例不会重新计算它 - 它会在第一次计算后缓存每个元素)
  • 您可以“忘记”前缀(当您“尾部”进入列表时,不再引用的前缀可用于垃圾收集)

第一个属性是通用的与 seq (IEnumerable),但其他两个是 LazyList 所独有的,对于计算问题(例如本问题中提出的问题)非常有用。

言归正传,代码:

// print a lazy list up to some max depth
let rec PrintList n ll =
    match n with
    | 0 -> printfn ""
    | _ -> match ll with
           | LazyList.Nil -> printfn ""
           | LazyList.Cons(x,xs) ->
               printf "%d" x
               PrintList (n-1) xs

// NextMorris : LazyList<int> -> LazyList<int>
let rec NextMorris (LazyList.Cons(cur,rest)) = 
    let count = ref 1
    let ll = ref rest
    while LazyList.nonempty !ll && (LazyList.hd !ll) = cur do
        ll := LazyList.tl !ll
        incr count
    LazyList.cons !count
        (LazyList.consf cur (fun() ->
            if LazyList.nonempty !ll then
                NextMorris !ll
            else
                LazyList.empty()))

// Morris : LazyList<int> -> LazyList<LazyList<int>>
let Morris s =
    let rec MakeMorris ll =
        LazyList.consf ll (fun () ->
            let next = NextMorris ll
            MakeMorris next
        )
    MakeMorris s

// "main"
// Print the nth iteration, up to a certain depth
[1] |> LazyList.of_list |> Morris |> Seq.nth 3125 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 3126 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 100000 |> PrintList 35
[1] |> LazyList.of_list |> Morris |> Seq.nth 100001 |> PrintList 35

UPDATE2

如果你只是想计数,那也没关系:

let LLLength ll =
    let rec Loop ll acc =
        match ll with
        | LazyList.Cons(_,rest) -> Loop rest (acc+1N)
        | _ -> acc
    Loop ll 0N

let Main() =
    // don't do line below, it leaks
    //let hundredth = [1] |> LazyList.of_list |> Morris |> Seq.nth 100
    // if we only want to count length, make sure we throw away the only
    // copy as we traverse it to count
    [1] |> LazyList.of_list |> Morris |> Seq.nth 100
        |> LLLength |> printfn "%A" 
Main()    

内存使用量保持平稳(我的盒子上低于 16M)...尚未完成运行,但我即使在我的慢速盒子上,也可以快速计算第 55 个长度,所以我认为这应该可以正常工作。 另请注意,我使用“bignum”作为长度,因为我认为这会溢出“int”。

You should definitely check out

http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.PowerPack/Microsoft.FSharp.Collections.LazyList.html

but I will try to post a more comprehensive answer later.

UPDATE

Ok, a solution is below. It represents the Morris sequence as a LazyList of LazyLists of int, since I presume you want it to be lazy in 'both directions'.

The F# LazyList (in the FSharp.PowerPack.dll) has three useful properties:

  • it is lazy (evaluation of the nth element will not happen until it is first demanded)
  • it does not recompute (re-evaluation of the nth element on the same object instance will not recompute it - it caches each element after it's first computed)
  • you can 'forget' prefixes (as you 'tail' into the list, the no-longer-referenced prefix is available for garbage collection)

The first property is common with seq (IEnumerable), but the other two are unique to LazyList and very useful for computational problems such as the one posed in this question.

Without further ado, the code:

// print a lazy list up to some max depth
let rec PrintList n ll =
    match n with
    | 0 -> printfn ""
    | _ -> match ll with
           | LazyList.Nil -> printfn ""
           | LazyList.Cons(x,xs) ->
               printf "%d" x
               PrintList (n-1) xs

// NextMorris : LazyList<int> -> LazyList<int>
let rec NextMorris (LazyList.Cons(cur,rest)) = 
    let count = ref 1
    let ll = ref rest
    while LazyList.nonempty !ll && (LazyList.hd !ll) = cur do
        ll := LazyList.tl !ll
        incr count
    LazyList.cons !count
        (LazyList.consf cur (fun() ->
            if LazyList.nonempty !ll then
                NextMorris !ll
            else
                LazyList.empty()))

// Morris : LazyList<int> -> LazyList<LazyList<int>>
let Morris s =
    let rec MakeMorris ll =
        LazyList.consf ll (fun () ->
            let next = NextMorris ll
            MakeMorris next
        )
    MakeMorris s

// "main"
// Print the nth iteration, up to a certain depth
[1] |> LazyList.of_list |> Morris |> Seq.nth 3125 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 3126 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 100000 |> PrintList 35
[1] |> LazyList.of_list |> Morris |> Seq.nth 100001 |> PrintList 35

UPDATE2

If you just want to count, that's fine too:

let LLLength ll =
    let rec Loop ll acc =
        match ll with
        | LazyList.Cons(_,rest) -> Loop rest (acc+1N)
        | _ -> acc
    Loop ll 0N

let Main() =
    // don't do line below, it leaks
    //let hundredth = [1] |> LazyList.of_list |> Morris |> Seq.nth 100
    // if we only want to count length, make sure we throw away the only
    // copy as we traverse it to count
    [1] |> LazyList.of_list |> Morris |> Seq.nth 100
        |> LLLength |> printfn "%A" 
Main()    

The memory usage stays flat (under 16M on my box)... hasn't finished running yet, but I computed the 55th length fast, even on my slow box, so I think this should work just fine. Note also that I used 'bignum's for the length, since I think this will overflow an 'int'.

长安忆 2024-07-27 23:53:12

我相信这里有两个主要问题:

  • 惰性非常低效,因此您可以预期惰性函数实现的运行速度会慢几个数量级。 例如,此处描述的 Haskell 实现比 F# 慢 2,400 倍我在下面给出。 如果您想要一种解决方法,最好的选择可能是通过将计算集中到急切的批次中来摊销计算,其中批次是按需生成的。

  • Seq.append 函数实际上是从 IEnumerable 调用 C# 代码,因此,它的尾部调用不会被消除,并且会泄漏更多的堆栈空间每次你经历它。 当您枚举序列时,就会出现这种情况。

在计算第 50 个子序列的长度时,以下代码比您的实现快 80 倍以上,但也许它对您来说还不够懒:

let next (xs: ResizeArray<_>) =
  let ys = ResizeArray()
  let add n x =
    if n > 0 then
      ys.Add n
      ys.Add x
  let mutable n = 0
  let mutable x = 0
  for i=0 to xs.Count-1 do
    let x' = xs.[i]
    if x=x' then
      n <- n + 1
    else
      add n x
      n <- 1
      x <- x'
  add n x
  ys

let morris =
  Seq.unfold (fun xs -> Some(xs, next xs)) (ResizeArray [1])

此函数的核心是对可因式分解的 ResizeArray 的折叠如果您使用结构体作为累加器,则可以在功能上使用,而不会降低太多性能。

I believe there are two main problems here:

  • Laziness is very inefficient so you can expect a lazy functional implementation to run orders of magnitude slower. For example, the Haskell implementation described here is 2,400× slower than the F# I give below. If you want a workaround, your best bet is probably to amortize the computations by bunching them together into eager batches where the batches are produced on-demand.

  • The Seq.append function is actually calling into C# code from IEnumerable and, consequently, its tail call doesn't get eliminated and you leak a bit more stack space every time you go through it. This shows up when you come to enumerate over the sequence.

The following is over 80× faster than your implementation at computing the length of the 50th subsequence but perhaps it is not lazy enough for you:

let next (xs: ResizeArray<_>) =
  let ys = ResizeArray()
  let add n x =
    if n > 0 then
      ys.Add n
      ys.Add x
  let mutable n = 0
  let mutable x = 0
  for i=0 to xs.Count-1 do
    let x' = xs.[i]
    if x=x' then
      n <- n + 1
    else
      add n x
      n <- 1
      x <- x'
  add n x
  ys

let morris =
  Seq.unfold (fun xs -> Some(xs, next xs)) (ResizeArray [1])

The core of this function is a fold over a ResizeArray that could be factored out and used functionally without too much performance degradation if you used a struct as the accumulator.

〃温暖了心ぐ 2024-07-27 23:53:12

只需保存您查找的前一个元素即可。

let morris2 data = seq {
    let cnt = ref 0
    let prev = ref (data |> Seq.nth 0)

     for cur in data do
        if cur <> !prev then
            yield! [!cnt; !prev]
            cnt := 1
            prev := cur
        else
            cnt := !cnt + 1

    yield! [!cnt; !prev]
}

let rec morrisSeq2 cur = seq {
    yield cur
    yield! morrisSeq2 (morris2 cur)
}

Just save the previous element that you looked for.

let morris2 data = seq {
    let cnt = ref 0
    let prev = ref (data |> Seq.nth 0)

     for cur in data do
        if cur <> !prev then
            yield! [!cnt; !prev]
            cnt := 1
            prev := cur
        else
            cnt := !cnt + 1

    yield! [!cnt; !prev]
}

let rec morrisSeq2 cur = seq {
    yield cur
    yield! morrisSeq2 (morris2 cur)
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文