当前位置：文江博客话题详情

导致“错误 C 堆栈溢出”的原因通常在哈斯克尔

发布于 2024-08-25 02:53:18 字数 43 浏览 5 评论 0原文

Hugs Haskell 实现中“错误 C 堆栈溢出”的常见原因是什么？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

羅雙樹 2024-09-01 02:53:19

如果您习惯于通常进行尾递归分解的函数式语言，则可能会出现这种情况。假设你有一个函数：

sum = go 0
    where
    go accum [] = accum
    go accum (x:xs) = go (accum+x) xs

顺便说一句，它与以下内容相同

sum = foldl (+) 0

如果我们评估该函数，我们可以看到问题：

sum [1,2,3,4]
go 0 [1,2,3,4]
go (0+1) [2,3,4]
go ((0+1)+2) [3,4]
go (((0+1)+2)+3) [4]
go ((((0+1)+2)+3)+4) []
(((0+1)+2)+3)+4

现在评估该表达式 Haskell 使用堆栈：

EXPR            | STACK
(((0+1)+2)+3)+4 | 
((0+1)+2)+3     | +4
(0+1)+2         | +3 +4
(0+1)           | +2 +3 +4
1               | +2 +3 +4
3               | +3 +4
6               | +4
10              |

并且 this 是可能发生溢出的地方。如果您计算 sum [1..10^6]，该堆栈将有一百万个条目深。

但请注意这里的病理。您递归列表只是为了构建一个巨大的 fscking 表达式（“thunk”），然后使用堆栈来评估它。我们宁愿在递归时通过严格尾部递归来评估它：

sum = go 0
    where
    go accum [] = accum
    go accum (x:xs) = accum `seq` go (accum+x) xs

这就是说在尝试评估递归调用之前评估accum，所以我们得到（这可能需要一些耐心来理解）：

sum [1,2,3,4]
go 0 [1,2,3,4]
let accum = 0 in accum `seq` go (accum+1) [2,3,4]
go (0+1) [2,3,4]
let accum = 0+1 in accum `seq` go (accum+2) [3,4]
go (1+2) [3,4]
let accum = 1+2 in accum `seq` go (accum+3) [4]
go (3+3) [4]
let accum = 3+3 in accum `seq` go (accum+4) []
go (6+4) []
6+4
10

所以当我们遍历列表中，我们正在计算总和，因此我们不必使用深堆栈来评估结果。这种修改后的尾递归模式封装在 Data.List.foldl' 中，因此：

sum = foldl' (+) 0

一个好的经验法则是永远不要使用foldl，因为它总是会构建巨大的重击。使用foldl' 代替。如果输出类型具有惰性结构（例如列表或函数），请使用foldr。但一般来说，没有避免堆栈溢出的灵丹妙药，您只需要了解程序的求值行为即可。有时这可能很难。

但是，如果我理解正确的话，堆栈溢出总是来自于尝试评估一个巨大的重击。因此，寻找可以创建这些内容的地方，并强制尽早进行评估。

This can come up if you are used to functional languages in which it is common to do tail-recursion factorization. Say you have a function:

sum = go 0
    where
    go accum [] = accum
    go accum (x:xs) = go (accum+x) xs

Which, incidentally, is the same as

sum = foldl (+) 0

If we evaluate the function we can see the problem:

sum [1,2,3,4]
go 0 [1,2,3,4]
go (0+1) [2,3,4]
go ((0+1)+2) [3,4]
go (((0+1)+2)+3) [4]
go ((((0+1)+2)+3)+4) []
(((0+1)+2)+3)+4

Now to evaluate that expression Haskell uses a stack:

EXPR            | STACK
(((0+1)+2)+3)+4 | 
((0+1)+2)+3     | +4
(0+1)+2         | +3 +4
(0+1)           | +2 +3 +4
1               | +2 +3 +4
3               | +3 +4
6               | +4
10              |

And this is where an overflow can occur. If you evaluated sum [1..10^6], that stack would be a million entries deep.

But note the pathology here. You recurse over a list only to build up a huge fscking expression ("thunk"), and then use a stack to evaluate it. We would rather evaluate it as we are recursing, by making the tail recursion strict:

sum = go 0
    where
    go accum [] = accum
    go accum (x:xs) = accum `seq` go (accum+x) xs

That says to evaluate accum before trying to evaluate the recursive call, so we get (this may take a some patience to understand):

sum [1,2,3,4]
go 0 [1,2,3,4]
let accum = 0 in accum `seq` go (accum+1) [2,3,4]
go (0+1) [2,3,4]
let accum = 0+1 in accum `seq` go (accum+2) [3,4]
go (1+2) [3,4]
let accum = 1+2 in accum `seq` go (accum+3) [4]
go (3+3) [4]
let accum = 3+3 in accum `seq` go (accum+4) []
go (6+4) []
6+4
10

So as we are traversing the list, we are computing the sum so we don't have to use a deep stack to evaluate the result. This modified tail recursion pattern is encapsulated in Data.List.foldl', so:

sum = foldl' (+) 0

A good rule of thumb is to never use foldl, because it always builds up giant thunks. Use foldl' instead. If the output type has lazy structure (eg. a list or a function), use foldr. But there is no silver bullet for avoiding a stack overflow in general, you just have to understand the evaluation behavior of your program. This can be hard sometimes.

But, if I understand correctly, a stack overflow always comes from trying to evaluate a gigantic thunk, though. So look for places where those could be created, and force evaluation to happen earlier.

回复收藏 0 原文