Haskell 中的惰性和尾递归,为什么会崩溃?
我有一个相当简单的函数来计算一个大列表的元素的平均值,使用两个累加器来保存到目前为止的总和和到目前为止的计数:
mean = go 0 0
where
go s l [] = s / fromIntegral l
go s l (x:xs) = go (s+x) (l+1) xs
main = do
putStrLn (show (mean [0..10000000]))
现在,在严格的语言中,这将是尾递归,并且会有没问题。然而,由于 Haskell 很懒,我的谷歌搜索让我明白 (s+x) 和 (l+1) 将作为 thunk 传递到递归中。所以这整个事情崩溃了:
Stack space overflow: current size 8388608 bytes.
经过进一步的谷歌搜索,我找到了 seq
和 $!
。我似乎不明白,因为我在这种情况下使用它们的所有尝试都被证明是徒劳的,错误消息说明了有关无限类型的内容。
最后我找到了 -XBangPatterns
,它通过更改递归调用解决了所有问题:
go !s !l (x:xs) = go (s+x) (l+1) xs
但我对此不满意,因为 -XBangPatterns
目前是一个扩展。我想知道如何在不使用 -XBangPatterns
的情况下使评估严格。 (也许也能学到一些东西!)
只是为了让你理解我缺乏理解,这就是我尝试过的(唯一编译的尝试,即):
go s l (x:xs) = go (seq s (s+x)) (seq l (l+1)) xs
根据我的理解, seq 应该在这里强制对 s 和 l 参数进行评估,从而避免了thunk带来的问题。但我仍然遇到堆栈溢出。
I have this fairly simple function to compute the mean of elements of a big list, using two accumulators to hold the sum so far and the count so far:
mean = go 0 0
where
go s l [] = s / fromIntegral l
go s l (x:xs) = go (s+x) (l+1) xs
main = do
putStrLn (show (mean [0..10000000]))
Now, in a strict language, this would be tail-recursive, and there would be no problem. However, as Haskell is lazy, my googling has led me to understand that (s+x) and (l+1) will be passed down the recursion as thunks. So this whole thing crashes and burns:
Stack space overflow: current size 8388608 bytes.
After further googling, I found seq
and $!
. Which it seems I don't understand because all my attempts at using them in this context proved futile, with error messages saying something about infinite types.
Finally I found -XBangPatterns
, which solves it all by changing the recursive call:
go !s !l (x:xs) = go (s+x) (l+1) xs
But I'm not happy with this, as -XBangPatterns
is currently an extension. I would like to know how to make the evaluation strict without the use of -XBangPatterns
. (And maybe learn something too!)
Just so you understand my lack of understanding, here's what I tried (the only try that compiled, that is):
go s l (x:xs) = go (seq s (s+x)) (seq l (l+1)) xs
From what I could understand, seq should here force the evaluation of the s and l argument, thus avoiding the problem caused by thunks. But I still get a stack overflow.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
一旦调用该函数,
seq
函数就会强制计算第一个参数。当您将seq s (s+x)
作为参数传递时,seq
函数不会立即被调用,因为不需要计算该参数的值。您希望在递归调用之前评估对 seq 的调用,以便反过来可以强制评估其参数。通常这是通过链接完成的:
这是 seq s (seq l (go (s+x) (l+1) xs)) 的语法变体。这里对
seq
的调用是表达式中最外层的函数调用。由于 Haskell 的惰性,这导致它们首先被求值:使用尚未求值的参数s
和seq l (go (s+x) (l +1) xs)
,评估参数被推迟到有人实际尝试访问其值的时候。现在,
seq
可以强制在返回表达式的其余部分之前计算其第一个参数。那么评估的下一步将是第二个seq
。如果对seq
的调用被隐藏在某个参数中的某个位置,它们可能很长时间都不会被执行,从而达不到它们的目的。随着
seq
位置的改变,程序可以正常执行,而不会使用过多的内存。该问题的另一个解决方案是在编译程序时在 GHC 中启用优化(
-O
或-O2
)。优化器认识到可有可无的惰性,并生成不分配不必要内存的代码。The
seq
function forces evaluation of the first parameter once the function is called. When you passseq s (s+x)
as a parameter theseq
function is not called immediately, because there is no need to evaluate the value of that parameter. You want the call toseq
to be evaluated before the recursive call, so that that in turn can force its parameter to be evaluated.Usually this is done link this:
This is a syntactic variation of
seq s (seq l (go (s+x) (l+1) xs))
. Here the calls toseq
are the outermost function calls in the expression. Because of Haskell's laziness this causes them to be evaluated first:seq
is called with the still unevaluated parameterss
andseq l (go (s+x) (l+1) xs)
, evaluating the parameters is deferred to the point where somebody actually tries to access their values.Now
seq
can force its first parameter to be evaluated before returning the rest of the expression. Then the next step in the evaluation would be the secondseq
. If the calls toseq
are buried somewhere in some parameter they might not be executed for a long time, defeating their purpose.With the changed positions of the
seq
s the program executes fine, without using excessive amounts of memory.Another solution to the problem would be to simply enable optimizations in GHC when the program is compiled (
-O
or-O2
). The optimizer recognizes the dispensable laziness and produces code that doesn't allocate unnecessary memory.我对此写了大量文章:
首先,是的,如果你想要求对累加器进行严格评估,请使用
seq
并留在 Haskell 98 中:其次:如果您提供一些类型注释,并使用 -O2 进行编译,则严格性分析将会启动:
因为“Double”是严格原子类型 Double# 的包装器,并且具有优化和精确类型,所以 GHC 运行严格性分析并推断:严格版本就可以了。
如上面 RWH 章节所述。
I've written extensively on this:
Firstly, yes, if you want to require strict evaluation of the accumulators use
seq
and stay in Haskell 98:Secondly: strictness analysis will kick in if you give some type annotations, and compile with -O2:
Because 'Double' is a wrapper over the strict atomic type Double#, with optimizations on, and a precise type, GHC runs strictness analysis and infers that the strict version will be ok.
As described in the RWH chapter above.
您的理解是正确的,
seq s (s+x)
强制对s
求值。但它不会强制s+x
,因此您仍在构建 thunk。通过使用
$!
您可以强制计算加法(对于两个参数,两次)。这实现了与使用 bang 模式相同的效果:使用
$!
函数将翻译go $! (s+x)
相当于:因此,
y
首先被强制为弱头范式,这意味着应用最外层函数。在y
的情况下,最外层的函数是+
,因此y
在传递给go< 之前被完全评估为数字/代码>。
哦,您可能会收到无限类型错误消息,因为您没有在正确的位置添加括号。当我第一次写下你的程序时,我遇到了同样的错误:-)
因为
$!
运算符是右结合的,没有括号go $! (s+x)$! (l+1)
的含义与:go $! ((s+x) $!(l+1))
,这显然是错误的。You are right in your understanding that
seq s (s+x)
forces the evaluation ofs
. But it doesn't forces+x
, therefore you're still building up thunks.By using
$!
you can force the evaluation of the addition (two times, for both arguments). This achieves the same effect as using the bang patterns:The use of the
$!
function will translate thego $! (s+x)
to the equivalent of:Thus
y
is first forced into weak head normal form, which means that the outermost function is applied. In the case ofy
, the outermost function is+
, thusy
is fully evaluated to a number before being passed togo
.Oh, and you probably got the infinite type error message because you didn't have the parenthesis in the right place. I got the same error when I first wrote your program down :-)
Because the
$!
operator is right associative, without parenthesisgo $! (s+x) $! (l+1)
means the same as:go $! ((s+x) $! (l+1))
, which is obviously wrong.