Haskell 空间溢出

发布于 2024-12-01 05:06:08 字数 698 浏览 4 评论 0原文

我已经编译了这个程序并正在尝试运行它。

import Data.List
import Data.Ord
import qualified Data.MemoCombinators as Memo

collatzLength :: Int -> Int
collatzLength = Memo.arrayRange (1, 1000000) collatzLength'
  where
    collatzLength' 1 = 1
    collatzLength' n | odd n  = 1 + collatzLength (3 * n + 1)
                     | even n = 1 + collatzLength (n `quot` 2)

main = print $ maximumBy (comparing fst) $ [(collatzLength n, n) | n <- [1..1000000]]

我从 GHC 得到以下信息,

Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

我认为这是我听说过的“空间溢出”事件之一。 (我对 Haskell 还很陌生。)我该如何修复它?我是否必须重写 collat​​zLength 才能进行尾递归?

I've compiled this program and am trying to run it.

import Data.List
import Data.Ord
import qualified Data.MemoCombinators as Memo

collatzLength :: Int -> Int
collatzLength = Memo.arrayRange (1, 1000000) collatzLength'
  where
    collatzLength' 1 = 1
    collatzLength' n | odd n  = 1 + collatzLength (3 * n + 1)
                     | even n = 1 + collatzLength (n `quot` 2)

main = print $ maximumBy (comparing fst) $ [(collatzLength n, n) | n <- [1..1000000]]

I'm getting the following from GHC

Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

I assume this is one of the "space overflow" things I've been hearing about. (I'm pretty new to Haskell.) How do I fix it? Do I have to rewrite collatzLength to be tail recursive?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

じее 2024-12-08 05:06:08

作为相关代码的作者,我现在有点尴尬,因为它没有一个而是两个可能的堆栈溢出错误。

  1. 它使用Int。在 32 位系统上,这会溢出,因为 Collat​​z 序列可能比起始数字高很多。当函数在负值和正值之间来回跳转时,这种溢出可能会导致无限递归。

    对于 1 到 100 万之间的数字,最差的起始点是 704511,在回落到 1 之前最高可达 56,991,483,520。这远远超出了 32 位范围。

  2. 它使用maximumBy。该函数并不严格,因此在长列表上使用时会导致堆栈溢出。对于默认堆栈大小来说,一百万个元素足以发生这种情况。不过,由于 GHC 执行了严格性分析,它仍然可以在启用优化的情况下工作。

    解决方案是使用严格版本。由于标准库中没有可用的,因此我们可以自己使用严格的左折叠。

这是一个更新版本,即使没有优化,它也应该(希望)没有堆栈溢出。

import Data.List
import Data.Ord
import qualified Data.MemoCombinators as Memo

collatzLength :: Integer -> Integer
collatzLength = Memo.arrayRange (1,1000000) collatzLength'
  where
    collatzLength' 1 = 1
    collatzLength' n | odd n  = 1 + collatzLength (3 * n + 1)
                     | even n = 1 + collatzLength (n `quot` 2)

main = print $ foldl1' max $ [(collatzLength n, n) | n <- [1..1000000]]

As the author of the code in question, I am now slightly embarrassed because it has not one but two possible stack overflow bugs.

  1. It uses Int. On a 32-bit system this will overflow, as the Collatz sequence can go quite a bit higher than the starting number. This overflow can cause infinite recursion as the function jumps back and forth between negative and positive values.

    In the case of numbers between one and a million, the worst starting point is 704511, which goes as high as 56,991,483,520 before coming back down towards 1. This is well outside the 32-bit range.

  2. It uses maximumBy. This function is not strict, so it will cause a stack overflow when used on long lists. One million elements is more than enough for this to happen with the default stack size. It still works with optimizations enabled, though, due to the strictness analysis performed by GHC.

    The solution is to use a strict version. Since none is available in the standard libraries, we can use the strict left fold ourselves.

Here is an updated version which should (hopefully) be stack overflow-free, even without optimizations.

import Data.List
import Data.Ord
import qualified Data.MemoCombinators as Memo

collatzLength :: Integer -> Integer
collatzLength = Memo.arrayRange (1,1000000) collatzLength'
  where
    collatzLength' 1 = 1
    collatzLength' n | odd n  = 1 + collatzLength (3 * n + 1)
                     | even n = 1 + collatzLength (n `quot` 2)

main = print $ foldl1' max $ [(collatzLength n, n) | n <- [1..1000000]]
π浅易 2024-12-08 05:06:08

这是一个较短的程序,但也以同样的方式失败:

main = print (maximum [0..1000000])

是的。

$ ghc --make harmless.hs && ./harmless
[1 of 1] Compiling Main             ( harmless.hs, harmless.o )
Linking harmless ...
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

使用 -O2 就可以了。我对此有何看法?我不知道:(这些太空之谜是一个严重的问题。

编辑:

感谢哈马尔指出了罪魁祸首。

更改您的程序以使用

maximum' = foldl1' max

使其无需-O2Preludemaximum 的实现是惰性的,因此对于没有编译器魔尘的长列表来说不太有效。

Here's a shorter program that fails in the same way:

main = print (maximum [0..1000000])

Yep.

$ ghc --make harmless.hs && ./harmless
[1 of 1] Compiling Main             ( harmless.hs, harmless.o )
Linking harmless ...
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

With -O2 it works. What do I make of it? I don't know :( These space mysteries are a serious gotcha.

Edit:

Thx to hammar for pointing out the culprit.

Changing your program to use

maximum' = foldl1' max

Makes it work without -O2. The implementation of Prelude's maximum is lazy and so doesn't quite work for long lists without compiler magic dust.

黑色毁心梦 2024-12-08 05:06:08

我认为您很可能会遇到一些 Collat​​z 序列的整数溢出,然后最终陷入一个包含溢出但从未达到 1 的“人为”循环。这将产生无限递归。

请记住,某些 Collat​​z 序列在最终(?)以 1 结束之前会比起始数字大得多。

尝试看看使用 Integer 而不是 Int 是否可以解决您的问题代码>.

I think it's most likely that you're hitting integer overflow with some of the Collatz sequences, and then ending up in an "artificial" cycle that contains overflows but never hits 1. That would produce an infinite recursion.

Remember that some Collatz sequences get very much larger than their starting number before they finally (?) end up at 1.

Try to see if it fixes your problem to use Integer instead of Int.

哽咽笑 2024-12-08 05:06:08

每当您担心性能时,请使用优化器(通过 -O2 标志)。 GHC 的优化不仅对于运行时间而且对于堆栈使用都非常重要。我已经用 GHC 7.2 对此进行了测试,优化可以解决您的问题。

编辑:此外,如果您使用的是 32 位计算机,请务必使用 Int64Word64。否则,您将溢出 32 位 int 的大小并导致非终止(感谢 Henning,支持他的答案)。

Use the optimizer (via the -O2 flag) any time you are concerned about performance. GHC's optimizations are hugely important not just to run time but to stack use. I've tested this with GHC 7.2 and optimization takes care of your issue.

EDIT: In addtion, if you're on a 32 bit machine be sure to use Int64 or Word64. You'll overflow the size of a 32 bit int and cause non-termination otherwise (thanks to Henning for this, upvote his answer).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文