为什么这个 scala 素数生成如此慢/内存密集?

发布于 2024-11-26 01:38:11 字数 729 浏览 1 评论 0 原文

我在查找第 10,001 个素数时内存不足。

object Euler0007 {
  def from(n: Int): Stream[Int] = n #:: from(n + 1)
  def sieve(s: Stream[Int]): Stream[Int] = s.head #:: sieve(s.filter(_ % s.head != 0))
  def primes = sieve(from(2))
  def main(args: Array[String]): Unit = {
    println(primes(10001))
  }
}

这是因为在每次“迭代”(在这种情况下这是正确的术语吗?)primes之后,我将要调用的函数堆栈增加一以获取下一个元素?

我在网上找到的一种不采用迭代解决方案(我想避免进入函数式编程/惯用的 scala)的解决方案是 this (问题 7):

lazy val ps: Stream[Int] = 2 #:: Stream.from(3).filter(i => ps.takeWhile(j => j * j <= i).forall(i % _ > 0))

据我所知,这不会导致这种类似递归的方式。这是一个好方法吗,或者您知道更好的方法吗?

I run out of memory while finding the 10,001th prime number.

object Euler0007 {
  def from(n: Int): Stream[Int] = n #:: from(n + 1)
  def sieve(s: Stream[Int]): Stream[Int] = s.head #:: sieve(s.filter(_ % s.head != 0))
  def primes = sieve(from(2))
  def main(args: Array[String]): Unit = {
    println(primes(10001))
  }
}

Is this because after each "iteration" (is this the correct term in this context?) of primes, I increase the stack of functions to be called to get the next element by one?

One solution that I've found on the web which doesn't resort to an iterative solution (which I'd like to avoid to get into functional programming/idiomatic scala) is this (Problem 7):

lazy val ps: Stream[Int] = 2 #:: Stream.from(3).filter(i => ps.takeWhile(j => j * j <= i).forall(i % _ > 0))

From what I can see, this does not lead to this recursion-like way. Is this a good way to do it, or do you know of a better way?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

听闻余生 2024-12-03 01:38:11

其缓慢的原因之一是它不是埃拉托斯特尼筛法。阅读 http://www.cs.hmc.edu/~oneill/ papers/Sieve-JFP.pdf 获取详细解释(示例是 Haskell 语言,但可以直接翻译成 Scala)。

我对欧拉问题#7的旧解决方案也不是“真正的”筛子,但它似乎对于小数字来说足够好:

object Sieve {

    val primes = 2 #:: sieve(3)

    def sieve(n: Int) : Stream[Int] =
          if (primes.takeWhile(p => p*p <= n).exists(n % _ == 0)) sieve(n + 2)
          else n #:: sieve(n + 2)

    def main(args: Array[String]) {
      println(primes(10000)) //note that indexes are zero-based
    }
}

我认为你的第一个版本的问题是你只有 def s 并且没有 val 收集结果并可以由生成函数查阅,因此您总是从头开始重新计算。

One reason why this is slow is that it isn't the sieve of Eratosthenes. Read http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf for a detailled explanation (the examples are in Haskell, but can be translated directly into Scala).

My old solution for Euler problem #7 wasn't the "true" sieve either, but it seems to work good enough for little numbers:

object Sieve {

    val primes = 2 #:: sieve(3)

    def sieve(n: Int) : Stream[Int] =
          if (primes.takeWhile(p => p*p <= n).exists(n % _ == 0)) sieve(n + 2)
          else n #:: sieve(n + 2)

    def main(args: Array[String]) {
      println(primes(10000)) //note that indexes are zero-based
    }
}

I think the problem with your first version is that you have only defs and no val which collects the results and can be consulted by the generating function, so you always recalculate from scratch.

无妨# 2024-12-03 01:38:11

是的,因为您“在每次“迭代”之后增加一个要调用的函数堆栈以获取下一个元素” - 即每次获得每个质数后在过滤器堆栈顶部添加一个新过滤器。 过滤器太多了

这意味着每个生成的素数都经过其前面的所有素数的测试 - 但只有那些低于其平方根的素数才是真正需要的。例如,要获取第 10001 个素数 104743,在运行时将创建 10000 个过滤器。但323104743的平方根)下面只有 66 个素数,因此实际上只需要 66 个滤波器。所有其他 9934 个都将毫无必要地在那里,占用内存,努力工作,绝对不会产生任何附加值。

这是“功能筛”的关键缺陷,它似乎起源于 1970 年代 David Turner 的代码,后来又进入SICP 书和其他地方。 不是它是一个试除筛(而不是埃拉托色尼筛)。这是一个太遥远的担忧。当优化实现时,试除法完全能够非常快速地产生第 10000 个素数。

该代码的关键缺陷是它没有将过滤器的创建推迟到正确的时刻,并且最终创建了太多过滤器。

现在谈论复杂性,“旧筛子”代码是< em>O(n2),产生 n 个素数。最优试除法为O(n1.5/log0.5(n)),埃拉托斯特尼筛法为O(n *log(n)*log(log(n)))。作为经验增长顺序,第一个通常被视为~ n^2,第二个为 ~ n^1.45,第三个 ~ n^1.2

您可以找到基于 Python 生成器的代码,以实现最佳试除法 在这个答案中(第二部分)。这是最初在这里讨论了处理 sieve 函数的 Haskell 等价物。


作为一个例子,旧筛子的 “可读伪代码” :) 是

primes = sieve [2..]  where
   sieve (x:xs) = x : sieve [ y | y <- xs, rem y x > 0 ]
                         -- list of 'y's, drawn from 'xs',
                         --         such that (y % x > 0)

并且对于最佳试除 (TD) 筛,在素数平方上同步,

primes = sieve [2..] primes   where
  sieve (x:xs) ps = x : (h ++ sieve [ y | y <- t, rem y p > 0 ] qs)
          where
            (p:qs) = ps     -- 'p' is head elt in 'ps', and 'qs' the rest
            (h,t)  = span (< p*p) xs    -- 'h' are elts below p^2 in 'xs'
                                        -- and 't' are the rest

并且对于 埃拉托斯特尼筛由 Richard Bird 设计,如另一个答案

primes = 2 : minus [3..] 
               (foldr (\p r-> p*p : union [p*p+p, p*p+2*p..] r) [] primes)
                      -- function of 'p' and 'r', that returns 
                      -- a list with p^2 as its head elt, ...

Short 中提到的 JFP 文章中所示而且快。 (减去 a b 是一个列表 a< /code> 中的 b 的所有 elt 逐渐从中删除; rel="nofollow noreferrer">union a b 是一个列表 a,其中 b 的所有 elt 逐渐添加到其中,且不重复;都处理有序的、非递减的列表)。 foldr 是列表的右折叠 。因为它是线性的,所以它在 ~ n^1.33 运行,要使其在 ~ n^1.2 运行 树状折叠函数可以使用foldi)。


第二个问题的答案也是。您的第二个代码用相同的“伪代码”重写,

ps = 2 : [i | i <- [3..], all ((> 0).rem i) (takeWhile ((<= i).(^2)) ps)]

与上面的最佳 TD 筛非常相似 - 两者都安排每个候选者通过低于其平方根的所有素数进行测试。虽然筛子通过延迟过滤器的运行时序列来安排它,但后一个定义为每个候选者重新重新获取所需的素数。根据编译器的不同,一种可能比另一种更快,但两者本质上是相同的。

第三个也是:埃拉托斯特尼筛法更好,

ps = 2 : 3 : minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- drop 1 ps])

unionAll = foldi union' []          -- one possible implementation
union' (x:xs) ys = x : union xs ys
   -- unconditionally produce first elt of the 1st arg 
   -- to avoid run-away access to infinite lists

从其他代码片段的相似性来看,它看起来也可以在Scala中实现。 (虽然我不懂Scala)。 unionAll 这里实现了树状折叠结构(点击查看图片和完整代码),但也可以使用滑动数组、工作段来实现沿着素数倍数流分段。

TL;DR:是的,是的,是的。

Yes, it is because you "increase the stack of functions to be called to get the next element, by one after each "iteration" " - i.e. add a new filter on top of stack of filters each time after getting each prime. That's way too many filters.

This means that each produced prime gets tested by all its preceding primes - but only those below its square root are really needed. For instance, to get the 10001-th prime, 104743, there will be 10000 filters created, at run-time. But there are just 66 primes below 323, the square root of 104743, so only 66 filters were really needed. All the 9934 others will be there needlessly, taking up memory, hard at work producing absolutely no added value.

This is the key deficiency of that "functional sieve", which seems to have originated in the 1970s code by David Turner, and later have found its way into the SICP book and other places. It is not that it's a trial division sieve (rather than the sieve of Eratosthenes). That's far too remote a concern for it. Trial division, when optimally implemented, is perfectly capable of producing the 10000th prime very fast.

The key deficiency of that code is that it does not postpone the creation of filters to the right moment, and ends up creating far too many of them.

Talking complexities now, the "old sieve" code is O(n2), in n primes produced. The optimal trial division is O(n1.5/log0.5(n)), and the sieve of Eratosthenes is O(n*log(n)*log(log(n))). As empirical orders of growth the first is seen typically as ~ n^2, the second as ~ n^1.45 and the third ~ n^1.2.

You can find Python generators-based code for optimal trial division implemented in this answer (2nd half of it). It was originally discussed here dealing with the Haskell equivalent of your sieve function.


Just as an illustration, a "readable pseudocode" :) for the old sieve is

primes = sieve [2..]  where
   sieve (x:xs) = x : sieve [ y | y <- xs, rem y x > 0 ]
                         -- list of 'y's, drawn from 'xs',
                         --         such that (y % x > 0)

and for optimal trial division (TD) sieve, synchronized on primes' squares,

primes = sieve [2..] primes   where
  sieve (x:xs) ps = x : (h ++ sieve [ y | y <- t, rem y p > 0 ] qs)
          where
            (p:qs) = ps     -- 'p' is head elt in 'ps', and 'qs' the rest
            (h,t)  = span (< p*p) xs    -- 'h' are elts below p^2 in 'xs'
                                        -- and 't' are the rest

and for a sieve of Eratosthenes, devised by Richard Bird, as seen in that JFP article mentioned in another answer here,

primes = 2 : minus [3..] 
               (foldr (\p r-> p*p : union [p*p+p, p*p+2*p..] r) [] primes)
                      -- function of 'p' and 'r', that returns 
                      -- a list with p^2 as its head elt, ...

Short and fast. (minus a b is a list a with all the elts of b progressively removed from it; union a b is a list a with all the elts of b progressively added to it without duplicates; both dealing with ordered, non-decreasing lists). foldr is the right fold of a list. Because it is linear this runs at ~ n^1.33, to make it run at ~ n^1.2 the tree-like folding function foldi can be used).


The answer to your second question is also a yes. Your second code, re-written in same "pseudocode",

ps = 2 : [i | i <- [3..], all ((> 0).rem i) (takeWhile ((<= i).(^2)) ps)]

is very similar to the optimal TD sieve above - both arrange for each candidate to be tested by all primes below its square root. While the sieve arranges that with a run-time sequence of postponed filters, the latter definition re-fetches the needed primes anew for each candidate. One might be faster than another depending on a compiler, but both are essentially the same.

And the third is also a yes: the sieve of Eratosthenes is better,

ps = 2 : 3 : minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- drop 1 ps])

unionAll = foldi union' []          -- one possible implementation
union' (x:xs) ys = x : union xs ys
   -- unconditionally produce first elt of the 1st arg 
   -- to avoid run-away access to infinite lists

It looks like it can be implemented in Scala too, judging by the similarity of other code snippets. (Though I don't know Scala). unionAll here implements tree-like folding structure (click for a picture and full code) but could also be implemented with a sliding array, working segment by segment along the streams of primes' multiples.

TL;DR: yes, yes, and yes.

知足的幸福 2024-12-03 01:38:11

FWIW,这是一个真正的埃拉托斯特尼筛:

def sieve(n: Int) = (2 to math.sqrt(n).toInt).foldLeft((2 to n).toSet) { (ps, x) => 
    if (ps(x)) ps -- (x * x to n by x) 
    else ps
}

这是一个无限的素数流,使用了埃拉托斯特尼筛的变体,保留了其基本属性:

case class Cross(next: Int, incr: Int)

def adjustCrosses(crosses: List[Cross], current: Int) = {
  crosses map {
    case cross @ Cross(`current`, incr) => cross copy (next = current + incr)
    case unchangedCross                 => unchangedCross
  }
}

def notPrime(crosses: List[Cross], current: Int) = crosses exists (_.next == current)

def sieve(s: Stream[Int], crosses: List[Cross]): Stream[Int] = {
    val current #:: rest = s

    if (notPrime(crosses, current)) sieve(rest, adjustCrosses(crosses, current))
    else current #:: sieve(rest, Cross(current * current, current) :: crosses)
}

def primes = sieve(Stream from 2, Nil)

但是,这有点难以使用,因为流的每个元素> 是使用 crosses 列表组成的,该列表具有与素数一样多的数字,并且似乎由于某种原因,这些列表被保存在内存中Stream 中的每个数字。

例如,在注释的提示下,primes take 6000 contains 56993 会抛出 GC 异常,而 primes drop 5000 take 1000 contains 56993 在我的测试中会相当快地返回结果。

FWIW, here's a real Sieve of Eratosthenes:

def sieve(n: Int) = (2 to math.sqrt(n).toInt).foldLeft((2 to n).toSet) { (ps, x) => 
    if (ps(x)) ps -- (x * x to n by x) 
    else ps
}

Here's an infinite stream of primes using a variation on the Sieve of Eratosthenes that preserves its fundamental properties:

case class Cross(next: Int, incr: Int)

def adjustCrosses(crosses: List[Cross], current: Int) = {
  crosses map {
    case cross @ Cross(`current`, incr) => cross copy (next = current + incr)
    case unchangedCross                 => unchangedCross
  }
}

def notPrime(crosses: List[Cross], current: Int) = crosses exists (_.next == current)

def sieve(s: Stream[Int], crosses: List[Cross]): Stream[Int] = {
    val current #:: rest = s

    if (notPrime(crosses, current)) sieve(rest, adjustCrosses(crosses, current))
    else current #:: sieve(rest, Cross(current * current, current) :: crosses)
}

def primes = sieve(Stream from 2, Nil)

This is somewhat difficult to use, however, since each element of the Stream is composed using the crosses list, which has as many numbers as there have been primes up to a number, and it seems that, for some reason, these lists are being kept in memory for each number in the Stream.

For example, prompted by a comment, primes take 6000 contains 56993 would throw a GC exception whereas primes drop 5000 take 1000 contains 56993 would return a result rather fast on my tests.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文