关于 Clojure 中堆和垃圾的初学者问题
我有一个关于 Clojure 的问题: 我试图通过 Project Euler 来学习该语言,但我不明白什么是幕后发生的事情:以下代码旨在使用返回 lim
之前的所有素数的列表。我认为堆空间中的复杂度应该是 O(n),因为我列出了 lim
之前的所有数字,然后将它们一一过滤掉,同时将第一个移动到新的列表。 (我知道我实际上每次都会重复创建新列表,但我不认为它们会占用更多内存?)无论如何,我正在运行这个,
(defn getAllPrimes [lim]
(defn getPrimes [primes numlist]
(if (not-empty numlist) ;base case;
(recur (cons (first numlist) primes) ;put the prime on to the prime list
(filter
(fn [x] (not (div? x (first numlist)))) ;remove the prime and all its multiples from the numlist
(rest numlist)))
primes)); return the primes
(getPrimes () (range 2 lim))) ;call the recursive function with and empty prime list to be filled up and a full numlist to be emptied
并且当我调用时,我一直用尽堆空间,
(apply + (getAllPrimes 2000000))
但我不运行空间不足
(apply + (filter even? (range 2000000)))
所以我认为我一定不明白列表是如何在调用 recur 时进行垃圾收集的,并且实际上正在使用 O(n*n) 堆或其他东西。
I have a question about Clojure:
I am trying to learn the language by going through Project Euler and I don't understand what is going on under the hood: The following code is designed to use return a list of all prime numbers up to lim
. I would think that it should be O(n) in heap-space because I make a list of all the numbers up to lim
, and then filter them away one by one while moving the first to a new list. (I know that I am actually making new lists each recur but I didn't think they would take more memory?) Anyway I am running this with
(defn getAllPrimes [lim]
(defn getPrimes [primes numlist]
(if (not-empty numlist) ;base case;
(recur (cons (first numlist) primes) ;put the prime on to the prime list
(filter
(fn [x] (not (div? x (first numlist)))) ;remove the prime and all its multiples from the numlist
(rest numlist)))
primes)); return the primes
(getPrimes () (range 2 lim))) ;call the recursive function with and empty prime list to be filled up and a full numlist to be emptied
And I keep running out of heap space, when I call
(apply + (getAllPrimes 2000000))
, but I don't run out of space on
(apply + (filter even? (range 2000000)))
So I think that I must not understand how lists are garbage collected in calls to recur and am actually using O(n*n) heap or something.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我认为问题在于,每次重复时,您都会创建一个引用最后一个的新惰性序列,因此经过几次迭代后,您将持有一个 seq,该 seq 持有一个 seq 的头部,该序列持有一个 seq 的头部持有 seq 的头部。 ...所有中间序列都填满了你的堆。
尽管编写素数筛是一项值得练习的练习,但如果您想得到答案,Clojure 确实在其标准库中包含了素数序列:clojure.contrib.lazy-seqs/primes。这一特定欧拉问题的标准解决方案是单行代码。
作为一个风格点,内部定义并不是一个好主意。实际效果与 defn 位于顶层相同,但如果我没记错的话,每次调用 getAllPrimes 时都会重新分配 var,并且强烈建议不要在运行时重新定义 var。由于代码只是定义一个 var,因此 getPrimes 仍然与 getAllPrimes 一样可见。在这种情况下,getPrimes 可以很容易地重写为循环/递归,没有内部函数、匿名或命名函数。这对解决惰性序列链问题没有帮助,但它确实使代码看起来更标准:
我也会避免使用驼峰命名法。此函数的 Clojure 标准名称为 get-all-primes。
不过,回到实际问题,要让代码正常工作,您可以做的最少的工作就是在每次迭代中强制执行每个 seq,即,将过滤器调用包装在 doall 中。我尝试了这个,虽然它仍然运行缓慢,但它至少在没有耗尽堆的情况下运行完成:
I believe the problem is that with each recur, you're creating a new lazy sequence referring to the last one, so after a few iterations you're holding a seq that holds the head of a seq that holds the head of a seq that holds the head of a seq. ... All the intermediate seqs are filling up your heap.
Though writing a prime sieve is a worthwhile exercise, if you want to get to the answer, Clojure does include the sequence of prime numbers in its standard library: clojure.contrib.lazy-seqs/primes. The standard solution to this paricular Euler problem is a one-liner.
As a style point, an inner defn is not a good idea. The practical effect is the same as if the defn were at the top level, but if I'm not mistaken, the var gets reassigned every time getAllPrimes is called, and redefining vars at runtime is very strongly discouraged. Since the code is just defining a var, getPrimes is still just as visible as getAllPrimes. In this case, getPrimes could easily be rewritten as a loop/recur with no inner function, anonymous or named. That doesn't help your chain-of-lazy-seqs problem, but it does make the code a little more standard-looking:
I would also avoid the use of camelCase. The Clojure standard name for this function would be get-all-primes.
Getting back to the practical problem, though, the least work you could do to get your code working would be to force each seq on each iteration, i.e., wrap your filter call in a doall. I tried this, and while it still runs slowly, it at least does run to completion without running out of heap: