Functional processing of Scala streams without OutOfMemory errors
Is it possible to apply functional programming to Scala streams such that the stream is processed sequentially, but the already processed part of the stream can be garbage collected?
For example, I define a Stream that contains the numbers from start to end:
def fromToStream(start: Int, end: Int): Stream[Int] = {
  if (end < start) Stream.empty
  else start #:: fromToStream(start + 1, end)
}
If I sum up the values in a functional style:
println(fromToStream(1,10000000).reduceLeft(_+_))
I get an OutOfMemoryError, perhaps because the stack frame of the call to reduceLeft holds a reference to the head of the stream. But if I do this in iterative style, it works:
var sum = 0
for (i <- fromToStream(1, 10000000)) {
  sum += i
}
Is there a way to do this in a functional style without getting an OutOfMemoryError?
UPDATE: This was a bug in Scala that has since been fixed, so this question is more or less out of date now.
Comments (4)
When I started learning about Stream I thought this was cool. Then I realized Iterator is what I want to use nearly all the time.
In case you do need Stream but want to make reduceLeft work:
If you try the line above, it will garbage collect just fine. I have found that using Stream is tricky, as it's easy to hold on to the head without realizing it. Sometimes the standard library will hold on to it for you, in very subtle ways.
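As a minimal sketch of the Iterator remark in this answer (not necessarily the line the answer originally showed), the sum can be computed over an Iterator, which keeps no reference to elements it has already produced. The names IteratorSum, fromToIterator and sum are hypothetical, and the accumulator is widened to Long on the assumption that overflowing Int is not wanted.

```scala
object IteratorSum {
  // Hypothetical counterpart of fromToStream: yields start..end inclusive,
  // but as an Iterator, so consumed elements are not retained.
  def fromToIterator(start: Int, end: Int): Iterator[Int] =
    Iterator.range(start, end + 1)

  // Functional-style sum; Long avoids Int overflow for large ranges.
  // Like reduceLeft on any empty collection, this throws if end < start.
  def sum(start: Int, end: Int): Long =
    fromToIterator(start, end).map(_.toLong).reduceLeft(_ + _)

  def main(args: Array[String]): Unit =
    println(sum(1, 10000000))
}
```

Calling IteratorSum.sum(1, 10000000) runs in constant memory because only the current element and the running total are live at any time.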
Yes, you can. The trick is to use tail-recursive methods, so that the local stack frame contains the only reference to the Stream instance. Since the method is tail-recursive, the local reference to the previous Stream head will be erased once it recursively calls itself, thus enabling the GC to collect the start of the Stream as you go.
Also, you must ensure that the thing you pass to the method last above has only one reference on the stack. If you store a Stream into a local variable or value, it will not be garbage collected when you call the last method, since its argument is not the only reference left to the Stream. The code below runs out of memory.
To summarize: use tail-recursive methods, and when you call them make sure their argument is the only remaining reference to the Stream.
EDIT:
Note that this also works and does not result in an out of memory error:
EDIT2:
In the case of reduceLeft, which is what you need, you would have to define a helper method with an accumulator argument for the result. The accumulator can be given its initial value through a default argument. A simplified example:
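The code blocks of this answer are missing here, so the following is only a sketch of the technique it describes, under stated assumptions: StreamFold and sumLeft are hypothetical names, last is written to match the description in the text, and the accumulator is a Long to avoid Int overflow. On Scala 2.13 and later Stream is deprecated in favour of LazyList, and whether the consumed prefix is actually collected depends on the Scala version and the JVM.

```scala
import scala.annotation.tailrec

object StreamFold {
  // Stream of start..end inclusive, as defined in the question.
  def fromToStream(start: Int, end: Int): Stream[Int] =
    if (end < start) Stream.empty
    else start #:: fromToStream(start + 1, end)

  // Tail-recursive: the parameter s is overwritten on each step, so no
  // frame keeps the original head. Throws on an empty stream.
  @tailrec
  def last(s: Stream[Int]): Int =
    if (s.tail.isEmpty) s.head else last(s.tail)

  // Left reduction with an accumulator supplied by a default argument.
  @tailrec
  def sumLeft(s: Stream[Int], acc: Long = 0L): Long =
    if (s.isEmpty) acc else sumLeft(s.tail, acc + s.head)
}
```

For this to help, the stream has to be passed directly, as in StreamFold.sumLeft(StreamFold.fromToStream(1, 10000000)). Binding it to a val first, as in val s = StreamFold.fromToStream(1, 10000000) followed by StreamFold.sumLeft(s), keeps a second reference to the head alive for the whole traversal.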
You may want to look at Scalaz's ephemeral streams.
As it turns out, this is a bug in the current implementation of reduceLeft. The problem is that reduceLeft calls foldLeft, and thus the stack frame of reduceLeft holds a reference to the head of the stream during the whole call. foldLeft uses tail recursion to avoid this problem. Compare:
These are semantically equivalent. In Scala version 2.8.0 the call to foldLeft works, but the call to reduceLeft throws an OutOfMemoryError. If reduceLeft did its own work, this problem would not occur.
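The comparison this answer refers to is missing here. A minimal sketch of what such a comparison could look like follows, with FoldVsReduce, size, viaFold and viaReduce as hypothetical names. The size is kept deliberately small so the snippet runs on any version. The memory difference described above would only show up with a much larger range on an affected release such as 2.8.0.

```scala
object FoldVsReduce {
  // Small on purpose; the answer's observation concerns far larger streams.
  val size = 1000

  // foldLeft with an explicit zero element.
  def viaFold: Int = Stream.range(0, size).foldLeft(0)(_ + _)

  // reduceLeft uses the first element as the initial value.
  def viaReduce: Int = Stream.range(0, size).reduceLeft(_ + _)

  def main(args: Array[String]): Unit =
    println(viaFold == viaReduce)
}
```

Both expressions compute the same sum, which is the sense in which the answer calls them semantically equivalent. The only difference it points to is which stack frames keep the head of the stream reachable.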