How should I avoid unintentionally capturing the local scope in function literals?

Posted 2024-09-27 13:08:13

I'll ask this with a Scala example, but it may well be that this affects other languages which allow hybrid imperative and functional styles.

Here's a short example (UPDATED, see below):

def method: Iterator[Int] = {
    // construct some large intermediate value
    val huge = (1 to 1000000).toList
    val small = List.fill(5)(scala.util.Random.nextInt())
    // accidentally use huge in a literal
    small.iterator filterNot ( huge contains _ )
}

Now iterator.filterNot works lazily, which is great! As a result, we'd expect the returned iterator not to consume much memory (indeed, O(1)). Sadly, however, we've made a terrible mistake: since filterNot is lazy, the iterator keeps a reference to the function literal huge contains _, and that closure in turn keeps a reference to huge.

Thus, while we expected the method to need a large amount of memory only while it was running, with that memory freed as soon as the method returned, in fact the memory stays reachable until we drop the returned Iterator.
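To make the laziness concrete, here is a minimal sketch (not the original code). The object name LazyCaptureDemo, the calls counter, and the fixed contents of small are assumptions added for illustration. It shows that the predicate has not run at all when method returns, so the iterator has to keep the closure, and therefore huge, alive until it is consumed.

```scala
object LazyCaptureDemo {
  // Hypothetical counter: records how often the predicate actually runs.
  var calls = 0

  def method: Iterator[Int] = {
    val huge = (1 to 1000).toList          // stands in for the large intermediate value
    val small = List(1, 2000, 3, 4000, 5)  // fixed values instead of random ones
    // The closure mentions `huge`, so the returned iterator keeps `huge` reachable.
    small.iterator.filterNot { x => calls += 1; huge.contains(x) }
  }

  def main(args: Array[String]): Unit = {
    val it = method
    println(s"predicate calls after method returned: $calls")
    val result = it.toList // consuming the iterator finally runs the predicate
    println(s"result = $result, predicate calls after consuming: $calls")
  }
}
```

The predicate only runs inside it.toList, long after method has returned, which is why the garbage collector cannot reclaim huge any earlier.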

(I just made such a mistake, which took a long time to track down! You can catch such things looking at heap dumps ...)

What are best practices for avoiding this problem?

It seems that the only solution is to carefully check for function literals which survive the end of the scope, and which captured intermediate variables. This is a bit awkward if you're constructing a non-strict collection and planning on returning it. Can anyone think of some nice tricks, Scala-specific or otherwise, that avoid this problem and let me write nice code?
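One possible workaround, sketched here under the assumption that the collection being filtered is small enough to process eagerly, is to apply the predicate strictly while huge is still in scope and return an iterator over the result only. The name StrictFilterDemo and the fixed contents of small are illustrative, not part of the original code.

```scala
object StrictFilterDemo {
  def method: Iterator[Int] = {
    val huge = (1 to 1000000).toList
    val small = List(1, 2000000, 3, 4000000, 5) // fixed values instead of random ones
    // Strict filter: the closure over `huge` is applied here and then discarded.
    val kept = small.filterNot(x => huge.contains(x))
    // The returned iterator refers only to `kept`, not to `huge`.
    kept.iterator
  }

  def main(args: Array[String]): Unit = {
    println(method.toList)
  }
}
```

This gives up laziness for the filtering step, so it only fits cases where the filtered collection is cheap to materialize.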

UPDATE: the example I'd given previously was stupid, as huynhjl's answer below demonstrates. It had been:

def method: Iterator[Int] = {
    val huge = (1 to 1000000).toList // construct some large intermediate value
    val n = huge.last                // do some calculation based on it
    (1 to n).iterator map (_ + 1)    // return some small value 
}

In fact, now that I understand a bit better how these things work, I'm not so worried!

Comments (1)

呆° 2024-10-04 13:08:13

Are you sure you're not oversimplifying the test case? Here is what I ran:

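object Clos {
  def method: Iterator[Int] = {
    val huge = (1 to 2000000).toList
    val n = huge.last
    // the closure (_ + 1) does not mention huge
    (1 to n).iterator map (_ + 1)
  }

  // request a full collection so that -verbose:gc logs it
  def gc(): Unit = { println("GC!!"); Runtime.getRuntime.gc() }

  def main(args: Array[String]): Unit = {
    val list = List(method, method, method)
    list.foreach(m => println(m.next()))
    gc()
    // the iterators are still in use after the collection
    list.foreach(m => println(m.next()))
    list.foreach(m => println(m.next()))
  }
}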

If I understand you correctly, the JVM should be holding on to the huge lists, because main is still using the iterators after the gc() call.

This is how I run it:

JAVA_OPTS="-verbose:gc" scala -cp classes Clos

This is what it prints towards the end:

[Full GC 57077K->57077K(60916K), 0.3340941 secs]
[Full GC 60852K->60851K(65088K), 0.3653304 secs]
2
2
2
GC!!
[Full GC 62959K->247K(65088K), 0.0610994 secs]
3
3
3
4
4
4

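So it looks to me as if the huge objects were reclaimed: the full GC after "GC!!" drops the heap from 62959K to 247K, even though the iterators are still in use. The closure _ + 1 never mentions huge, so nothing in the returned iterator refers to it.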
