奇怪的并行收集行为

发布于 2024-11-30 16:23:15 字数 1536 浏览 0 评论 0原文

我正在尝试使用 scala 并行集合来实现一些 cpu 密集型任务，我想抽象算法的执行方式（顺序、并行甚至分布式），但代码不能像我那样工作会怀疑，我不知道我做错了什么。

我想抽象这个问题的方式被模拟如下：

// just measures time a block of code runs
def time(block: => Unit) : Long = {
  val start = System.currentTimeMillis
  block
  val stop = System.currentTimeMillis
  stop - start
}

// "lengthy" task
def work = {
  Thread.sleep(100)
  println("done")
  1
}

import scala.collection.GenSeq


abstract class ContextTransform {
  def apply[T](genSeq: GenSeq[T]): GenSeq[T]
}

object ParContextTransform extends ContextTransform {
  override def apply[T](genSeq: GenSeq[T]): GenSeq[T] = genSeq.par
}

// this works as expected
def callingParDirectly = {
  val range = (1 to 10).par

  // make sure we really got a ParSeq
  println(range) 
  for (i <- range) yield work
}

// this doesn't 
def callingParWithContextTransform(contextTransform: ContextTransform) = {
  val range = contextTransform(1 to 10)

  // make sure we really got a ParSeq
  println(range)
  for (i <- range) yield work
}

解释器的结果：

scala> time(callingParDirectly)
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res20: Long = 503

scala> time(callingParWithContextTransform(ParContextTransform))
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res21: Long = 1002

我的第一个赌注是集合没有正确分割，并且 println 的 “完成”确实表明...但是如果我不屈服，上面的代码就可以很好地工作任何东西（只需运行工作方法）。

我不明白为什么 callingParWithContextTransform 方法不起作用就像callingParDirectly；我缺少什么？

原文

I'm trying to use scala parallel collections to implement some cpu-intensive
task, I've wanted to abstract the way the algorithm can be executed
(sequentially, parallel or even distributed), but the code dosn't work as I
would suspect and I have no idea what am I doing wrong.

The way I wanted to abstract this problem is mocked below:

// just measures time a block of code runs
def time(block: => Unit) : Long = {
  val start = System.currentTimeMillis
  block
  val stop = System.currentTimeMillis
  stop - start
}

// "lengthy" task
def work = {
  Thread.sleep(100)
  println("done")
  1
}

import scala.collection.GenSeq


abstract class ContextTransform {
  def apply[T](genSeq: GenSeq[T]): GenSeq[T]
}

object ParContextTransform extends ContextTransform {
  override def apply[T](genSeq: GenSeq[T]): GenSeq[T] = genSeq.par
}

// this works as expected
def callingParDirectly = {
  val range = (1 to 10).par

  // make sure we really got a ParSeq
  println(range) 
  for (i <- range) yield work
}

// this doesn't 
def callingParWithContextTransform(contextTransform: ContextTransform) = {
  val range = contextTransform(1 to 10)

  // make sure we really got a ParSeq
  println(range)
  for (i <- range) yield work
}

The result from the interpreter:

scala> time(callingParDirectly)
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res20: Long = 503

scala> time(callingParWithContextTransform(ParContextTransform))
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res21: Long = 1002

My first bet was that the collection doesn't split properly and the println's of
"done" indeed suggest that... but the above code works well if I don't yield
anything (just run the work method).

I can't understand why the callingParWithContextTransform method doesn't work
like callingParDirectly; what am I missing?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

却一份温柔 2024-12-07 16:23:15

可能的罪魁祸首：SI-4843。

回复收藏 0 原文

苦笑流年记忆 2024-12-07 16:23:15

Daniel Sobral 是对的，这是一个已知的错误。我可以使用 Scala 2.9.1.RC3 重现您的结果，但它已修复在主干中。下面是一个演示速度减慢的简化版本：

  // just measures time a block of code runs
  def time(block: => Unit) : Long = {
      val start = System.currentTimeMillis
      block
      val stop = System.currentTimeMillis
      stop - start
  }

  // "lengthy" task
  def work = {
      Thread.sleep(100)
      1
  }

  def run() {
    import scala.collection.GenSeq

    print("Iterating over ParRange: ")
    println(time(for (i <- (1 to 10).par) yield work))

    print("Iterating over GenSeq: ")
    println(time(for (i <- (1 to 10).par: GenSeq[Int]) yield work))
  }

  run()

我在 2.9.1.RC3 上得到的输出是

Iterating over ParRange: 202
Iterating over GenSeq: 1002

2.10 的夜间版本，两个版本的运行时间约为 200 毫秒。

Daniel Sobral is right, this is a known bug. I can reproduce your results with Scala 2.9.1.RC3, but it's fixed in trunk. Here's a simplified version that demonstrates the slowdown:

  // just measures time a block of code runs
  def time(block: => Unit) : Long = {
      val start = System.currentTimeMillis
      block
      val stop = System.currentTimeMillis
      stop - start
  }

  // "lengthy" task
  def work = {
      Thread.sleep(100)
      1
  }

  def run() {
    import scala.collection.GenSeq

    print("Iterating over ParRange: ")
    println(time(for (i <- (1 to 10).par) yield work))

    print("Iterating over GenSeq: ")
    println(time(for (i <- (1 to 10).par: GenSeq[Int]) yield work))
  }

  run()

The output I get on 2.9.1.RC3 is

Iterating over ParRange: 202
Iterating over GenSeq: 1002

but on a nightly build of 2.10, both versions run in about 200ms.

回复收藏 0 原文

~没有更多了~

关于作者

北方。的韩爷

暂无简介

0 文章

0 评论

23 人气

关注发私信

友情链接

文江博客

奇怪的并行收集行为

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

游缘惊梦

小兔几

Glik

生生漫

Luxian

Champion-Ming

友情链接

奇怪的并行收集行为

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

游缘惊梦

小兔几

Glik

生生漫

Luxian

Champion-Ming

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。