奇怪的并行收集行为
我正在尝试使用 scala 并行集合来实现一些 cpu 密集型 任务,我想抽象算法的执行方式 (顺序、并行甚至分布式),但代码不能像我那样工作 会怀疑,我不知道我做错了什么。
我想抽象这个问题的方式被模拟如下:
// just measures time a block of code runs
def time(block: => Unit) : Long = {
val start = System.currentTimeMillis
block
val stop = System.currentTimeMillis
stop - start
}
// "lengthy" task
def work = {
Thread.sleep(100)
println("done")
1
}
import scala.collection.GenSeq
abstract class ContextTransform {
def apply[T](genSeq: GenSeq[T]): GenSeq[T]
}
object ParContextTransform extends ContextTransform {
override def apply[T](genSeq: GenSeq[T]): GenSeq[T] = genSeq.par
}
// this works as expected
def callingParDirectly = {
val range = (1 to 10).par
// make sure we really got a ParSeq
println(range)
for (i <- range) yield work
}
// this doesn't
def callingParWithContextTransform(contextTransform: ContextTransform) = {
val range = contextTransform(1 to 10)
// make sure we really got a ParSeq
println(range)
for (i <- range) yield work
}
解释器的结果:
scala> time(callingParDirectly)
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res20: Long = 503
scala> time(callingParWithContextTransform(ParContextTransform))
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res21: Long = 1002
我的第一个赌注是集合没有正确分割,并且 println 的 “完成”确实表明...但是如果我不屈服,上面的代码就可以很好地工作 任何东西(只需运行工作方法)。
我不明白为什么 callingParWithContextTransform
方法不起作用 就像callingParDirectly
;我缺少什么?
I'm trying to use scala parallel collections to implement some cpu-intensive
task, I've wanted to abstract the way the algorithm can be executed
(sequentially, parallel or even distributed), but the code dosn't work as I
would suspect and I have no idea what am I doing wrong.
The way I wanted to abstract this problem is mocked below:
// just measures time a block of code runs
def time(block: => Unit) : Long = {
val start = System.currentTimeMillis
block
val stop = System.currentTimeMillis
stop - start
}
// "lengthy" task
def work = {
Thread.sleep(100)
println("done")
1
}
import scala.collection.GenSeq
abstract class ContextTransform {
def apply[T](genSeq: GenSeq[T]): GenSeq[T]
}
object ParContextTransform extends ContextTransform {
override def apply[T](genSeq: GenSeq[T]): GenSeq[T] = genSeq.par
}
// this works as expected
def callingParDirectly = {
val range = (1 to 10).par
// make sure we really got a ParSeq
println(range)
for (i <- range) yield work
}
// this doesn't
def callingParWithContextTransform(contextTransform: ContextTransform) = {
val range = contextTransform(1 to 10)
// make sure we really got a ParSeq
println(range)
for (i <- range) yield work
}
The result from the interpreter:
scala> time(callingParDirectly)
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res20: Long = 503
scala> time(callingParWithContextTransform(ParContextTransform))
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res21: Long = 1002
My first bet was that the collection doesn't split properly and the println's of
"done" indeed suggest that... but the above code works well if I don't yield
anything (just run the work method).
I can't understand why the callingParWithContextTransform
method doesn't work
like callingParDirectly
; what am I missing?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
可能的罪魁祸首:SI-4843。
Possible culprit: SI-4843.
Daniel Sobral 是对的,这是一个已知的错误。我可以使用 Scala 2.9.1.RC3 重现您的结果,但它已修复在主干中。下面是一个演示速度减慢的简化版本:
我在 2.9.1.RC3 上得到的输出是
2.10 的夜间版本,两个版本的运行时间约为 200 毫秒。
Daniel Sobral is right, this is a known bug. I can reproduce your results with Scala 2.9.1.RC3, but it's fixed in trunk. Here's a simplified version that demonstrates the slowdown:
The output I get on 2.9.1.RC3 is
but on a nightly build of 2.10, both versions run in about 200ms.