Why aren't my Scala futures more efficient?

Published 2024-09-17 09:44:02

I'm running this scala code on a 32-bit quad-core Core2 system:

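// Sums i, i+s, i+2s, ... up to 500000000, converting each element to Long before folding.
def job(i:Int,s:Int):Long = {
  val r=(i to 500000000 by s).map(_.toLong).foldLeft(0L)(_+_)
  println("Job "+i+" done")
  r
}

import scala.actors.Future
import scala.actors.Futures._

val JOBS=4

// Start JOBS futures; job i sums every JOBS-th number starting from i.
val jobs=(0 until JOBS).toList.map(i=>future {job(i,JOBS)})
println("Running...")
// f() blocks until that future's result is available.
val results=jobs.map(f=>f())
println(results.foldLeft(0L)(_+_))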

(Yes, I do know there are much more efficient ways to sum a series of integers; it's just to give the CPU something to do).
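One of those more efficient ways is the arithmetic-series formula. Each job sums `i, i+s, i+2s, ...` up to `n`, so it has `k = (n - i) / s + 1` terms and a sum of `k*i + s*k*(k-1)/2`.

The sketch below is written for current Scala and assumes `0 <= i <= n` and `s > 0`. `closedForm` and `loopSum` are illustrative names, not part of the question's code. It also confirms that the jobs together cover `0..n` exactly once.

```scala
// Closed form of the question's job(i, s): the sum of i, i+s, i+2s, ... up to and including n.
// Assumes 0 <= i <= n and s > 0; `closedForm` and `loopSum` are illustrative names.
def closedForm(i: Long, s: Long, n: Long): Long = {
  val k = (n - i) / s + 1       // number of terms in the progression
  k * i + s * (k * (k - 1) / 2) // k copies of i, plus s * (0 + 1 + ... + (k - 1))
}

// Brute-force reference: a plain while loop over primitive Longs.
def loopSum(i: Long, s: Long, n: Long): Long = {
  var t = 0L
  var v = i
  while (v <= n) { t += v; v += s }
  t
}

// The four jobs together cover 0..n exactly once, so their total is n * (n + 1) / 2.
val n = 500000000L
val jobs = 4
val total = (0 until jobs).map(i => closedForm(i, jobs, n)).sum
println(total) // 125000000250000000
```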

Depending on what I set JOBS to, the code runs in the following times:

JOBS=1 : 31.99user 0.84system 0:28.87elapsed 113%CPU
JOBS=2 : 27.71user 1.12system 0:14.74elapsed 195%CPU
JOBS=3 : 33.19user 0.39system 0:13.02elapsed 257%CPU
JOBS=4 : 49.08user 8.46system 0:22.71elapsed 253%CPU

I'm surprised that this doesn't really scale well beyond 2 futures "in play". I do a lot of multithreaded C++ code and have no doubt I'd get good scaling up to 4 cores and see >390% CPU utilisation if I coded this sort of thing with Intel's TBB or boost::threads (it'd be considerably more verbose of course).

So: what's going on, and how can I get the scaling to 4 cores I'd expect to see? Is this limited by something in Scala or the JVM? It occurs to me I don't actually know "where" Scala's futures run... is a thread spawned per future, or does "Futures" provide a thread pool dedicated to running them?
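One way to take the scheduler question out of the picture is to create the threads yourself. The sketch below uses a plain `java.util.concurrent` fixed thread pool with one thread per job, not `scala.actors`, so it is an alternative rather than a description of how `Futures` schedules work.

The sketch makes three assumptions:

- JOBS is taken from `availableProcessors`.
- The bound (`limit`) is smaller than the question's, so it finishes quickly.
- `job` is a hypothetical variant of the update's while-loop version that takes the bound as a parameter.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical helper: the update's while-loop sum with the bound passed in as `limit`.
def job(i: Long, s: Long, limit: Long): Long = {
  var t = 0L
  var v = i
  while (v <= limit) { t += v; v += s }
  t
}

// Assumption: one job per available core, and a smaller bound so the sketch runs quickly.
val JOBS = Runtime.getRuntime.availableProcessors
val limit = 100000000L

// A fixed pool with exactly JOBS worker threads, so every job gets its own thread.
val pool = Executors.newFixedThreadPool(JOBS)
val total =
  try {
    // Submit every job first (Range.map is strict), then block on each result in turn.
    val pending = (0 until JOBS).map { i =>
      pool.submit(new Callable[Long] { def call(): Long = job(i, JOBS, limit) })
    }
    pending.map(_.get).sum
  } finally pool.shutdown()

println(total)
```

Because the pool size is fixed explicitly, any remaining lack of scaling can't be blamed on how many threads the scheduler decided to use.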

[I'm using the scala 2.7.7 packages from Debian/Squeeze on a Lenny system with sun-java6 (6-20-0lennny1).]

Update:

As suggested in Rex's answer, I recoded to avoid object creation.

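// Same sum with a while loop over primitive Longs: no Range, no mapped collection, no boxing.
// The loop bound is raised to 10000000000L so each run takes a measurable time.
def job(i:Long,s:Long):Long = {
  var t=0L
  var v=i
  while (v<=10000000000L) {
    t+=v
    v+=s
  }
  println("Job "+i+" done")
  t
}
// Rest as above...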
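One side effect of raising the bound to `10000000000L` is worth checking. The true total of `0 + 1 + ... + 10000000000` is about 5×10^19, which exceeds `Long.MaxValue` (about 9.22×10^18). The `Long` accumulators therefore wrap around, and the printed total is the exact value reduced modulo 2^64. This does not affect the timings.

The sketch below computes the exact value with `BigInt`. Only the bound is copied from the code above; the other names are illustrative.

```scala
// The update's loop bound, copied from the code above.
val limit = 10000000000L

// Exact total 0 + 1 + ... + limit, which is what the jobs add up between them.
val exact = BigInt(limit) * BigInt(limit + 1) / BigInt(2)
println(exact)                          // 50000000005000000000
println(exact > BigInt(Long.MaxValue))  // true

// A Long accumulator keeps only the low 64 bits, so the printed total is this wrapped value.
println(exact.toLong)
```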

This was so much faster I had to significantly increase the iteration count to run for any amount of time! Results are:

JOBS=1: 28.39user 0.06system 0:29.25elapsed 97%CPU
JOBS=2: 28.46user 0.04system 0:14.95elapsed 190%CPU
JOBS=3: 24.66user 0.06system 0:10.26elapsed 240%CPU
JOBS=4: 28.32user 0.12system 0:07.85elapsed 362%CPU

which is much more like what I'd hope to see (although the 3 jobs case is a little odd, with one task consistently completing a couple of seconds before the other two).
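Turning the two sets of elapsed times into speedup (T1 / TN) and parallel efficiency (speedup / N) makes the difference concrete:

- The original version reaches about 1.96x, 2.22x and 1.27x for JOBS=2, 3 and 4.
- The while-loop version reaches about 1.96x, 2.85x and 3.73x.

The numbers below are copied from the two timing runs; the function names are illustrative.

```scala
// Elapsed wall-clock seconds, copied from the two timing runs in the question.
val original = Vector(28.87, 14.74, 13.02, 22.71)  // range/map/fold version, JOBS = 1..4
val whileLoop = Vector(29.25, 14.95, 10.26, 7.85)  // while-loop version, JOBS = 1..4

// Speedup of JOBS = n relative to JOBS = 1, and parallel efficiency (speedup / n).
def speedups(times: Vector[Double]): Vector[Double] = times.map(times.head / _)
def efficiencies(times: Vector[Double]): Vector[Double] =
  speedups(times).zipWithIndex.map { case (sp, idx) => sp / (idx + 1) }

for ((name, times) <- Seq("range/map/fold" -> original, "while loop" -> whileLoop)) {
  val cells = speedups(times).zip(efficiencies(times)).map { case (sp, e) => f"$sp%.2fx ($e%.2f)" }
  println(s"$name: ${cells.mkString(", ")}")
}
```

The while-loop version keeps efficiency above 0.9 up to 4 jobs, while the original drops to about 0.32 at JOBS=4.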

Pushing it a bit further, on a quad-core hyperthreaded i7 the latter version with JOBS=8 achieves an x4.4 speedup vs JOBS=1, with 571% CPU usage.

离去的眼神 2024-09-24 09:44:10

Try

(i to 500000000 by s).view.map(_.toLong).foldLeft(0L)(_+_)

Applying view is supposed (as I understand it) to avoid repeated iteration and object creation by providing simple wrappers.

Note also that you can use reduceLeft(_+_) instead of fold.
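Here is a sketch of both suggestions, written for current Scala (2.13/3) collections, which differ from those in 2.7. The strict `map` builds an intermediate collection before folding. `view.map` wraps the range and converts elements as the fold pulls them, though each element may still be boxed on the way through the generic fold.

`reduceLeft` needs no start value, but it throws on an empty collection, whereas `foldLeft(0L)` returns `0L`. The function names are illustrative.

```scala
// Strict: map materialises an intermediate collection of Longs, then folds it.
def strictSum(i: Int, n: Int, s: Int): Long =
  (i to n by s).map(_.toLong).foldLeft(0L)(_ + _)

// Lazy: view.map wraps the range, so no intermediate collection is built.
def viewSum(i: Int, n: Int, s: Int): Long =
  (i to n by s).view.map(_.toLong).foldLeft(0L)(_ + _)

// reduceLeft starts from the first element; it throws UnsupportedOperationException if the range is empty.
def reduceSum(i: Int, n: Int, s: Int): Long =
  (i to n by s).view.map(_.toLong).reduceLeft(_ + _)

println(viewSum(0, 1000, 1)) // 500500
```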

鸠书 2024-09-24 09:44:09

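My guess is that the garbage collector is doing more work than the addition itself, so you're limited by what the garbage collector can manage. Try running the test again with something that doesn't create any objects (e.g. a while loop instead of range/map/fold). If your real application puts this much load on the GC, you can also use the parallel GC options.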
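One way to test the GC hypothesis directly is to count collections around each variant using the JVM's `GarbageCollectorMXBean`s. The sketch below assumes current Scala (2.13/3, for `scala.jdk.CollectionConverters`) and a deliberately small `n` so the strict version does not exhaust a default heap. Whether that version triggers any collections at this size depends on the heap settings, so the counts are indicative only.

On HotSpot, `-verbose:gc` logs each collection, and the parallel collector the answer mentions can be selected with `-XX:+UseParallelGC`. `gcCount` and `collectionsDuring` are illustrative names.

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

// Total collections so far across all collectors (a bean may report -1 when unavailable).
def gcCount(): Long =
  ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionCount.max(0L)).sum

// Runs `body` and returns its result plus the number of collections that happened meanwhile.
def collectionsDuring(body: => Long): (Long, Long) = {
  val before = gcCount()
  val result = body
  (result, gcCount() - before)
}

// Assumption: a small n so the strict version fits comfortably in a default heap.
val n = 2000000
val (r1, gcStrict) = collectionsDuring((0 to n).map(_.toLong).foldLeft(0L)(_ + _))
val (r2, gcLoop) = collectionsDuring { var t = 0L; var v = 0L; while (v <= n) { t += v; v += 1 }; t }
println(s"range/map/fold: $gcStrict collections, while loop: $gcLoop collections")
```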
