How do Scala 2.9 parallel collections work behind the scenes?

Posted 2024-11-15 04:35:09

Scala 2.9 introduced parallel collections. They are a really great tool for certain tasks. However, how do they work internally and am I able to influence the behavior/configuration?

What method do they use to figure out the optimal number of threads? If I am not satisfied with the result, are there any configuration parameters to adjust?

I'm not only interested in how many threads are actually created; I am also interested in how the actual work is distributed amongst them, how the results are collected, and how much magic is going on behind the scenes. Does Scala somehow test whether a collection is large enough to benefit from parallel processing?

Comments (1)

静若繁花 2024-11-22 04:35:09

Briefly, there are two orthogonal aspects to how your operations are parallelized:

  1. The extent to which your collection is split into chunks (i.e. the size of the chunks) for a parallelizable operation (such as map or filter)
  2. The number of threads to use for the underlying fork-join pool (on which the parallel tasks are executed)
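To make the two aspects concrete, here is a minimal sketch that uses only the JDK's java.util.concurrent.ForkJoinPool and RecursiveTask, not the parallel collections' internal classes. It recursively halves an array range until a chunk holds at most `threshold` elements (aspect 1) and runs the resulting tasks on a pool whose size is its parallelism level (aspect 2). The names ChunkedSum and leafThreshold, and the formula 1 + n / (8 * p), are assumptions for illustration; they resemble, but are not guaranteed to match, what a particular Scala version does internally.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

// Hypothetical example: sums data(from until until) by recursive halving.
final class ChunkedSum(data: Array[Long], from: Int, until: Int, threshold: Int)
    extends RecursiveTask[Long] {

  override def compute(): Long = {
    val size = until - from
    if (size <= threshold) {
      // Leaf: small enough to process sequentially on the current worker.
      var acc = 0L
      var i = from
      while (i < until) { acc += data(i); i += 1 }
      acc
    } else {
      // Fork the left half (an idle worker may steal it from this worker's
      // queue) and compute the right half directly on this thread.
      val mid = from + size / 2
      val left = new ChunkedSum(data, from, mid, threshold)
      left.fork()
      val right = new ChunkedSum(data, mid, until, threshold).compute()
      right + left.join()
    }
  }
}

object ChunkedSum {
  // Assumed heuristic: leaves of at most about 1/(8p) of the input, so every
  // worker has several chunks to hand off; with one worker, do not split.
  def leafThreshold(size: Int, parallelism: Int): Int =
    if (parallelism > 1) 1 + size / (8 * parallelism) else size

  def parSum(data: Array[Long], pool: ForkJoinPool): Long = {
    val t = leafThreshold(data.length, pool.getParallelism)
    pool.invoke(new ChunkedSum(data, 0, data.length, t))
  }

  def main(args: Array[String]): Unit = {
    val pool = new ForkJoinPool()
    val data = Array.tabulate(1000000)(_.toLong)
    println(parSum(data, pool)) // prints 499999500000
    pool.shutdown()
  }
}
```

Forking one half and computing the other on the current thread is the usual fork-join pattern: if no other worker steals the forked half, join() typically ends up running it on the joining thread, so small inputs pay little for the split.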

For #2, this is managed by the pool itself, which by default sets its level of parallelism to the number of processors available to the JVM at runtime (see java.lang.Runtime.getRuntime.availableProcessors).
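The default pool sizing can be checked directly with the JDK's ForkJoinPool. This is a sketch that deliberately avoids the parallel collections' own pool object, since how that object is constructed is an assumption that may differ between Scala versions. The no-argument constructor takes its parallelism level from availableProcessors, while an explicit argument overrides it. PoolSizing and its methods are hypothetical names. In Scala 2.10 and later, a parallel collection can be run on a pool of your choosing by assigning its tasksupport field, for example to a ForkJoinTaskSupport wrapping such a pool; check the documentation for your Scala version.

```scala
import java.util.concurrent.ForkJoinPool

object PoolSizing {
  // Number of processors the JVM reports as usable on this machine.
  def cores: Int = Runtime.getRuntime.availableProcessors

  // The no-argument constructor sizes the pool from availableProcessors.
  def defaultParallelism: Int = {
    val pool = new ForkJoinPool()
    try pool.getParallelism finally pool.shutdown()
  }

  // An explicit parallelism level overrides that default.
  def explicitParallelism(n: Int): Int = {
    val pool = new ForkJoinPool(n)
    try pool.getParallelism finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    println(s"available processors:     $cores")
    println(s"default pool parallelism: $defaultParallelism")
    println(s"explicit pool (n = 2):    ${explicitParallelism(2)}")
  }
}
```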

For #1, this is a separate problem, and the Scala parallel collections API handles it via work-stealing (adaptive scheduling). That is, when a worker finishes a particular piece of work, it attempts to steal work from other workers' queues. If none is available, that indicates all of the processors are busy, so a bigger chunk of work should be taken.
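The "take a bigger chunk" idea can be pictured with a deliberately simplified, hypothetical scheduler. It is not how the library is implemented (the library splits tasks and relies on the fork-join pool to steal them). Here, worker threads claim batches of indices from a shared atomic cursor, and each worker doubles its batch size every time it comes back for more, up to maxBatch. Each worker keeps a private partial sum, and the partials are merged once per worker at the end, which is also one simple way results can be collected. GrowingBatches, parSum and maxBatch are illustrative names.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}

// Hypothetical scheduler for illustration only, not the parallel collections
// implementation: workers claim growing batches of indices from a shared cursor.
object GrowingBatches {
  def parSum(data: Array[Long], workers: Int, maxBatch: Int): Long = {
    require(workers >= 1 && maxBatch >= 1)
    val cursor = new AtomicInteger(0) // next index nobody has claimed yet
    val total  = new AtomicLong(0L)   // merged result of all workers

    val threads = (0 until workers).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = {
          var batch = 1
          var local = 0L
          // getAndAdd reserves [start, start + batch) for this worker only.
          var start = cursor.getAndAdd(batch)
          while (start < data.length) {
            val end = math.min(start + batch, data.length)
            var i = start
            while (i < end) { local += data(i); i += 1 }
            // Came back for more work: take a bigger batch next time.
            batch = math.min(batch * 2, maxBatch)
            start = cursor.getAndAdd(batch)
          }
          // Each worker merges its partial result exactly once.
          total.addAndGet(local)
        }
      })
    }

    threads.foreach(_.start())
    threads.foreach(_.join())
    total.get()
  }

  def main(args: Array[String]): Unit = {
    val data = Array.tabulate(10000)(_.toLong)
    println(parSum(data, workers = 4, maxBatch = 256)) // prints 49995000
  }
}
```

Starting with tiny batches lets every worker get going quickly, and doubling means a worker that keeps finding work makes fewer and fewer trips to the shared cursor.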

Aleksandar Prokopec, who implemented the library, gave a talk at this year's ScalaDays, which will be online shortly. He also gave a great talk at ScalaDays 2010 where he describes in detail how the operations are split and re-joined (there are a number of issues that are not immediately obvious, and some lovely bits of cleverness in there too!).

A more comprehensive answer is available in the PDF describing the parallel collections API.
