Degree of parallelism in Scala parallel collections
Is there any equivalent in Scala parallel collections to LINQ's withDegreeOfParallelism, which sets the number of threads that will run a query? I want to run an operation in parallel which needs to have a set number of threads running.
2 Answers
With the newest trunk, using the JVM 1.6 or newer, use the:
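Presumably this refers to the pre-2.10 global knob on the default fork join pool shared by all parallel collections; a minimal sketch, assuming the 2.9-era API:

```scala
// Pre-2.10 approach (assumed): set the parallelism of the default fork join
// pool that all parallel collections share to the desired number of threads.
collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism(2)
```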
This may be subject to change in the future, though. A more unified approach to configuring all Scala task-parallel APIs is planned for the next releases.
Note, however, that while this will determine the number of processors the query utilizes, this may not be the actual number of threads involved in running a query. Since parallel collections support nested parallelism, the actual thread pool implementation may allocate more threads to run the query if it detects this is necessary.
EDIT:
Starting from Scala 2.10, the preferred way to set the parallelism level is through setting the tasksupport field to a new TaskSupport object, as in the following example:
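A minimal sketch of such an example, assuming Scala 2.10/2.11 (where the fork join pool lives under scala.concurrent.forkjoin; on 2.12+ use java.util.concurrent.ForkJoinPool instead):

```scala
import scala.collection.parallel.ForkJoinTaskSupport
import scala.concurrent.forkjoin.ForkJoinPool

val pc = (1 to 1000).toArray.par
// back this particular parallel collection with a 2-thread fork join pool
pc.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(2))
pc.map(_ * 2)
```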
While instantiating the ForkJoinTaskSupport object with a fork join pool, the parallelism level of the fork join pool must be set to the desired value (2 in the example).
Independently of the JVM version, with Scala 2.9+ (which introduced parallel collections), you can also use a combination of the grouped(Int) and par functions to execute parallel jobs on small chunks, like this:
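A minimal sketch of that approach; the input range and the doubling function are placeholders:

```scala
// Split the input into chunks of at most 2 elements; only each chunk is made
// parallel, so no more than 2 elements are processed concurrently at a time.
val result = (1 to 10)
  .grouped(2)                 // Iterator over chunks of length 2 or less
  .seq                        // keep the outer collection of chunks sequential
  .flatMap(_.par.map(_ * 2))  // run the function on each small parallel chunk
  .toList
```

(On Scala 2.13+, .par additionally requires the scala-parallel-collections module and import scala.collection.parallel.CollectionConverters._.)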
grouped(2) creates chunks of length 2 or less, seq makes sure the collection of chunks is not parallel (useless in this example), and the _ * 2 function is then executed on the small parallel chunks (created with par), thus ensuring that at most 2 threads execute in parallel. This might, however, be slightly less efficient than setting the worker pool parameter; I'm not sure about that.