并行化 for 循环
我有一个 for 循环,其中迭代 i
时的计算不依赖于先前迭代中完成的计算。
我想并行化for
循环(我的代码是java),以便多个迭代的计算可以在多个处理器上同时运行。我是否应该为每次迭代的计算创建一个线程,即要创建的线程数等于迭代次数(for循环中迭代次数很大)?如何做到这一点?
I have a for loop where the computation at iteration i
does not depend on the computations done in the previous iterations.
I want to parallelize the for
loop(my code is in java) so that the computation of multiple iterations can be run concurrently on multiple processors. Should I create a thread for the computation of each iteration, i.e. number of threads to be created is equal to the number of iterations(number of iterations are large in the for loop)? How to do this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
这是一个小示例,您可能会发现它对开始并行化很有帮助。它假设:
Input
对象,其中包含每次计算迭代的输入。Output
对象,其中包含计算每次迭代的输入的输出。如果您的计算非常简单,那么您可能会考虑批量处理它们。您可以通过在每个输入中输入 100 来做到这一点。它使用与系统中的处理器一样多的线程。如果您正在处理纯粹的 CPU 密集型任务,那么这可能就是您想要的数字。如果他们因等待其他东西(磁盘、网络、数据库等)而受阻,您会想要更高
Here's a small example that you might find helpful to get started with parallelization. It assumes that:
Input
object that contains the input for each iteration of your computation.Output
object that contains the output from computing the input of each iteration.If your computation is really simple then you'll probably want to consider processing them in batches. You could do that by putting say 100 in each input. It uses as many threads as there are processors in your system. If you're dealing with purely CPU intensive tasks then that's probably the number you want. You'd want to go higher if they're blocked waiting for something else (disk, network, database, etc.)
您不应该手动进行线程处理。相反:
Future
对象。请注意,如果每次计算仅包含少量工作,这将产生大量开销,甚至可能比单线程程序慢。在这种情况下,请提交按照 mdma 建议执行计算数据包的作业。You should not do the thread handling manually. Instead:
Future
objects. Note that if each computation consists of only a small amount of work, this will create a lot of overhead and possibly even be slower than a single-threaded program. In that case, submit jobs that do packets of computation as mdma suggests.Future
s (it will implicitly wait until all computations have finished)不,您不应该为每次迭代创建一个线程。最佳线程数与可用处理器的数量有关 - 线程太多,会浪费太多时间进行上下文切换,而不会增加性能。
如果您不完全热衷于 Java,您可能想尝试并行高性能 C 系统,例如 OpenMPI。 OpenMPI 适合解决这类问题。
No, you should not create one thread for each iteration. The optimum number of threads is related to the number of processors available - too many threads, and you waste too much time context switching for no added performance.
If you're not totally attached to Java, you might want to try a parallel high-performance C system like OpenMPI. OpenMPI is suitable for this kind of problem.
不要自己创建线程。我建议您使用 fork/join 框架 (jsr166y) 并创建迭代给定范围的项目的任务。它将为您处理线程管理,使用硬件支持的尽可能多的线程。
任务粒度是这里的主要问题。如果每次迭代的计算量相对较低(例如少于 100 个操作),那么将每次迭代作为单独的任务执行将引入大量任务调度开销。最好让每个任务接受要计算的参数列表,并将结果作为列表返回。这样,您可以让每个任务计算 1 个、10 个或数千个元素,将任务粒度保持在合理的水平,从而在保持工作可用和减少任务管理开销之间取得平衡。
jsr166z 中还有一个 ParallelArray 类,允许对数组进行重复计算。如果您正在计算的值是原始类型,那么这可能对您有用。
Don't create the threads yourself. I recommend you use the fork/join framework (jsr166y) and create tasks that iterate over a given range of items. It will take care of the thread management for you, using as many threads as the hardware supports.
Task granularity is the main issue here. If each iteration is relatively low computation (say less than 100 operations) then having each iteration executed as a separate task will introduce a lot of overhead of task scheduling. It's better to have each task accept a List of arguments to compute, and return the result as a list. That way you can have each task compute 1, 10 or thousands of elements, to keep task granulary at a reasonable level that balances keeping work available, and reducing task management overhead.
There is also a ParallelArray class in jsr166z, that allows repeated computation over an array. That may work for you, if the values you are computing are primitive types.