并行化 for 循环

发布于 2024-11-01 21:59:30 字数 182 浏览 1 评论 0原文

我有一个 for 循环,其中迭代 i 时的计算不依赖于先前迭代中完成的计算。

我想并行化for循环(我的代码是java),以便多个迭代的计算可以在多个处理器上同时运行。我是否应该为每次迭代的计算创建一个线程,即要创建的线程数等于迭代次数(for循环中迭代次数很大)?如何做到这一点?

I have a for loop where the computation at iteration i does not depend on the computations done in the previous iterations.

I want to parallelize the for loop(my code is in java) so that the computation of multiple iterations can be run concurrently on multiple processors. Should I create a thread for the computation of each iteration, i.e. number of threads to be created is equal to the number of iterations(number of iterations are large in the for loop)? How to do this?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

数理化全能战士 2024-11-08 21:59:30

这是一个小示例,您可能会发现它对开始并行化很有帮助。它假设:

  1. 您创建一个 Input 对象,其中包含每次计算迭代的输入。
  2. 您创建一个 Output 对象,其中包含计算每次迭代的输入的输出。
  3. 您希望传入一个输入列表并立即返回一个输出列表。
  4. 您的输入是一个合理的工作量,因此开销并不太高。

如果您的计算非常简单,那么您可能会考虑批量处理它们。您可以通过在每个输入中输入 100 来做到这一点。它使用与系统中的处理器一样多的线程。如果您正在处理纯粹的 CPU 密集型任务,那么这可能就是您想要的数字。如果他们因等待其他东西(磁盘、网络、数据库等)而受阻,您会想要更高

public List<Output> processInputs(List<Input> inputs)
        throws InterruptedException, ExecutionException {

    int threads = Runtime.getRuntime().availableProcessors();
    ExecutorService service = Executors.newFixedThreadPool(threads);

    List<Future<Output>> futures = new ArrayList<Future<Output>>();
    for (final Input input : inputs) {
        Callable<Output> callable = new Callable<Output>() {
            public Output call() throws Exception {
                Output output = new Output();
                // process your input here and compute the output
                return output;
            }
        };
        futures.add(service.submit(callable));
    }

    service.shutdown();

    List<Output> outputs = new ArrayList<Output>();
    for (Future<Output> future : futures) {
        outputs.add(future.get());
    }
    return outputs;
}

Here's a small example that you might find helpful to get started with parallelization. It assumes that:

  1. You create an Input object that contains the input for each iteration of your computation.
  2. You create an Output object that contains the output from computing the input of each iteration.
  3. You want to pass in a list of inputs and get back a list of outputs all at once.
  4. Your input is a reasonable chunk of work to do, so overhead isn't too high.

If your computation is really simple then you'll probably want to consider processing them in batches. You could do that by putting say 100 in each input. It uses as many threads as there are processors in your system. If you're dealing with purely CPU intensive tasks then that's probably the number you want. You'd want to go higher if they're blocked waiting for something else (disk, network, database, etc.)

public List<Output> processInputs(List<Input> inputs)
        throws InterruptedException, ExecutionException {

    int threads = Runtime.getRuntime().availableProcessors();
    ExecutorService service = Executors.newFixedThreadPool(threads);

    List<Future<Output>> futures = new ArrayList<Future<Output>>();
    for (final Input input : inputs) {
        Callable<Output> callable = new Callable<Output>() {
            public Output call() throws Exception {
                Output output = new Output();
                // process your input here and compute the output
                return output;
            }
        };
        futures.add(service.submit(callable));
    }

    service.shutdown();

    List<Output> outputs = new ArrayList<Output>();
    for (Future<Output> future : futures) {
        outputs.add(future.get());
    }
    return outputs;
}
鹤仙姿 2024-11-08 21:59:30

您不应该手动进行线程处理。相反:

  • 合理地创建一个 大小的线程池执行器服务(如果您的计算不进行 IO,则使用与核心数量一样多的线程)。
  • 运行一个循环,将每个单独的计算提交给执行器服务并保留生成的 Future 对象。请注意,如果每次计算仅包含少量工作,这将产生大量开销,甚至可能比单线程程序慢。在这种情况下,请提交按照 mdma 建议执行计算数据包的作业。
  • 运行第二个循环,收集所有 Future 的结果(它将隐式等待,直到所有计算完成)
  • 关闭执行程序服务

You should not do the thread handling manually. Instead:

  • create a reasonably-sized thread pool executor service (if your computations do no IO, use as many threads as you have cores).
  • Run a loop that submits each individual computation to the executor service and keeps the resulting Future objects. Note that if each computation consists of only a small amount of work, this will create a lot of overhead and possibly even be slower than a single-threaded program. In that case, submit jobs that do packets of computation as mdma suggests.
  • Run a second loop that collects the results from all the Futures (it will implicitly wait until all computations have finished)
  • shut down the executor service
刘备忘录 2024-11-08 21:59:30

不,您不应该为每次迭代创建一个线程。最佳线程数与可用处理器的数量有关 - 线程太多,会浪费太多时间进行上下文切换,而不会增加性能。

如果您不完全热衷于 Java,您可能想尝试并行高性能 C 系统,例如 OpenMPI。 OpenMPI 适合解决这类问题。

No, you should not create one thread for each iteration. The optimum number of threads is related to the number of processors available - too many threads, and you waste too much time context switching for no added performance.

If you're not totally attached to Java, you might want to try a parallel high-performance C system like OpenMPI. OpenMPI is suitable for this kind of problem.

衣神在巴黎 2024-11-08 21:59:30

不要自己创建线程。我建议您使用 fork/join 框架 (jsr166y) 并创建迭代给定范围的项目的任务。它将为您处理线程管理,使用硬件支持的尽可能多的线程。

任务粒度是这里的主要问题。如果每次迭代的计算量相对较低(例如少于 100 个操作),那么将每次迭代作为单独的任务执行将引入大量任务调度开销。最好让每个任务接受要计算的参数列表,并将结果作为列表返回。这样,您可以让每个任务计算 1 个、10 个或数千个元素,将任务粒度保持在合理的水平,从而在保持工作可用和减少任务管理开销之间取得平衡。

jsr166z 中还有一个 ParallelArray 类,允许对数组进行重复计算。如果您正在计算的值是原始类型,那么这可能对您有用。

Don't create the threads yourself. I recommend you use the fork/join framework (jsr166y) and create tasks that iterate over a given range of items. It will take care of the thread management for you, using as many threads as the hardware supports.

Task granularity is the main issue here. If each iteration is relatively low computation (say less than 100 operations) then having each iteration executed as a separate task will introduce a lot of overhead of task scheduling. It's better to have each task accept a List of arguments to compute, and return the result as a list. That way you can have each task compute 1, 10 or thousands of elements, to keep task granulary at a reasonable level that balances keeping work available, and reducing task management overhead.

There is also a ParallelArray class in jsr166z, that allows repeated computation over an array. That may work for you, if the values you are computing are primitive types.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文