当前位置：文江博客话题详情

并行化 for 循环

发布于 2024-11-01 21:59:30 字数 182 浏览 1 评论 0原文

我有一个 for 循环，其中迭代 i 时的计算不依赖于先前迭代中完成的计算。

我想并行化for循环（我的代码是java），以便多个迭代的计算可以在多个处理器上同时运行。我是否应该为每次迭代的计算创建一个线程，即要创建的线程数等于迭代次数（for循环中迭代次数很大）？如何做到这一点？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

数理化全能战士 2024-11-08 21:59:30

这是一个小示例，您可能会发现它对开始并行化很有帮助。它假设：

您创建一个 Input 对象，其中包含每次计算迭代的输入。
您创建一个 Output 对象，其中包含计算每次迭代的输入的输出。
您希望传入一个输入列表并立即返回一个输出列表。
您的输入是一个合理的工作量，因此开销并不太高。

如果您的计算非常简单，那么您可能会考虑批量处理它们。您可以通过在每个输入中输入 100 来做到这一点。它使用与系统中的处理器一样多的线程。如果您正在处理纯粹的 CPU 密集型任务，那么这可能就是您想要的数字。如果他们因等待其他东西（磁盘、网络、数据库等）而受阻，您会想要更高

public List<Output> processInputs(List<Input> inputs)
        throws InterruptedException, ExecutionException {

    int threads = Runtime.getRuntime().availableProcessors();
    ExecutorService service = Executors.newFixedThreadPool(threads);

    List<Future<Output>> futures = new ArrayList<Future<Output>>();
    for (final Input input : inputs) {
        Callable<Output> callable = new Callable<Output>() {
            public Output call() throws Exception {
                Output output = new Output();
                // process your input here and compute the output
                return output;
            }
        };
        futures.add(service.submit(callable));
    }

    service.shutdown();

    List<Output> outputs = new ArrayList<Output>();
    for (Future<Output> future : futures) {
        outputs.add(future.get());
    }
    return outputs;
}

Here's a small example that you might find helpful to get started with parallelization. It assumes that:

You create an Input object that contains the input for each iteration of your computation.
You create an Output object that contains the output from computing the input of each iteration.
You want to pass in a list of inputs and get back a list of outputs all at once.
Your input is a reasonable chunk of work to do, so overhead isn't too high.

If your computation is really simple then you'll probably want to consider processing them in batches. You could do that by putting say 100 in each input. It uses as many threads as there are processors in your system. If you're dealing with purely CPU intensive tasks then that's probably the number you want. You'd want to go higher if they're blocked waiting for something else (disk, network, database, etc.)

public List<Output> processInputs(List<Input> inputs)
        throws InterruptedException, ExecutionException {

    int threads = Runtime.getRuntime().availableProcessors();
    ExecutorService service = Executors.newFixedThreadPool(threads);

    List<Future<Output>> futures = new ArrayList<Future<Output>>();
    for (final Input input : inputs) {
        Callable<Output> callable = new Callable<Output>() {
            public Output call() throws Exception {
                Output output = new Output();
                // process your input here and compute the output
                return output;
            }
        };
        futures.add(service.submit(callable));
    }

    service.shutdown();

    List<Output> outputs = new ArrayList<Output>();
    for (Future<Output> future : futures) {
        outputs.add(future.get());
    }
    return outputs;
}

回复收藏 0 原文

鹤仙姿 2024-11-08 21:59:30

您不应该手动进行线程处理。相反：

合理地创建一个大小的线程池执行器服务（如果您的计算不进行 IO，则使用与核心数量一样多的线程）。
运行一个循环，将每个单独的计算提交给执行器服务并保留生成的 Future 对象。请注意，如果每次计算仅包含少量工作，这将产生大量开销，甚至可能比单线程程序慢。在这种情况下，请提交按照 mdma 建议执行计算数据包的作业。
运行第二个循环，收集所有 Future 的结果（它将隐式等待，直到所有计算完成）
关闭执行程序服务

回复收藏 0 原文

刘备忘录 2024-11-08 21:59:30

不，您不应该为每次迭代创建一个线程。最佳线程数与可用处理器的数量有关 - 线程太多，会浪费太多时间进行上下文切换，而不会增加性能。

如果您不完全热衷于 Java，您可能想尝试并行高性能 C 系统，例如 OpenMPI。 OpenMPI 适合解决这类问题。

回复收藏 0 原文

衣神在巴黎 2024-11-08 21:59:30

不要自己创建线程。我建议您使用 fork/join 框架 (jsr166y) 并创建迭代给定范围的项目的任务。它将为您处理线程管理，使用硬件支持的尽可能多的线程。

任务粒度是这里的主要问题。如果每次迭代的计算量相对较低（例如少于 100 个操作），那么将每次迭代作为单独的任务执行将引入大量任务调度开销。最好让每个任务接受要计算的参数列表，并将结果作为列表返回。这样，您可以让每个任务计算 1 个、10 个或数千个元素，将任务粒度保持在合理的水平，从而在保持工作可用和减少任务管理开销之间取得平衡。

jsr166z 中还有一个 ParallelArray 类，允许对数组进行重复计算。如果您正在计算的值是原始类型，那么这可能对您有用。

回复收藏 0 原文

~没有更多了~