Running Eigen in parallel with OpenMPI

Posted on 2025-01-19 20:22:45


I am new to Eigen and am writing some simple code to test its performance. I am using a MacBook Pro with an M1 Pro chip (I do not know whether the ARM architecture causes the problem). The code is a simple Laplace equation solver:

#include <iostream>
#include "mpi.h"
#include "Eigen/Dense"
#include <chrono>
 
using namespace Eigen;
using namespace std;

const size_t num = 1000UL;

MatrixXd initilize(){
    MatrixXd u = MatrixXd::Zero(num, num);
    u(seq(1, fix<num-2>), seq(1, fix<num-2>)).setConstant(10);
    return u;
}

void laplace(MatrixXd &u){
    setNbThreads(8);
    MatrixXd u_old = u;

    u(seq(1,last-1),seq(1,last-1)) =
    ((  u_old(seq(0,last-2,fix<1>),seq(1,last-1,fix<1>)) + u_old(seq(2,last,fix<1>),seq(1,last-1,fix<1>)) +
        u_old(seq(1,last-1,fix<1>),seq(0,last-2,fix<1>)) + u_old(seq(1,last-1,fix<1>),seq(2,last,fix<1>)) )*4.0 +
        u_old(seq(0,last-2,fix<1>),seq(0,last-2,fix<1>)) + u_old(seq(0,last-2,fix<1>),seq(2,last,fix<1>)) +
        u_old(seq(2,last,fix<1>),seq(0,last-2,fix<1>))   + u_old(seq(2,last,fix<1>),seq(2,last,fix<1>)) ) /20.0;
}


int main(int argc, const char * argv[]) {
    initParallel();
    setNbThreads(0);
    cout << nbThreads() << endl;
    MatrixXd u = initilize();
    
    auto start  = std::chrono::high_resolution_clock::now();
    
    for (auto i=0UL; i<100; i++) {
        laplace(u);
    }
    
    auto stop  = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    
    // cout << u(seq(0, fix<10>), seq(0, fix<10>)) << endl;
    cout << "Execution time (ms): " << duration.count() << endl;
    return 0;
}

Compile with gcc and enable OpenMPI

james@MBP14 tests % g++-11 -fopenmp  -O3 -I/usr/local/include -I/opt/homebrew/Cellar/open-mpi/4.1.3/include -o test4 test.cpp

Direct run the binary file

james@MBP14 tests % ./test4
8
Execution time (ms): 273

Run with mpirun and specify 8 threads

james@MBP14 tests % mpirun -np 8 test4
8
8
8
8
8
8
8
8
Execution time (ms): 348
Execution time (ms): 347
Execution time (ms): 353
Execution time (ms): 356
Execution time (ms): 350
Execution time (ms): 353
Execution time (ms): 357
Execution time (ms): 355

So obviously the matrix operation is not running in parallel; instead, every thread is running the same copy of the code.

What should be done to solve this problem? Do I have some misunderstanding about using OpenMPI?

Comments (1)

少女情怀诗 2025-01-26 20:22:45


You are confusing OpenMPI with OpenMP.

  • The gcc flag -fopenmp enables OpenMP. It is one way to parallelize an application by using special #pragma omp statements in the code. The parallelization happens on a single CPU (or, to be precise, on a single compute node, in case the compute node has multiple CPUs). This allows the application to employ all cores of that CPU. OpenMP cannot be used to parallelize an application over multiple compute nodes.
  • On the other hand, MPI (where OpenMPI is one particular implementation) can be used to parallelize a code over multiple compute nodes (i.e., roughly speaking, over multiple computers that are connected). It can also be used to parallelize some code over multiple cores on a single computer. So MPI is more general, but also much more difficult to use.

To use MPI, you need to call "special" functions and do the hard work of distributing data yourself. If you do not do this, calling an application with mpirun simply creates several identical processes (not threads!) that perform exactly the same computation. You have not parallelized your application, you just executed it 8 times.

There are no compiler flags that enable MPI. MPI is not built into any compiler. Rather, MPI is a standard, and OpenMPI is one specific library that implements that standard. You should read a tutorial or book about MPI and OpenMPI.

Note: Usually, MPI libraries such as OpenMPI ship with executables/scripts (e.g. mpicc) that behave like compilers. But they are just thin wrappers around compilers such as gcc. These wrappers are used to automatically tell the actual compiler the include directories and libraries to link with. But again, the compilers themselves do not know anything about MPI.
