Running Eigen in parallel with OpenMPI
I am new to Eigen and am writing some simple code to test its performance. I am using a MacBook Pro with an M1 Pro chip (I do not know whether the ARM architecture causes the problem). The code is a simple Laplace equation solver:
#include <iostream>
#include "mpi.h"
#include "Eigen/Dense"
#include <chrono>

using namespace Eigen;
using namespace std;

const size_t num = 1000UL;

MatrixXd initilize(){
    MatrixXd u = MatrixXd::Zero(num, num);
    u(seq(1, fix<num-2>), seq(1, fix<num-2>)).setConstant(10);
    return u;
}

void laplace(MatrixXd &u){
    setNbThreads(8);
    MatrixXd u_old = u;
    u(seq(1,last-1),seq(1,last-1)) =
        (( u_old(seq(0,last-2,fix<1>),seq(1,last-1,fix<1>)) + u_old(seq(2,last,fix<1>),seq(1,last-1,fix<1>)) +
           u_old(seq(1,last-1,fix<1>),seq(0,last-2,fix<1>)) + u_old(seq(1,last-1,fix<1>),seq(2,last,fix<1>)) )*4.0 +
           u_old(seq(0,last-2,fix<1>),seq(0,last-2,fix<1>)) + u_old(seq(0,last-2,fix<1>),seq(2,last,fix<1>)) +
           u_old(seq(2,last,fix<1>),seq(0,last-2,fix<1>)) + u_old(seq(2,last,fix<1>),seq(2,last,fix<1>)) ) /20.0;
}

int main(int argc, const char * argv[]) {
    initParallel();
    setNbThreads(0);
    cout << nbThreads() << endl;
    MatrixXd u = initilize();
    auto start = std::chrono::high_resolution_clock::now();
    for (auto i=0UL; i<100; i++) {
        laplace(u);
    }
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    // cout << u(seq(0, fix<10>), seq(0, fix<10>)) << endl;
    cout << "Execution time (ms): " << duration.count() << endl;
    return 0;
}
Compile with gcc and enable OpenMPI:
james@MBP14 tests % g++-11 -fopenmp -O3 -I/usr/local/include -I/opt/homebrew/Cellar/open-mpi/4.1.3/include -o test4 test.cpp
Run the binary directly:
james@MBP14 tests % ./test4
8
Execution time (ms): 273
Run with mpirun and specify 8 threads:
james@MBP14 tests % mpirun -np 8 test4
8
8
8
8
8
8
8
8
Execution time (ms): 348
Execution time (ms): 347
Execution time (ms): 353
Execution time (ms): 356
Execution time (ms): 350
Execution time (ms): 353
Execution time (ms): 357
Execution time (ms): 355
So obviously the matrix operation is not running in parallel; instead, every thread is running the same copy of the code.
What should be done to solve this problem? Do I have some misunderstanding about using OpenMPI?
Comments (1)
You are confusing OpenMPI with OpenMP.
-fopenmp enables OpenMP. It is one way to parallelize an application, by putting special #pragma omp statements in the code. The parallelization happens on a single CPU (or, to be precise, on a single compute node, in case that node has multiple CPUs). This allows you to employ all cores of that CPU. OpenMP cannot be used to parallelize an application across multiple compute nodes.
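As a rough sketch of what that looks like (this example is mine, not part of the original answer, and it uses a plain 5-point stencil rather than the weighted 9-point one from the question), here is a serial Jacobi sweep whose outer loop OpenMP splits across the cores of one CPU; the helper name jacobi_sweep_omp is hypothetical:

#include "Eigen/Dense"

// Hypothetical helper: one Jacobi sweep of a 5-point Laplace stencil, written
// with explicit loops so that OpenMP can distribute the outer loop.
void jacobi_sweep_omp(const Eigen::MatrixXd& u_old, Eigen::MatrixXd& u) {
    const Eigen::Index rows = u.rows();
    const Eigen::Index cols = u.cols();
    // With -fopenmp, the iterations of this outer loop run on different threads
    // of the same process.
    #pragma omp parallel for
    for (Eigen::Index i = 1; i < rows - 1; ++i) {
        for (Eigen::Index j = 1; j < cols - 1; ++j) {
            u(i, j) = 0.25 * (u_old(i - 1, j) + u_old(i + 1, j) +
                              u_old(i, j - 1) + u_old(i, j + 1));
        }
    }
}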
To use MPI, you need to call "special" MPI functions and do the hard work of distributing the data yourself. If you do not do that, launching an application with mpirun simply creates several identical processes (not threads!) that all perform exactly the same computation. You have not parallelized your application; you have just executed it 8 times.
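To make "distributing the data yourself" a bit more concrete, here is a minimal, hedged sketch (not a working solver): each process started by mpirun claims its own block of the 1000 grid rows, and a real implementation would additionally exchange boundary rows with neighbouring ranks every iteration. Only standard MPI calls (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize) are used; the row-splitting logic is purely illustrative.

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // index of this process
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // number of processes started by mpirun

    // Explicit data distribution: assign each rank a contiguous block of the
    // 1000 grid rows. Each rank would allocate and update only its own block.
    const int total_rows = 1000;           // grid size from the question
    const int rows_per_rank = total_rows / size;
    const int row_begin = rank * rows_per_rank;
    const int row_end = (rank == size - 1) ? total_rows : row_begin + rows_per_rank;
    std::printf("rank %d of %d owns rows [%d, %d)\n", rank, size, row_begin, row_end);

    // A real solver would exchange the boundary ("ghost") rows with the
    // neighbouring ranks each iteration, e.g. with MPI_Sendrecv, before doing
    // the stencil update. That communication is omitted in this sketch.

    MPI_Finalize();
    return 0;
}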
There are no compiler flags that enable MPI. MPI is not built into any compiler. Rather, MPI is a standard, and OpenMPI is one specific library that implements that standard. You should read a tutorial or book about MPI and OpenMPI (Google turned up this one, for example).
Note: Usually, MPI libraries such as OpenMPI ship with executables/scripts (e.g. mpicc) that behave like compilers. But they are just thin wrappers around compilers such as gcc. These wrappers are used to automatically tell the actual compiler which include directories to use and which libraries to link with. But again, the compilers themselves do not know anything about MPI.
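For example, assuming the Homebrew Open MPI installation from the question provides the usual mpicxx C++ wrapper, the compile step could look roughly like the following; passing --showme to an Open MPI wrapper prints the underlying compiler command (with its include and link flags) instead of running it:

mpicxx -O3 -o test4 test.cpp
mpicxx --showme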