Running Eigen in parallel with OpenMPI
I am new to Eigen and am writing some simple code to test its performance. I am using a MacBook Pro with an M1 Pro chip (I do not know whether the ARM architecture causes the problem). The code is a simple Laplace equation solver:
#include <iostream>
#include "mpi.h"
#include "Eigen/Dense"
#include <chrono>

using namespace Eigen;
using namespace std;

const size_t num = 1000UL;

MatrixXd initilize(){
    MatrixXd u = MatrixXd::Zero(num, num);
    u(seq(1, fix<num-2>), seq(1, fix<num-2>)).setConstant(10);
    return u;
}

void laplace(MatrixXd &u){
    setNbThreads(8);
    MatrixXd u_old = u;
    u(seq(1,last-1),seq(1,last-1)) =
        (( u_old(seq(0,last-2,fix<1>),seq(1,last-1,fix<1>)) + u_old(seq(2,last,fix<1>),seq(1,last-1,fix<1>)) +
           u_old(seq(1,last-1,fix<1>),seq(0,last-2,fix<1>)) + u_old(seq(1,last-1,fix<1>),seq(2,last,fix<1>)) )*4.0 +
           u_old(seq(0,last-2,fix<1>),seq(0,last-2,fix<1>)) + u_old(seq(0,last-2,fix<1>),seq(2,last,fix<1>)) +
           u_old(seq(2,last,fix<1>),seq(0,last-2,fix<1>)) + u_old(seq(2,last,fix<1>),seq(2,last,fix<1>)) ) /20.0;
}

int main(int argc, const char * argv[]) {
    initParallel();
    setNbThreads(0);
    cout << nbThreads() << endl;
    MatrixXd u = initilize();
    auto start = std::chrono::high_resolution_clock::now();
    for (auto i=0UL; i<100; i++) {
        laplace(u);
    }
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    // cout << u(seq(0, fix<10>), seq(0, fix<10>)) << endl;
    cout << "Execution time (ms): " << duration.count() << endl;
    return 0;
}
Compile with gcc and enable OpenMPI:
james@MBP14 tests % g++-11 -fopenmp -O3 -I/usr/local/include -I/opt/homebrew/Cellar/open-mpi/4.1.3/include -o test4 test.cpp
Run the binary directly:
james@MBP14 tests % ./test4
8
Execution time (ms): 273
Run with mpirun and specify 8 threads:
james@MBP14 tests % mpirun -np 8 test4
8
8
8
8
8
8
8
8
Execution time (ms): 348
Execution time (ms): 347
Execution time (ms): 353
Execution time (ms): 356
Execution time (ms): 350
Execution time (ms): 353
Execution time (ms): 357
Execution time (ms): 355
So obviously the matrix operation is not running in parallel; instead, every thread is running the same copy of the code.
What should be done to solve this problem? Do I have some misunderstanding about using OpenMPI?
Comments (1)
You are confusing OpenMPI with OpenMP.
-fopenmp enables OpenMP. It is one way to parallelize an application, by putting special #pragma omp statements in the code. The parallelization happens on a single CPU (or, to be precise, on a single compute node, in case that node has multiple CPUs). This allows you to employ all cores of that CPU. OpenMP cannot be used to parallelize an application across multiple compute nodes.
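As a rough sketch of what that looks like (this example is mine, not part of the original answer, and it uses a plain 5-point stencil rather than the weighted 9-point one from the question), here is a serial Jacobi sweep whose outer loop OpenMP splits across the cores of one CPU; the helper name jacobi_sweep_omp is hypothetical:

#include "Eigen/Dense"

// Hypothetical helper: one Jacobi sweep of a 5-point Laplace stencil, written
// with explicit loops so that OpenMP can distribute the outer loop.
void jacobi_sweep_omp(const Eigen::MatrixXd& u_old, Eigen::MatrixXd& u) {
    const Eigen::Index rows = u.rows();
    const Eigen::Index cols = u.cols();
    // With -fopenmp, the iterations of this outer loop run on different threads
    // of the same process.
    #pragma omp parallel for
    for (Eigen::Index i = 1; i < rows - 1; ++i) {
        for (Eigen::Index j = 1; j < cols - 1; ++j) {
            u(i, j) = 0.25 * (u_old(i - 1, j) + u_old(i + 1, j) +
                              u_old(i, j - 1) + u_old(i, j + 1));
        }
    }
}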
To use MPI, you need to call "special" MPI functions and do the hard work of distributing the data yourself. If you do not do that, launching an application with mpirun simply creates several identical processes (not threads!) that all perform exactly the same computation. You have not parallelized your application; you have just executed it 8 times.
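To make "distributing the data yourself" a bit more concrete, here is a minimal, hedged sketch (not a working solver): each process started by mpirun claims its own block of the 1000 grid rows, and a real implementation would additionally exchange boundary rows with neighbouring ranks every iteration. Only standard MPI calls (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize) are used; the row-splitting logic is purely illustrative.

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // index of this process
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // number of processes started by mpirun

    // Explicit data distribution: assign each rank a contiguous block of the
    // 1000 grid rows. Each rank would allocate and update only its own block.
    const int total_rows = 1000;           // grid size from the question
    const int rows_per_rank = total_rows / size;
    const int row_begin = rank * rows_per_rank;
    const int row_end = (rank == size - 1) ? total_rows : row_begin + rows_per_rank;
    std::printf("rank %d of %d owns rows [%d, %d)\n", rank, size, row_begin, row_end);

    // A real solver would exchange the boundary ("ghost") rows with the
    // neighbouring ranks each iteration, e.g. with MPI_Sendrecv, before doing
    // the stencil update. That communication is omitted in this sketch.

    MPI_Finalize();
    return 0;
}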
There are no compiler flags that enable MPI. MPI is not built into any compiler. Rather, MPI is a standard, and OpenMPI is one specific library that implements that standard. You should read a tutorial or book about MPI and OpenMPI (Google turned up this one, for example).
Note: Usually, MPI libraries such as OpenMPI ship with executables/scripts (e.g. mpicc) that behave like compilers. But they are just thin wrappers around compilers such as gcc. These wrappers are used to automatically tell the actual compiler which include directories to use and which libraries to link with. But again, the compilers themselves do not know anything about MPI.
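For example, assuming the Homebrew Open MPI installation from the question provides the usual mpicxx C++ wrapper, the compile step could look roughly like the following; passing --showme to an Open MPI wrapper prints the underlying compiler command (with its include and link flags) instead of running it:

mpicxx -O3 -o test4 test.cpp
mpicxx --showme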