Using OpenMP and MPI in the same program
I am currently working on an assignment for my parallel programming class in which I need to write the same program sequentially, then parallelized using OpenMP, then parallelized using MPI.
For context, the assignment is about searching for palindromes in a matrix of random characters. I already have most of the code working; my question is about how to structure, compile, and run the project.
I could create three separate programs and run them independently, but I would like to combine them all in the same project, so that the three versions run one after the other on the same initial matrix. This lets me time each version and compare them.
I am using CMake as the build tool.
My CMakeLists.txt:
cmake_minimum_required(VERSION 3.21)
project(<project-name> C)
set(CMAKE_C_STANDARD 23)
find_package(MPI REQUIRED)
include_directories(SYSTEM ${MPI_INCLUDE_PATH})
find_package(OpenMP REQUIRED)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
add_executable(<project-name> <source-files>)
target_link_libraries(<project-name> ${MPI_C_LIBRARIES})
I build the project using the following commands:
mkdir build && cd build && cmake .. && make
My main function:
// All the header inclusions
int main(int argc, char **argv) {
    // Initialisation.
    srand(time(NULL));
    omp_set_num_threads(omp_get_num_procs());
    double start_time;
    ushort number_of_palindromes = 0;
    ushort palindrome_length = 5;
    ushort rows = 25000;
    ushort cols = 25000;
    char **matrix = create_matrix_of_chars(rows, cols);
    printf("Matrix of size %dx%d, searching for palindromes of size %d.\n", rows, cols, palindrome_length);

    // Run sequentially.
    printf("%-45s", "Running sequentially ... ");
    start_time = omp_get_wtime();
    number_of_palindromes = find_palindromes_sequentially(matrix, rows, cols, palindrome_length);
    printf("Found %4d palindromes in %7.4f seconds.\n", number_of_palindromes, omp_get_wtime() - start_time);

    // Run using OpenMP.
    printf("Running with OpenMP on %d %-20s", omp_get_num_procs(), "threads ... ");
    start_time = omp_get_wtime();
    number_of_palindromes = find_palindromes_using_openmp(matrix, rows, cols, palindrome_length);
    printf("Found %4d palindromes in %7.4f seconds.\n", number_of_palindromes, omp_get_wtime() - start_time);

    // Run using MPI.
    int num_procs, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("%d: hello (p=%d)\n", rank, num_procs);
    MPI_Finalize();

    // Cleanup and exit.
    free_matrix(matrix, rows);
    return 0;
}
When running ./<project-name>, the sequential and OpenMP versions run one after the other correctly. However, when running mpirun --use-hwthread-cpus ./<project-name>, the program starts 8 instances of the entire program (the line "Matrix of size ..." gets printed 8 times).
My understanding was that the MPI region is delimited by MPI_Init(...) and MPI_Finalize(), but that does not seem to be the case. How would I go about solving this?
Thanks in advance for your answers.
There is no such thing as an "MPI region". MPI uses independent processes that communicate and synchronize only through the network. Meaning: the whole of your executable runs in as many instances as you start it. Each and every statement, even before MPI_Init, is executed by each and every instance.
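A minimal, self-contained sketch that makes this visible (the messages are illustrative); built with mpicc and launched with mpirun -np 4, the first and last lines are printed four times each:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    // Runs once per process launched by mpirun,
    // even though it comes before MPI_Init.
    printf("Before MPI_Init: I run in every instance.\n");

    MPI_Init(&argc, &argv);
    int rank, num_procs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    printf("Rank %d of %d inside the supposed \"MPI region\".\n", rank, num_procs);
    MPI_Finalize();

    // Also runs once per process.
    printf("After MPI_Finalize: and so do I.\n");
    return 0;
}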
This could have several implications. For your version with MPI, you would have to run the sequential code before MPI_Init (you can, for example, make the process with rank 0 run the sequential version, but this will lead to inaccurate timings for your parallelized code and to heavy load imbalance, with rank 0 doing much more work), and have no OpenMP directives for one separate instance of the code that does whatever you are trying to parallelize using OpenMP. Based on what exactly you are trying to achieve, you will need to think about and tweak your code (with an added configuration for the third case, which I'll come to soon).
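If you nevertheless keep everything in one executable, the usual pattern is to call MPI_Init first and guard the non-MPI parts by rank. A sketch under that assumption, with your project's helpers left as comments:

#include <mpi.h>
// ... the rest of your headers ...

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);   // now the first thing every instance does
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // Only rank 0 builds the matrix and runs the sequential and
        // OpenMP versions (hence the load imbalance mentioned above).
        // matrix = create_matrix_of_chars(rows, cols);
        // find_palindromes_sequentially(...); find_palindromes_using_openmp(...);
    }

    // Every rank takes part in the MPI version; rank 0 must first
    // distribute the matrix, e.g. with MPI_Bcast or MPI_Scatter.

    MPI_Finalize();
    return 0;
}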
Combining them is, in short, a bad idea. Multiple MPI processes will be running your program, and you would not get accurate timings for a thread's share of the work (even assuming you get your code to work with that setup). I'm assuming you intend to time the components of shared-memory parallelism, i.e. the threads and not the processes, given that you're using omp_get_wtime().

To get closest to the actual timings for each approach, you would ideally keep the three programs separate and time each one on its own, either with internal function calls, such as MPI_Wtime() for MPI ranks (you might want a reduction such as MPI_MAX on top of that) and omp_get_wtime() for OpenMP threads, or with an external tool that can measure time, such as perf (ideal if you want to time the entire program rather than a particular section of the code), which you can incorporate into the run command in your makefile.
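For the MPI timings, a sketch of that MPI_Wtime()/MPI_MAX pattern (assuming rank holds the caller's rank, as in your code):

MPI_Barrier(MPI_COMM_WORLD);               // start all ranks together
double t0 = MPI_Wtime();
// ... this rank's share of the palindrome search ...
double local_time = MPI_Wtime() - t0;

// The run is only as fast as the slowest rank, so reduce with MPI_MAX.
double max_time;
MPI_Reduce(&local_time, &max_time, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
if (rank == 0)
    printf("MPI version: %7.4f seconds.\n", max_time);

For the whole-program view, something like perf stat mpirun ... ./<project-name> (or plain time) works without touching the code.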
But if you really want to use both and go for a hybrid setup, you have to request a level of thread support when initializing MPI, by calling MPI_Init_thread() instead of MPI_Init() and specifying your desired way of mixing multi-threading with processes (three thread-capable options: MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED, and MPI_THREAD_MULTIPLE). For more information on these levels, check the documentation for the option you want to go with. (For your case, I would recommend the funneled configuration.)
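A sketch of the funneled initialization (the error handling is illustrative):

int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
if (provided < MPI_THREAD_FUNNELED) {
    // The library cannot give the requested level; fail loudly.
    fprintf(stderr, "Insufficient MPI thread support: %d\n", provided);
    MPI_Abort(MPI_COMM_WORLD, 1);
}
// MPI_THREAD_FUNNELED: any number of OpenMP threads may run, but only
// the thread that called MPI_Init_thread may make MPI calls.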
You could also write your program in a clever way that incorporates all three approaches by wrapping each one in a #if parameterName == <value> ... #endif block, so as to run only one approach at a time (or maybe two, depending on how you go about it), based on a parameter which you specify and can set (with different values for the different blocks) when you compile, by adding -D<parameterName>=<value> to your other compilation flags.
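A sketch of that compile-time switch, where VERSION is a hypothetical macro name of your choosing:

// Build with -DVERSION=1 (sequential), -DVERSION=2 (OpenMP) or -DVERSION=3 (MPI),
// e.g. target_compile_definitions(<project-name> PRIVATE VERSION=2) in CMake.
#if VERSION == 1
    number_of_palindromes = find_palindromes_sequentially(matrix, rows, cols, palindrome_length);
#elif VERSION == 2
    number_of_palindromes = find_palindromes_using_openmp(matrix, rows, cols, palindrome_length);
#elif VERSION == 3
    // MPI_Init ... distribute the matrix, search, reduce the counts ... MPI_Finalize
#endif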