Using OpenMP and MPI in the same program
I am currently working on an assignment for my parallel programming class in which I need to write the same program sequentially, then parallelized using OpenMP, then parallelized using MPI.
For context, the assignment is about searching for palindromes in a matrix of random characters. I already have most of the code working; my question is about how to structure, compile, and run the project.
I could create three separate programs and run them independently, but I would like to combine them all in the same project, so that the three versions run one after the other on the same initial matrix. This lets me time each version and compare them.
I am using CMake as the build tool.
My CMakeLists.txt:
cmake_minimum_required(VERSION 3.21)
project(<project-name> C)
set(CMAKE_C_STANDARD 23)
find_package(MPI REQUIRED)
include_directories(SYSTEM ${MPI_INCLUDE_PATH})
find_package(OpenMP REQUIRED)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
add_executable(<project-name> <source-files>)
target_link_libraries(<project-name> ${MPI_C_LIBRARIES})
I build the project using the following commands:
mkdir build && cd build && cmake .. && make
My main function:
// All the header inclusions
int main(int argc, char **argv) {
    // Initialisation.
    srand(time(NULL));
    omp_set_num_threads(omp_get_num_procs());
    double start_time;
    ushort number_of_palindromes = 0;
    ushort palindrome_length = 5;
    ushort rows = 25000;
    ushort cols = 25000;
    char **matrix = create_matrix_of_chars(rows, cols);
    printf("Matrix of size %dx%d, searching for palindromes of size %d.\n", rows, cols, palindrome_length);

    // Run sequentially.
    printf("%-45s", "Running sequentially ... ");
    start_time = omp_get_wtime();
    number_of_palindromes = find_palindromes_sequentially(matrix, rows, cols, palindrome_length);
    printf("Found %4d palindromes in %7.4f seconds.\n", number_of_palindromes, omp_get_wtime() - start_time);

    // Run using OpenMP.
    printf("Running with OpenMP on %d %-20s", omp_get_num_procs(), "threads ... ");
    start_time = omp_get_wtime();
    number_of_palindromes = find_palindromes_using_openmp(matrix, rows, cols, palindrome_length);
    printf("Found %4d palindromes in %7.4f seconds.\n", number_of_palindromes, omp_get_wtime() - start_time);

    // Run using MPI.
    int num_procs, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("%d: hello (p=%d)\n", rank, num_procs);
    MPI_Finalize();

    // Cleanup and exit.
    free_matrix(matrix, rows);
    return 0;
}
When running ./<project-name>, the sequential and OpenMP versions run one after the other correctly. However, when running mpirun --use-hwthread-cpus ./<project-name>, the program starts 8 instances of the entire program (the line "Matrix of size ..." gets printed 8 times).
My understanding was that the MPI region is delimited by MPI_Init(...) and MPI_Finalize(), but that does not seem to be the case. How would I go about solving this?
Thanks in advance for your answers.
There is no such thing as an "MPI region". MPI uses independent processes that communicate and synchronize only through the network. Meaning: the whole of your executable runs in as many instances as you start it. Each and every statement, even before MPI_Init, is executed by each and every instance.
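A minimal, self-contained sketch that makes this visible (the messages are illustrative); built with mpicc and launched with mpirun -np 4, the first and last lines are printed four times each:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    // Runs once per process launched by mpirun,
    // even though it comes before MPI_Init.
    printf("Before MPI_Init: I run in every instance.\n");

    MPI_Init(&argc, &argv);
    int rank, num_procs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    printf("Rank %d of %d inside the supposed \"MPI region\".\n", rank, num_procs);
    MPI_Finalize();

    // Also runs once per process.
    printf("After MPI_Finalize: and so do I.\n");
    return 0;
}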
This could have several implications. For your version with MPI, you would have to run the sequential code before MPI_Init (you can, for example, make the process with rank 0 run the sequential version, but this will lead to inaccurate timings for your parallelized code and to heavy load imbalance, with rank 0 doing much more work), and have no OpenMP directives for one separate instance of the code that does whatever you are trying to parallelize using OpenMP. Based on what exactly you are trying to achieve, you will need to think about and tweak your code (with an added configuration for the third case, which I'll come to soon).
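If you nevertheless keep everything in one executable, the usual pattern is to call MPI_Init first and guard the non-MPI parts by rank. A sketch under that assumption, with your project's helpers left as comments:

#include <mpi.h>
// ... the rest of your headers ...

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);   // now the first thing every instance does
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // Only rank 0 builds the matrix and runs the sequential and
        // OpenMP versions (hence the load imbalance mentioned above).
        // matrix = create_matrix_of_chars(rows, cols);
        // find_palindromes_sequentially(...); find_palindromes_using_openmp(...);
    }

    // Every rank takes part in the MPI version; rank 0 must first
    // distribute the matrix, e.g. with MPI_Bcast or MPI_Scatter.

    MPI_Finalize();
    return 0;
}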
Combining them is, in short, a bad idea. Multiple MPI processes will be running your program, and you would not get accurate timings for a thread's share of the work (even assuming you get your code to work with that setup). I'm assuming you intend to time the components of shared-memory parallelism, i.e. the threads and not the processes, given that you're using omp_get_wtime().

To get closest to the actual timings for each approach, you would ideally keep the three programs separate and time each one on its own, either with internal function calls, such as MPI_Wtime() for MPI ranks (you might want a reduction such as MPI_MAX on top of that) and omp_get_wtime() for OpenMP threads, or with an external tool that can measure time, such as perf (ideal if you want to time the entire program rather than a particular section of the code), which you can incorporate into the run command in your makefile.
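For the MPI timings, a sketch of that MPI_Wtime()/MPI_MAX pattern (assuming rank holds the caller's rank, as in your code):

MPI_Barrier(MPI_COMM_WORLD);               // start all ranks together
double t0 = MPI_Wtime();
// ... this rank's share of the palindrome search ...
double local_time = MPI_Wtime() - t0;

// The run is only as fast as the slowest rank, so reduce with MPI_MAX.
double max_time;
MPI_Reduce(&local_time, &max_time, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
if (rank == 0)
    printf("MPI version: %7.4f seconds.\n", max_time);

For the whole-program view, something like perf stat mpirun ... ./<project-name> (or plain time) works without touching the code.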
But if you really want to use both and go for a hybrid setup, you have to request a level of thread support when initializing MPI, by calling MPI_Init_thread() instead of MPI_Init() and specifying your desired way of mixing multi-threading with processes (three thread-capable options: MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED, and MPI_THREAD_MULTIPLE). For more information on these levels, check the documentation for the option you want to go with. (For your case, I would recommend the funneled configuration.)
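A sketch of the funneled initialization (the error handling is illustrative):

int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
if (provided < MPI_THREAD_FUNNELED) {
    // The library cannot give the requested level; fail loudly.
    fprintf(stderr, "Insufficient MPI thread support: %d\n", provided);
    MPI_Abort(MPI_COMM_WORLD, 1);
}
// MPI_THREAD_FUNNELED: any number of OpenMP threads may run, but only
// the thread that called MPI_Init_thread may make MPI calls.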
You could also write your program in a clever way that incorporates all three approaches by wrapping each one in a #if parameterName == <value> ... #endif block, so as to run only one approach at a time (or maybe two, depending on how you go about it), based on a parameter which you specify and can set (with different values for the different blocks) when you compile, by adding -D<parameterName>=<value> to your other compilation flags.
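A sketch of that compile-time switch, where VERSION is a hypothetical macro name of your choosing:

// Build with -DVERSION=1 (sequential), -DVERSION=2 (OpenMP) or -DVERSION=3 (MPI),
// e.g. target_compile_definitions(<project-name> PRIVATE VERSION=2) in CMake.
#if VERSION == 1
    number_of_palindromes = find_palindromes_sequentially(matrix, rows, cols, palindrome_length);
#elif VERSION == 2
    number_of_palindromes = find_palindromes_using_openmp(matrix, rows, cols, palindrome_length);
#elif VERSION == 3
    // MPI_Init ... distribute the matrix, search, reduce the counts ... MPI_Finalize
#endif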