OpenMPI MPI_Barrier problems

Posted on 2024-10-19 20:56:09

I am having some synchronization issues using the OpenMPI implementation of MPI_Barrier:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    int nprocs;

    int rc = MPI_Init(&argc, &argv);

    if (rc != MPI_SUCCESS) {
        fprintf(stderr, "Unable to set up MPI\n");
        MPI_Abort(MPI_COMM_WORLD, rc);
    }

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("P%d\n", rank);
    fflush(stdout);

    MPI_Barrier(MPI_COMM_WORLD);

    printf("P%d again\n", rank);

    MPI_Finalize();
    return 0;
}

For mpirun -n 2 ./a.out the output should be:
P0
P1
...

But the output is sometimes:
P0
P0 again
P1
P1 again

What's going on?

Comments (4)

许仙没带伞 2024-10-26 20:56:09

The order in which your printed lines appear on your terminal is not necessarily the order in which they were printed. You are using a shared resource (stdout) for that, so there can always be an ordering problem. (And fflush doesn't help here: it only empties each process's own buffer, and stdout is usually line buffered anyhow when it is attached to a terminal.)

You could try to prefix your output with a timestamp and save all of this to different files, one per MPI process.

Then, to inspect your log, you could merge the files and sort the lines by timestamp.

Your problem should then disappear.

云淡风轻 2024-10-26 20:56:09

There is nothing wrong with MPI_Barrier().

As Jens mentioned, the reason you are not seeing the output you expected is that stdout is buffered on each process. There is no guarantee that prints from multiple processes will be displayed in the order in which they were issued. (If stdout from each process were transferred to the main process for printing in real time, that would lead to lots of unnecessary communication!)

If you want to convince yourself that the barrier works, you could try writing to a file instead. Having multiple processes write to a single file may lead to extra complications, so you could have each process write to its own file and then, after the barrier, swap the files they write to. For example:

    Proc-0           Proc-1
      |                 |
 f0.write(..)     f1.write(...) 
      |                 |
      x  ~~ barrier ~~  x
      |                 |
 f1.write(..)     f0.write(...) 
      |                 |
     END               END

Sample implementation:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    char filename[32];
    int rank;
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) { /* proc 0 and 1 only */
        /* Before the barrier: each process writes to its own file. */
        snprintf(filename, sizeof filename, "file_%d.out", rank);
        fp = fopen(filename, "w");
        if (fp == NULL) {
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        fprintf(fp, "P%d: before Barrier\n", rank);
        fclose(fp);
    }

    MPI_Barrier(MPI_COMM_WORLD);

    if (rank < 2) { /* proc 0 and 1 only */
        /* After the barrier: append to the other process's file. */
        snprintf(filename, sizeof filename, "file_%d.out", 1 - rank);
        fp = fopen(filename, "a");
        if (fp == NULL) {
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        fprintf(fp, "P%d: after Barrier\n", rank);
        fclose(fp);
    }

    MPI_Finalize();
    return 0;
}

After running the code, you should get the following results:

[me@home]$ cat file_0.out
P0: before Barrier
P1: after Barrier

[me@home]$ cat file_1.out
P1: before Barrier
P0: after Barrier

For all files, the "after Barrier" statements will always appear later.

雪若未夕 2024-10-26 20:56:09

Output ordering is not guaranteed in MPI programs.

This is not related to MPI_Barrier at all.

Also, I would not spend too much time worrying about output ordering in MPI programs.

The most elegant way to achieve this, if you really want to, is to let the processes send their messages to one rank, say rank 0, and let rank 0 print the output either in the order it received the messages or sorted by rank.

Again, don't spend too much time trying to order the output of MPI programs. It is not practical and is of little use.

栀子花开つ 2024-10-26 20:56:09

Adding to the previous answers here: your MPI_Barrier works fine.

Though, if you just want to see it working, you can pause the execution for a moment after the barrier (for example with sleep(1)) to let the output catch up.
