MPI_Barrier not working properly
I wrote the C application below to help me understand MPI and why MPI_Barrier() isn't functioning in my huge C++ application. I was able to reproduce the problem from the huge application with a much smaller C application. Essentially, I call MPI_Barrier() inside a for loop, and MPI_Barrier() is reached by all nodes, yet after 2 iterations of the loop the program deadlocks. Any thoughts?
#include <mpi.h>
#include <stdio.h>
int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int i = 0, numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    printf("%s: Rank %d of %d\n", processor_name, rank, numprocs);
    for (i = 1; i <= 100; i++) {
        if (rank == 0) printf("Before barrier (%d:%s)\n", i, processor_name);
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0) printf("After barrier (%d:%s)\n", i, processor_name);
    }
    MPI_Finalize();
    return 0;
}
The output:
alienone: Rank 1 of 4
alienfive: Rank 3 of 4
alienfour: Rank 2 of 4
alientwo: Rank 0 of 4
Before barrier (1:alientwo)
After barrier (1:alientwo)
Before barrier (2:alientwo)
After barrier (2:alientwo)
Before barrier (3:alientwo)
I am using GCC 4.4 and Open MPI 1.3 from the Ubuntu 10.10 repositories.
Also, in my huge C++ application, MPI broadcasts don't work: only half the nodes receive the broadcast, while the others are stuck waiting for it.
Thank you in advance for any help or insights!
Update: Upgraded to Open MPI 1.4.4, compiled from source into /usr/local/.
Update: Attaching GDB to the running processes yields an interesting result. It looks to me like all nodes have died at the MPI barrier, yet MPI still thinks they are running:
0x00007fc235cbd1c8 in __poll (fds=0x15ee360, nfds=8, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
83 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
in ../sysdeps/unix/sysv/linux/poll.c
(gdb) bt
#0 0x00007fc235cbd1c8 in __poll (fds=0x15ee360, nfds=8, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
#1 0x00007fc236a45141 in poll_dispatch () from /usr/local/lib/libopen-pal.so.0
#2 0x00007fc236a43f89 in opal_event_base_loop () from /usr/local/lib/libopen-pal.so.0
#3 0x00007fc236a38119 in opal_progress () from /usr/local/lib/libopen-pal.so.0
#4 0x00007fc236eff525 in ompi_request_default_wait_all () from /usr/local/lib/libmpi.so.0
#5 0x00007fc23141ad76 in ompi_coll_tuned_sendrecv_actual () from /usr/local/lib/openmpi/mca_coll_tuned.so
#6 0x00007fc2314247ce in ompi_coll_tuned_barrier_intra_recursivedoubling () from /usr/local/lib/openmpi/mca_coll_tuned.so
#7 0x00007fc236f15f12 in PMPI_Barrier () from /usr/local/lib/libmpi.so.0
#8 0x0000000000400b32 in main (argc=1, argv=0x7fff5883da58) at barrier_test.c:14
(gdb)
Update: I also have this code:
#include <mpi.h>
#include <stdio.h>
#include <math.h>
int main(int argc, char *argv[]) {
    int n = 400, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    printf("MPI Rank %i of %i.\n", myid, numprocs);
    while (1) {
        h = 1.0 / (double) n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double)i - 0.5);
            sum += (4.0 / (1.0 + x*x));
        }
        mypi = h * sum;
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (myid == 0)
            printf("pi is approximately %.16f, Error is %.16f\n", pi, fabs(pi - PI25DT));
    }
    MPI_Finalize();
    return 0;
}
Despite the infinite loop, the printf() inside the loop produces output only once:
mpirun -n 24 --machinefile /etc/machines a.out
MPI Rank 0 of 24.
MPI Rank 3 of 24.
MPI Rank 1 of 24.
MPI Rank 4 of 24.
MPI Rank 17 of 24.
MPI Rank 15 of 24.
MPI Rank 5 of 24.
MPI Rank 7 of 24.
MPI Rank 16 of 24.
MPI Rank 2 of 24.
MPI Rank 11 of 24.
MPI Rank 9 of 24.
MPI Rank 8 of 24.
MPI Rank 20 of 24.
MPI Rank 23 of 24.
MPI Rank 19 of 24.
MPI Rank 12 of 24.
MPI Rank 13 of 24.
MPI Rank 21 of 24.
MPI Rank 6 of 24.
MPI Rank 10 of 24.
MPI Rank 18 of 24.
MPI Rank 22 of 24.
MPI Rank 14 of 24.
pi is approximately 3.1415931744231269, Error is 0.0000005208333338
Any thoughts?
2 Answers
MPI_Barrier() in Open MPI sometimes hangs when processes reach the barrier at different times after the last barrier, but as far as I can see that is not your case. Anyway, try using MPI_Reduce() instead of, or before, the real call to MPI_Barrier(). That is not a direct equivalent of a barrier, but any synchronous call with almost no payload that involves all processes in a communicator should work like one. I haven't seen this behaviour of MPI_Barrier() in LAM/MPI, MPICH2, or even WMPI, but it was a real issue with Open MPI.
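A minimal sketch of that workaround in C. The helper name is hypothetical, and MPI_Allreduce() is used here instead of MPI_Reduce() so that every rank, not only the root, has to wait for the combined result:

#include <mpi.h>

/* Tiny reduction with a single int payload: every rank in the communicator
   must participate before any rank gets the result, so the call can stand in
   for MPI_Barrier(). */
static void reduce_as_barrier(MPI_Comm comm)
{
    int dummy = 0, result = 0;
    MPI_Allreduce(&dummy, &result, 1, MPI_INT, MPI_SUM, comm);
}

In the test program from the question, it would be called as reduce_as_barrier(MPI_COMM_WORLD) in place of, or right before, the MPI_Barrier(MPI_COMM_WORLD) call inside the loop.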
What interconnect do you have? Is it a specialised one like InfiniBand or Myrinet, or are you just using plain TCP over Ethernet? Do you have more than one configured network interface when running with the TCP transport?
Besides, Open MPI is modular: there are many modules that provide algorithms implementing the various collective operations. You can try to fiddle with them using MCA parameters. For example, you can start debugging your application's behaviour by increasing the verbosity of the btl component, passing mpirun something like --mca btl_base_verbose 30.
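Applied to the barrier test from the question (the binary name barrier_test is an assumption; the machinefile is taken from the mpirun line shown earlier), that would look like:

mpirun -n 4 --machinefile /etc/machines --mca btl_base_verbose 30 ./barrier_test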
Look in the verbose output for messages about connection attempts that fail or time out. In that case some (or all) nodes have more than one configured network interface that is up, but not all nodes are reachable through all of the interfaces. This might happen, for example, if the nodes run a recent Linux distribution with Xen support enabled by default (RHEL?) or have other virtualisation software installed that brings up virtual network interfaces.
By default Open MPI is lazy, that is, connections are opened on demand. The first send/receive communication may succeed if the right interface is picked, but subsequent operations are likely to pick one of the alternate paths in order to maximise the bandwidth. If the other node is unreachable through the second interface, a timeout is likely to occur and the communication will fail, since Open MPI will consider the other node down or problematic.
The solution is to isolate the non-connecting networks or network interfaces using MCA parameters of the TCP btl module, for example:
- restrict Open MPI to a specific IP network: --mca btl_tcp_if_include 192.168.2.0/24
- restrict it to specific network interfaces: --mca btl_tcp_if_include eth0,eth1
- or exclude the problematic interfaces (in that case the loopback lo must be excluded too): --mca btl_tcp_if_exclude lo,virt0
Refer to the Open MPI run-time TCP tuning FAQ for more details.
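For example, assuming eth0 is the interface that actually connects the machines (check with ip addr or ifconfig on each node; both eth0 and the barrier_test binary name are assumptions here), the barrier test could be pinned to it like this:

mpirun -n 4 --machinefile /etc/machines --mca btl_tcp_if_include eth0 ./barrier_test

If the loop then completes all 100 iterations, the hang was indeed caused by Open MPI trying to communicate over an interface on which the other nodes are not reachable.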