Deadlock in a simple even/odd send
I'm trying to solve a simple problem with MPI. My implementation is MPICH2 and my code is in Fortran. I use blocking sends and receives, and the idea is very simple, but when I run it, it crashes! I have absolutely no idea what is wrong. Can anyone comment on this issue, please? Here is a piece of the code:
integer, parameter :: IM=100, JM=100
REAL, ALLOCATABLE :: T(:,:), TF(:,:)
CALL MPI_COMM_RANK(MPI_COMM_WORLD,RNK,IERR)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,SIZ,IERR)
prv = rnk-1
nxt = rnk+1
LIM = INT(IM/SIZ)
IF (rnk==0) THEN
  ALLOCATE(TF(IM,JM))
  prv = MPI_PROC_NULL
ELSEIF(rnk==siz-1) THEN
  NXT = MPI_PROC_NULL
  LIM = LIM+MOD(IM,SIZ)
END IF
IF (MOD(RNK,2)==0) THEN
  CALL MPI_SEND(T(2,:),JM+2,MPI_REAL,PRV,10,MPI_COMM_WORLD,IERR)
  CALL MPI_RECV(T(1,:),JM+2,MPI_REAL,PRV,20,MPI_COMM_WORLD,STAT,IERR)
ELSE
  CALL MPI_RECV(T(LIM+2,:),JM+2,MPI_REAL,NXT,10,MPI_COMM_WORLD,STAT,IERR)
  CALL MPI_SEND(T(LIM+1,:),JM+2,MPI_REAL,NXT,20,MPI_COMM_WORLD,IERR)
END IF
As far as I can tell, the even processes never receive anything, while the odd ones finish sending successfully. In some cases, when I added print statements to see what was going on, I saw the variable NXT change during the send and receive calls! For example, all the odd processes were sending their messages to process 0 instead of to their next neighbour.
Answers
The array T is never allocated (only TF is, and only on rank 0), so reading from or writing to it is an error.
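A minimal sketch of allocating T before any exchange. It infers the shape from the indices the question uses: T(1,:) and T(LIM+2,:) are halo rows, rows 2..LIM+1 hold the rank's own data, and the count JM+2 suggests JM+2 columns. That layout is an assumption. The values siz = 3 and the last rank are hypothetical and hard-coded only for illustration. In the real program they come from MPI_COMM_SIZE and MPI_COMM_RANK.

```fortran
program allocate_halo
  implicit none
  integer, parameter :: IM = 100, JM = 100
  real, allocatable :: T(:,:)
  integer :: siz, rnk, lim

  ! Hypothetical values for illustration; the real program gets these
  ! from MPI_COMM_SIZE and MPI_COMM_RANK
  siz = 3
  rnk = siz - 1

  ! Same row split as the question: the last rank also takes the remainder
  lim = IM / siz
  if (rnk == siz - 1) lim = lim + mod(IM, siz)

  ! Assumed layout: rows 1 and lim+2 are halo rows, rows 2..lim+1 hold
  ! this rank's data; JM+2 columns so one row matches the count JM+2
  ! used in MPI_SEND/MPI_RECV
  allocate(T(lim + 2, JM + 2))
  T = 0.0

  print *, 'lim =', lim, 'shape(T) =', shape(T)

  deallocate(T)
end program allocate_halo
```

With siz = 3, the last rank gets LIM = 100/3 + mod(100,3) = 34 rows, so T is 36 x 102 on that rank.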
I can't see the whole program, but here are some observations on what I can see:
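1) Make sure that rnk, siz, and prv are declared as integers. Under Fortran's default implicit typing, only names starting with I through N are INTEGER, so undeclared rnk, siz, and prv are REAL. You are then passing REAL values where MPI expects INTEGER ranks, so the source and destination ranks don't match, hence the deadlock. Putting IMPLICIT NONE at the top of the program makes the compiler flag every undeclared variable.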
2) I'd use MPI_SENDRECV rather than the send/recv; recv/send code section. Two MPI_SENDRECV calls are cleaner (2 lines of code vs. 7), are guaranteed not to deadlock, and are faster when you have bidirectional links (almost always the case).
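Below is a minimal sketch of the two-MPI_SENDRECV version as a complete program. You can compile it with mpif90 and run it with, for example, mpiexec -n 4. It is not the asker's full program.

It assumes the same row layout as the question: rows 1 and LIM+2 of T are halo rows, and there are JM+2 columns. It fills T with the rank number only so the received halo rows can be checked.

It also applies point 1: IMPLICIT NONE and explicit INTEGER declarations. The status is declared as INTEGER STAT(MPI_STATUS_SIZE). If STAT is left undeclared, implicit typing makes it a REAL scalar, while MPI_RECV writes MPI_STATUS_SIZE integers into it. That can overwrite neighbouring variables and could plausibly explain NXT appearing to change.

```fortran
program halo_exchange
  use mpi
  implicit none                        ! every variable must now be declared
  integer, parameter :: IM = 100, JM = 100
  real, allocatable :: T(:,:)
  integer :: rnk, siz, prv, nxt, lim, ierr
  integer :: stat(MPI_STATUS_SIZE)     ! a status is an INTEGER array, not a scalar
  logical :: ok

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rnk, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, siz, ierr)

  ! Row decomposition as in the question
  lim = IM / siz
  if (rnk == siz - 1) lim = lim + mod(IM, siz)

  ! Neighbours; MPI_PROC_NULL turns transfers at the ends into no-ops.
  ! Two separate IFs so that a single-rank run sets both to MPI_PROC_NULL.
  prv = rnk - 1
  nxt = rnk + 1
  if (rnk == 0) prv = MPI_PROC_NULL
  if (rnk == siz - 1) nxt = MPI_PROC_NULL

  ! Assumed layout: rows 1 and lim+2 are halos, rows 2..lim+1 are interior
  allocate(T(lim + 2, JM + 2))
  T = real(rnk)

  ! Send first interior row to prv, receive bottom halo row from nxt
  call MPI_SENDRECV(T(2, :), JM + 2, MPI_REAL, prv, 10, &
                    T(lim + 2, :), JM + 2, MPI_REAL, nxt, 10, &
                    MPI_COMM_WORLD, stat, ierr)
  ! Send last interior row to nxt, receive top halo row from prv
  call MPI_SENDRECV(T(lim + 1, :), JM + 2, MPI_REAL, nxt, 20, &
                    T(1, :), JM + 2, MPI_REAL, prv, 20, &
                    MPI_COMM_WORLD, stat, ierr)

  ! Each halo row should now hold the corresponding neighbour's rank
  ok = .true.
  if (rnk > 0)       ok = ok .and. all(T(1, :) == real(rnk - 1))
  if (rnk < siz - 1) ok = ok .and. all(T(lim + 2, :) == real(rnk + 1))
  if (.not. ok) call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
  print *, 'rank', rnk, ': halo rows received correctly'

  deallocate(T)
  call MPI_FINALIZE(ierr)
end program halo_exchange
```

Every rank fills both of its halo rows here. By contrast, the even/odd excerpt in the question only pairs odd ranks with their next neighbour and even ranks with their previous one. At the two ends, MPI_PROC_NULL makes the corresponding send and receive complete immediately without transferring any data.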