CUDA FORTRAN: function gives a different answer if I pass a variable instead of a number
I'm trying to use the ISHFT() function to bitshift some 32-bit integers in parallel, using CUDA FORTRAN.
The problem is that I get different answers to ISHFT(-4,-1) and ISHFT(var,-1) even though var = -4. This is the test code I've written:
module testshift
  integer :: test
  integer, device :: d_test
contains
  attributes(global) subroutine testshft ()
    integer :: var
    var = -4
    d_test = ISHFT(var,-1)
  end subroutine testshft
end module testshift

program foo
  use testshift
  integer :: i
  call testshft<<<1,1>>>()   ! carry out ishft on gpu
  test = d_test              ! copy device result to host
  i = ISHFT(-4,-1)           ! carry out ishft on cpu
  print *, i, test           ! print the results
end program foo
I then compile and execute:
pgf90 testishft.f90 -Mcuda
./a.out
2147483646 -2
Both should be 2147483646 if working correctly. I get the right answer if I replace var with 4.
How do I fix this problem?
Thanks for the help
Comments (2)
When I remove the GPU-specific code from the above program I get 2147483646 2147483646 from the g95 compiler, as you expect. Have you tried running a "scalar" version of the program using the pgf90 compiler? If the scalar version works but the GPU version does not, that helps to isolate the problem. If the problem is pgf90/CUDA specific, perhaps the best place to ask your question is the PGI User Forum (Forum Index -> Programming and Compiling):
http://www.pgroup.com/userforum/viewforum.php?f=4
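For reference, a host-only ("scalar") version of the test might look like the sketch below. It is not taken from the thread: the program and variable names are made up for illustration, and the SHIFTA line assumes a compiler that supports the Fortran 2008 shift intrinsics. SHIFTA is included only because the -2 reported for the GPU is the value an arithmetic (sign-extending) shift of -4 would produce, which may help when comparing outputs.

```fortran
! Host-only sketch of the test: no device variable, no kernel launch.
! Names are illustrative and do not come from the original post.
program scalar_shift_test
  implicit none
  integer :: var, from_literal, from_variable, arithmetic

  var = -4
  from_literal  = ishft(-4, -1)    ! logical shift of a literal constant
  from_variable = ishft(var, -1)   ! logical shift of a variable holding the same value
  arithmetic    = shifta(var, 1)   ! arithmetic shift (Fortran 2008), for comparison only

  ! ISHFT zero-fills the vacated bits, so the first two values should agree.
  print *, from_literal, from_variable, arithmetic
end program scalar_shift_test
```

If the first two numbers agree here but the GPU build still differs, that points at the device code path rather than at ISHFT in general.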
I've found a workaround, which is posted in this forum:
http://www.pgroup.com/userforum/viewtopic.php?t=2455&postdays=0&postorder=asc&start=15
Instead of using ISHFT I use IBITS, which is described here: http://gcc.gnu.org/onlinedocs/gfortran/IBITS.html
The problem has also since been fixed in version 11.3 of the PGI compiler:
http://www.pgroup.com/support/release_tprs_2011.htm
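The exact IBITS call is not quoted in this answer, so the following is only a sketch of how a logical right shift can be expressed with IBITS. The module and function names (shift_workaround, logical_shift_right) are hypothetical, and the example runs on the host rather than inside a kernel.

```fortran
module shift_workaround
  implicit none
contains
  ! Hypothetical helper (not from the thread): logical right shift by n bits,
  ! written with IBITS instead of ISHFT. Intended for 0 <= n <= bit_size(x).
  pure integer function logical_shift_right(x, n) result(shifted)
    integer, intent(in) :: x, n
    ! IBITS(x, pos, len) extracts len bits starting at bit pos and returns
    ! them right-justified with all higher bits set to zero.
    shifted = ibits(x, n, bit_size(x) - n)
  end function logical_shift_right
end module shift_workaround

program ibits_demo
  use shift_workaround
  implicit none
  integer :: var

  var = -4
  ! Both values should be the same on a standard-conforming compiler.
  print *, logical_shift_right(var, 1), ishft(var, -1)
end program ibits_demo
```

Under the same assumption, the kernel line d_test = ISHFT(var,-1) from the question would become d_test = IBITS(var, 1, 31) for a default 32-bit integer.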