CUDA FORTRAN: function gives a different answer if I pass a variable instead of a number

Posted 2024-10-17 14:45:08

I'm trying to use the ISHFT() function to bitshift some 32-bit integers in parallel, using CUDA FORTRAN.

The problem is that I get different answers to ISHFT(-4,-1) and ISHFT(var,-1) even though var = -4. This is the test code I've written:

module testshift 

  integer :: test 
  integer, device :: d_test 

contains 

  attributes(global) subroutine testshft () 
    integer :: var
    var = -4
    d_test = ISHFT(var,-1)
  end subroutine testshft

end module testshift

program foo 
  use testshift 

  integer :: i
  call testshft<<<1,1>>>() ! carry out ishft on gpu
  test = d_test            ! copy device result to host
  i = ISHFT(-4,-1)         ! carry out ishft on cpu
  print *, i, test         ! print the results
end program foo

I then compile and execute:

pgf90 testishft.f90 -Mcuda
./a.out 
   2147483646           -2

Both should be 2147483646 if working correctly. I get the right answer if I replace var with 4.

How do I fix this problem?
Thanks for the help

Comments (2)

滥情哥ㄟ 2024-10-24 14:45:08

When I remove the GPU-specific code from the above program I get 2147483646 2147483646 from the g95 compiler, as you expect. Have you tried running a "scalar" version of the program using the pgf90 compiler? If the scalar version works but the GPU version does not, that helps to isolate the problem. If the problem is pgf90/CUDA specific, perhaps the best place to ask your question is

PGI User Forum Forum Index -> Programming and Compiling
http://www.pgroup.com/userforum/viewforum.php?f=4 .
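
A "scalar" version along the lines suggested above might look like the following. This is only a sketch: the program and variable names are made up for illustration, and it assumes the default integer is 32 bits wide. It calls ISHFT once with a literal and once with a variable on the host, so any difference between the two can be attributed to the compiler rather than to the GPU.

```fortran
program scalar_shift
  implicit none
  integer :: var            ! assumed to be a 32-bit default integer
  integer :: from_literal   ! result of shifting the literal -4
  integer :: from_variable  ! result of shifting the variable holding -4

  var = -4
  from_literal  = ishft(-4, -1)   ! logical right shift by one bit
  from_variable = ishft(var, -1)  ! same shift, argument passed as a variable

  ! On a conforming compiler both values should be 2147483646.
  print *, from_literal, from_variable
end program scalar_shift
```

If both values agree here but the device result still differs, the problem is likely confined to the GPU code path.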

你的他你的她 2024-10-24 14:45:08

I've found a workaround, which is posted in this forum:
http://www.pgroup.com/userforum/viewtopic.php?t=2455&postdays=0&postorder=asc&start=15

Instead of using ISHFT I use IBITS, which is described here: http://gcc.gnu.org/onlinedocs/gfortran/IBITS.html
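
As a rough illustration of that workaround (the exact arguments are my own assumption, not copied from the forum thread): a logical right shift by n bits can be written as IBITS(var, n, BIT_SIZE(var) - n), which extracts the upper bits and right-justifies them with zero fill. The host-only sketch below uses n = 1 and a 32-bit default integer.

```fortran
program ibits_shift
  implicit none
  integer :: var      ! assumed to be a 32-bit default integer
  integer :: shifted  ! result of the emulated logical right shift

  var = -4
  ! Take the top bit_size(var)-1 bits starting at bit position 1.
  ! The vacated high bit is zero, so this matches ishft(var, -1).
  shifted = ibits(var, 1, bit_size(var) - 1)

  ! For var = -4 this should print 2147483646.
  print *, shifted
end program ibits_shift
```

In the kernel from the question, the corresponding change would be to replace the ISHFT call with the equivalent IBITS expression.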

The problem has also since been fixed in version 11.3 of the PGI compiler:
http://www.pgroup.com/support/release_tprs_2011.htm
