CUDA and nvcc: using the preprocessor to choose between float and double
The problem:
I have a .h header in which I want to define real as double when compiling for C/C++, or for CUDA with compute capability >= 1.3. When compiling for CUDA with compute capability < 1.3, real should be defined as float.
After many hours I came up with this (which does not work):
    #   if defined(__CUDACC__)
    #       warning * making definitions for cuda
    #       if defined(__CUDA_ARCH__)
    #           warning __CUDA_ARCH__ is defined
    #       else
    #           warning __CUDA_ARCH__ is NOT defined
    #       endif
    #       if (__CUDA_ARCH__ >= 130)
    #           define real double
    #           warning using double in cuda
    #       elif (__CUDA_ARCH__ >= 0)
    #           define real float
    #           warning using float in cuda
    #           warning how the hell is this printed when __CUDA_ARCH__ is not defined?
    #       else
    #           define real
    #           error what the hell is the value of __CUDA_ARCH__ and how can I print it
    #       endif
    #   else
    #       warning * making definitions for c/c++
    #       define real double
    #       warning using double for c/c++
    #   endif
When I compile (note the -arch flag):
nvcc -arch compute_13 -Ilibcutil testFloatDouble.cu
I get
    * making definitions for cuda
    __CUDA_ARCH__ is defined
    using double in cuda
    * making definitions for cuda
    warning __CUDA_ARCH__ is NOT defined
    warning using float in cuda
    how the hell is this printed if __CUDA_ARCH__ is not defined now?
    Undefined symbols for architecture i386:
      "myKernel(float*, int)", referenced from: ....
I know that nvcc compiles each file twice. The first pass is fine (__CUDACC__ defined and __CUDA_ARCH__ >= 130), but what happens in the second pass? Is __CUDACC__ defined but __CUDA_ARCH__ either undefined or defined with a value < 130? Why?
Thanks for your time.
Comments (2)
It seems you might be conflating two things: how to differentiate between the host and device compilation trajectories when nvcc is processing CUDA code, and how to differentiate between CUDA and non-CUDA code. There is a subtle difference between the two. __CUDA_ARCH__ answers the first question, and __CUDACC__ answers the second.

Consider the following code snippet:
Here we have a templated CUDA kernel with CUDA architecture dependent instantiation, a separate stanza for host code steered by nvcc, and a stanza for compilation of host code not steered by nvcc. This behaves as follows:

The takeaway points from this are:

- __CUDACC__ defines whether nvcc is steering compilation or not
- __CUDA_ARCH__ is always undefined when compiling host code, steered by nvcc or not
- __CUDA_ARCH__ is only defined for the device code trajectory of compilation steered by nvcc

Those three pieces of information are always enough to have conditional compilation for device code to different CUDA architectures, host side CUDA code, and code not compiled by nvcc at all. The nvcc documentation is a bit terse at times, but all of this is covered in the discussion on compilation trajectories.
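As a rough sketch of the three points above (not the answer's original snippet), the two macros can drive a three-way preprocessor switch. The PASS_*, THIS_PASS and NAIVE_ARCH_TEST_FIRES names are invented for illustration. The last block also shows why the `#elif (__CUDA_ARCH__ >= 0)` branch in the question fires on the host pass: inside `#if`, an identifier that is not a defined macro is evaluated as 0.

```cpp
// Sketch: classify the current compilation pass. The PASS_* names and
// THIS_PASS are illustrative; nvcc does not provide them.
#define PASS_PLAIN_HOST  0   // ordinary C/C++ compiler, nvcc not involved
#define PASS_NVCC_HOST   1   // nvcc, host trajectory
#define PASS_NVCC_DEVICE 2   // nvcc, device trajectory

#if defined(__CUDA_ARCH__)
// Only the device trajectory sees __CUDA_ARCH__ (130 = compute capability 1.3).
#  define THIS_PASS PASS_NVCC_DEVICE
#elif defined(__CUDACC__)
// nvcc is steering the build, but this is the host trajectory.
#  define THIS_PASS PASS_NVCC_HOST
#else
#  define THIS_PASS PASS_PLAIN_HOST
#endif

// In #if, an identifier that is not defined as a macro evaluates to 0, so
// this comparison is true in every pass, including the passes in which
// __CUDA_ARCH__ does not exist at all.
#if (__CUDA_ARCH__ >= 0)
#  define NAIVE_ARCH_TEST_FIRES 1
#else
#  define NAIVE_ARCH_TEST_FIRES 0
#endif
```

Checking `defined(__CUDA_ARCH__)` before comparing its value avoids that trap.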
For the moment the only practical solution I see is using a custom define:
and then
This outputs the following for the two compilations:
and
does
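A minimal sketch of what such a custom define could look like. The macro name USE_DOUBLE_IN_CUDA is a hypothetical choice for illustration, not something nvcc defines. Because the choice no longer depends on __CUDA_ARCH__, the host and device passes see the same type, which should avoid the mismatched `myKernel(float*, int)` symbol from the question.

```cpp
// real.h (sketch). USE_DOUBLE_IN_CUDA is a hypothetical, user-chosen macro.
// Pass it on the command line when the target supports double precision:
//   nvcc -DUSE_DOUBLE_IN_CUDA -arch compute_13 -Ilibcutil testFloatDouble.cu
#ifndef REAL_H
#define REAL_H

#if !defined(__CUDACC__) || defined(USE_DOUBLE_IN_CUDA)
// Plain C/C++ build, or a CUDA build that explicitly opted in to double.
typedef double real;
#else
// CUDA build without the opt-in: host and device passes both get float.
typedef float real;
#endif

#endif // REAL_H
```

A typedef is used here instead of `#define real double` so that the name obeys normal scoping rules; either form works with this scheme.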