nvcc: combining extern and __constant__
I would like to organise my CUDA code into separate object files to be linked at the end of compiling, as in C++. To that end I'd like to be able to declare an extern pointer to __constant__ memory in a header file, and put the definition in one of the .cu files, also following the pattern from C++. But it seems that when I do so, nvcc ignores the 'extern': it takes each declaration as a definition. Is there a way around this?
To be more specific about the code and the errors, I have this in a header file:
extern __device__ void* device_function_table[];
followed by this in a .cu file:
void* __device__ device_function_table[200];
which gives this error on compiling:
(path).cu:40: error: redefinition of ‘void* device_function_table [200]’
(path).hh:29: error: ‘void* device_function_table [200]’ previously declared here
My current solution is to use Makefile magic to glob together all my .cu files and have, in effect, one big translation unit but some semblance of file organisation. But this is already slowing down compiles noticeably, since a change to any one of my classes means recompiling all of them; and I anticipate adding several more classes.
Edit: I see I put __constant__ in the text and __device__ in the example; the question applies to both.
Comments (3)
To cut a long story short: with a recent CUDA toolkit (I'm on v8) and a compute capability of at least 2.0, in Visual Studio go to Project Properties -> CUDA C/C++ -> Common, find "Generate Relocatable Device Code" in the list, and set it to "Yes (-rdc=true)".
For the command line, this page suggests the -dc compiler option.
From the CUDA C Programming Guide version 4.0, section D.2.1.1:
Since CUDA 5.0, it is now possible to have externally defined data with CUDA, if separate compilation and linking is enabled. This blog post explains it: http://devblogs.nvidia.com/parallelforall/separate-compilation-linking-cuda-device-code/
If this is done, one simply uses it like in the original post, and it 'just works'.
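As a minimal sketch of what that looks like (not taken from the original posts): the file below keeps the declaration and the definition from the question in a single translation unit so it can be built on its own. In a real project the extern line would sit in the shared header and the definition in exactly one .cu file. The file name, the kernel count_null_entries, and the build line are assumptions made for illustration.

```cuda
// extern_sketch.cu  (hypothetical file name)
// Assumed build line: nvcc -rdc=true extern_sketch.cu -o extern_sketch
#include <cstdio>
#include <cuda_runtime.h>

// Declaration: would normally live in the header included by every .cu file.
extern __device__ void* device_function_table[];

// Definition: would normally live in exactly one .cu file.
__device__ void* device_function_table[200];

// Hypothetical kernel that only reads the table, to show it is usable
// from device code once declaration and definition are both visible.
__global__ void count_null_entries(int* out)
{
    int n = 0;
    for (int i = 0; i < 200; ++i) {
        if (!device_function_table[i]) {
            ++n;
        }
    }
    *out = n;
}

int main()
{
    int* d_out = 0;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    count_null_entries<<<1, 1>>>(d_out);

    // cudaMemcpy waits for the kernel to finish and reports launch errors.
    int h_out = -1;
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("null entries: %d\n", h_out);
    return 0;
}
```

With the code split across several files, the usual pattern is to compile each .cu file with nvcc -dc to get an object file containing relocatable device code, and then pass the resulting object files to nvcc for the final link, which also performs the device link step.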