CUDA and linker errors
This may be similar to the question "Linker errors 2005 and 1169 (multiply defined symbols) when using CUDA __device__ functions (should be inline by default)", but not exactly. I'm getting several LNK2005 errors when trying to build a project (using code that has been shown to work elsewhere) on VS2010. I'm at my wits' end.
For example, I have the following three files: transposeGPU.h, transposeGPU.cu, and transposeCUDA.cu. transposeGPU.h can be summarized as follows:
void transposeGPU(float *d_dst, size_t dst_pitch,
                  float *d_src, size_t src_pitch,
                  unsigned int width, unsigned int height);
i.e., a single declaration without any includes. The definition of that function is found in transposeGPU.cu, which can be summarized as follows:
#include <stdio.h>
#include "../transposeGPU.h"
#include "../helper_funcs.h"
#include "transposeCUDA.cu"
void
transposeGPU(float *d_dst, size_t dst_pitch,
             float *d_src, size_t src_pitch,
             unsigned int width, unsigned int height)
{
    // execution configuration parameters
    dim3 threads(16, 16);
    dim3 grid(iDivUp(width, 16), iDivUp(height, 16));
    size_t shared_mem_size =
        (threads.x * threads.y + (threads.y - 1)) * sizeof(float);
    transposeCUDA<<<grid, threads, shared_mem_size>>>(
        d_dst, dst_pitch / sizeof(float),
        d_src, src_pitch / sizeof(float),
        width, height);
}
i.e., transposeGPU.cu includes its header file and transposeCUDA.cu, besides defining transposeGPU() and calling transposeCUDA(), the latter found in transposeCUDA.cu. Now, transposeCUDA.cu defines the function as expected:
#include "common_kernel.h"
__global__ void
transposeCUDA(
    float *g_dst, size_t s_dst_pitch,
    const float *g_src, size_t s_src_pitch,
    unsigned int img_width, unsigned int img_height)
{
    // several lines of code...
}
It all looks in order, but I still get error LNK2005: "void __cdecl __device_stub__Z13transposeCUDAPfjPKfjjj(float *,unsigned int,float const *,unsigned int,unsigned int,unsigned int)" (?__device_stub__Z13transposeCUDAPfjPKfjjj@@YAXPAMIPBMIII@Z) already defined in transposeCUDA.obj, reported for transposeGPU.obj.
That and some twenty other similar linker errors. Why? There's no apparent redefinition occurring. Any help would be greatly appreciated.
Comments (2)
There is a redefinition occurring if you are compiling both transposeCUDA.cu and transposeGPU.cu, since the kernel definition then appears in both translation units: once directly, and once through the #include. You should not both #include transposeCUDA.cu and also apply nvcc to that file on its own; do one or the other.
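One way to apply this, as a minimal sketch rather than the asker's actual project layout: keep the #include, but stop building the included file on its own (for instance by giving it a non-.cu extension so VS2010's CUDA build rule skips it). The file name transposeCUDA.cuh, the include guard, the iDivUp() stand-in, and the naive kernel body below are all assumptions made for illustration; the original kernel body is not shown in the question.

```cuda
// Sketch only. The "----" comments mark where each file would begin; the code
// is shown as one listing so it can be read (and compiled) in one piece.
#include <cuda_runtime.h>

// ---- transposeCUDA.cuh (hypothetical name: the old transposeCUDA.cu,
// ---- renamed so the IDE no longer compiles it with nvcc on its own) ----
#ifndef TRANSPOSE_CUDA_CUH
#define TRANSPOSE_CUDA_CUH

// Placeholder body: a naive transpose, NOT the original shared-memory kernel.
// Pitches are given in elements, matching the call in transposeGPU().
__global__ void
transposeCUDA(float *g_dst, size_t s_dst_pitch,
              const float *g_src, size_t s_src_pitch,
              unsigned int img_width, unsigned int img_height)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < img_width && y < img_height)
        g_dst[x * s_dst_pitch + y] = g_src[y * s_src_pitch + x];
}

#endif // TRANSPOSE_CUDA_CUH

// ---- transposeGPU.cu: the only file handed to nvcc. In the real project
// ---- it would contain #include "transposeCUDA.cuh" at this point. ----

// Stand-in (assumption) for the iDivUp() helper from helper_funcs.h.
static unsigned int iDivUp(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}

void
transposeGPU(float *d_dst, size_t dst_pitch,
             float *d_src, size_t src_pitch,
             unsigned int width, unsigned int height)
{
    dim3 threads(16, 16);
    dim3 grid(iDivUp(width, 16), iDivUp(height, 16));
    // The placeholder kernel uses no dynamic shared memory, so the third
    // launch parameter from the question is omitted here.
    transposeCUDA<<<grid, threads>>>(
        d_dst, dst_pitch / sizeof(float),
        d_src, src_pitch / sizeof(float),
        width, height);
}
```

With this layout the kernel and its host-side stub are emitted into exactly one object file, so the linker sees a single definition. The opposite arrangement, compiling both .cu files and replacing the #include with a declaration of transposeCUDA(), is the other commonly suggested route; whichever you pick, the definition should reach nvcc only once.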
To clarify: __device__ functions are inlined (at least pre-Fermi), but __global__ functions are not. After all, you cannot inline GPU code into a CPU-executable function. Global functions can have their address taken; the only difference is that the address points into GPU memory (similarly to how pointers to data stored on the GPU look like plain pointers). As William Pursell said, if you compile your global function twice, you get two functions with the same definition, leading to the linker error.