CUDA: LNK2005 error for a __device__ function used in a header file
I have a device function that is defined in a header file. It lives in the header because it is used by a __global__ kernel, and that kernel must itself be in the header since it is a template kernel.
When this header file is included in two or more .cu files, I get an LNK2005 error during linking:
FooDevice.cu.obj : error LNK2005: "int __cdecl getCurThreadIdx(void)" (?getCurThreadIdx@@YAHXZ) already defined in Main.cu.obj
What causes this error, and how can I fix it?
Here is sample code that produces the above error:
FooDevice.h:
#ifndef FOO_DEVICE_H
#define FOO_DEVICE_H

__device__ int getCurThreadIdx()
{
    return ( ( blockIdx.x * blockDim.x ) + threadIdx.x );
}

template< typename T >
__global__ void fooKernel( const T* inArr, int num, T* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}

__global__ void fooKernel2( const int* inArr, int num, int* outArr );

#endif // FOO_DEVICE_H
FooDevice.cu:
#include "FooDevice.h"

// One other kernel that uses getCurThreadIdx()
__global__ void fooKernel2( const int* inArr, int num, int* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}
Main.cu:
#include "FooDevice.h"

int main()
{
    int num = 10;
    int* dInArr = NULL;
    int* dOutArr = NULL;
    const int arrSize = num * sizeof( *dInArr );

    cudaMalloc( &dInArr, arrSize );
    cudaMalloc( &dOutArr, arrSize );

    // Using template kernel
    fooKernel<<< 10, 10 >>>( dInArr, num, dOutArr );

    return 0;
}
Comments (2)
Why is this error caused?
Because the header contains the definition of getCurThreadIdx() and is included in both FooDevice.cu and Main.cu, each of those files is compiled with its own copy of the function. The linker then sees two definitions of the same symbol and reports LNK2005.
How to fix it?
Suppose a function foo() is defined in foo.h, and two .cu files both include foo.h and call it. Each file then ends up with its own definition of foo(). If you declare foo() inline in the header, the calls stay unchanged, and the linker accepts the identical definitions instead of reporting a duplicate.
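A minimal sketch of that pattern, assuming a hypothetical header foo.h with a hypothetical function foo() (the body, the return value, and the caller name callFooFromA are made up for illustration). It is shown as one listing; in a real project the first part would be foo.h and the caller would sit in one of the .cu files.

```cpp
// ---- foo.h (hypothetical header) ----
#ifndef FOO_H
#define FOO_H

// Without `inline`, every file that includes this header would emit its own
// external definition of foo(), and the linker would report a duplicate symbol.
// With `inline`, identical definitions in several translation units are allowed.
inline int foo()
{
    return 42;  // placeholder body, for illustration only
}

#endif // FOO_H

// ---- a.cu (hypothetical): includes foo.h and calls foo() ----
// A second file, b.cu, could include foo.h and call foo() in the same way.
int callFooFromA()
{
    return foo();
}
```

The call sites do not change; only the definition in the header gains the inline keyword.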
Taken from http://www.velocityreviews.com/forums/t447911-why-does-explicit-specialization-of-function-templates-cause-generation-of-code.html
See also: http://en.wikipedia.org/wiki/One_Definition_Rule
I changed your code by declaring getCurThreadIdx() inline, and it now compiles. Defining getCurThreadIdx() in the header without inline violated the One Definition Rule.
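One way that change could look, as a sketch that assumes the only modification is adding inline to the definition in FooDevice.h, with the rest of the header taken from the question:

```cuda
#ifndef FOO_DEVICE_H
#define FOO_DEVICE_H

// Assumed fix: `inline` lets this definition appear in every .cu file that
// includes the header without violating the One Definition Rule.
__device__ inline int getCurThreadIdx()
{
    return ( ( blockIdx.x * blockDim.x ) + threadIdx.x );
}

template< typename T >
__global__ void fooKernel( const T* inArr, int num, T* outArr )
{
    // Grid-stride loop: each thread copies every threadNum-th element.
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];
}

// Non-template kernel: declared here, defined once in FooDevice.cu.
__global__ void fooKernel2( const int* inArr, int num, int* outArr );

#endif // FOO_DEVICE_H
```

FooDevice.cu and Main.cu stay as they are in the question. CUDA also provides the __forceinline__ qualifier, which may be worth trying if plain inline is not enough for a particular toolchain.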
It should be inlined. You could try adding the inline keyword. Maybe you could remove the unnecessary code and create a simple example for us to see? Usually the problem lies in the details...