CUDA: LNK2005 error for a __device__ function used in a header file

Posted on 2024-10-21 22:49:42

I have a device function that is defined in a header file. It lives in the header because it is used by a global kernel, and that kernel must itself be in the header because it is a template kernel.

When this header file is included in two or more .cu files, I get an LNK2005 error during linking:

FooDevice.cu.obj : error LNK2005: "int __cdecl getCurThreadIdx(void)" (?getCurThreadIdx@@YAHXZ) already defined in Main.cu.obj

Why does this error occur, and how can I fix it?

Here is sample code that produces the above error:

FooDevice.h:

#ifndef FOO_DEVICE_H
#define FOO_DEVICE_H

__device__ int getCurThreadIdx()
{
    return ( ( blockIdx.x * blockDim.x ) + threadIdx.x );
}

template< typename T >
__global__ void fooKernel( const T* inArr, int num, T* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}

__global__ void fooKernel2( const int* inArr, int num, int* outArr );

#endif // FOO_DEVICE_H

FooDevice.cu:

#include "FooDevice.h"

// One other kernel that uses getCurThreadIdx()
__global__ void fooKernel2( const int* inArr, int num, int* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}

Main.cu:

#include "FooDevice.h"

int main()
{
    int num             = 10;
    int* dInArr         = NULL;
    int* dOutArr        = NULL;
    const int arrSize   = num * sizeof( *dInArr );

    cudaMalloc( &dInArr, arrSize );
    cudaMalloc( &dOutArr, arrSize );

    // Using template kernel
    fooKernel<<< 10, 10 >>>( dInArr, num, dOutArr );

    return 0;
}

Comments (2)

同尘 2024-10-28 22:49:42

Why is this error caused?

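Because the header, which contains the function's definition, is included in both FooDevice.cu and Main.cu. Each .cu file is compiled as a separate translation unit, so each object file ends up with its own copy of getCurThreadIdx(), and the linker reports the duplicate. The include guard does not help here: it only prevents repeated inclusion within a single translation unit.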

How to fix it?

Suppose you have the following defined in foo.h:

template<typename T> __device__ T foo(T x)
{
    return x;
}

and two .cu files that both include foo.h and each contain a call to it, e.g.

int x = foo<int>(1);

Then you can declare foo() inline:

template<typename T>
inline __device__ T foo(T x)
{
    return x;
}

and call it exactly as before:

int x = foo<int>(1);

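The inline specifier permits the same definition to appear in more than one translation unit, so the linker no longer reports a duplicate. Strictly speaking, a function template such as foo() does not need inline for this; ordinary functions and full specializations do, as the following quotation explains: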

Function templates are exempt from the One Definition Rule, and there may be more than one definition of them in different translation units. A full function template specialization is not a template but an ordinary function, so you need to use the inline keyword to avoid violating the ODR if you want to put it in a header file that is included in several translation units.

Taken from http://www.velocityreviews.com/forums/t447911-why-does-explicit-specialization-of-function-templates-cause-generation-of-code.html

See also: http://en.wikipedia.org/wiki/One_Definition_Rule
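
The distinction drawn in the quotation can be sketched in a single header. This is only an illustration under stated assumptions, not the poster's code: the header name foo.h comes from the example above, while the FOO_HD macro and the twice() helper are hypothetical names introduced here. The functions are also marked __host__ __device__ rather than __device__ only, on the assumption that you want to call them from host code as well. The specialization returns x + 1 purely so that it can be told apart from the primary template.

```cpp
// foo.h (sketch): safe to include from several .cu or .cpp files.
#ifndef FOO_H
#define FOO_H

// Hypothetical helper macro: expands to the CUDA qualifiers under nvcc
// and to nothing under a plain host C++ compiler.
#ifdef __CUDACC__
#define FOO_HD __host__ __device__
#else
#define FOO_HD
#endif

// Function template: its definition may appear in every translation
// unit that includes this header, with or without inline.
template <typename T>
FOO_HD T foo(T x)
{
    return x;
}

// Full specialization: an ordinary function, not a template, so it
// needs inline when it is defined in a header.
template <>
inline FOO_HD int foo<int>(int x)
{
    return x + 1;
}

// Ordinary non-template function: also needs inline in a header.
inline FOO_HD int twice(int x)
{
    return 2 * x;
}

#endif // FOO_H
```

With inline in place, including this header from several source files should link cleanly. Removing inline from the specialization or from twice() would be expected to bring back a duplicate-definition error such as LNK2005, whereas the primary template is unaffected either way.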

I changed your code like this:

inline __device__ int getCurThreadIdx()
{
    return ( ( blockIdx.x * blockDim.x ) + threadIdx.x );
}

template< typename T >
__global__ void fooKernel( const T* inArr, int num, T* outArr )
{
    const int threadNum = ( gridDim.x * blockDim.x );

    for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
        outArr[ idx ] = inArr[ idx ];

    return;
}

With that change the project builds. Without inline, your definition of getCurThreadIdx() in the header violated the One Definition Rule.

好倦 2024-10-28 22:49:42

It should be inlined. You could try adding the inline keyword.

Maybe you could remove the unnecessary code and create a simple text example for us to see? Usually the problem lies in the details...
