CUDA and linker errors

Posted on 2024-10-21 14:15:26

This may be similar to the question "Linker errors 2005 and 1169 (multiply defined symbols) when using CUDA __device__ functions (should be inline by default)", but not exactly. I'm getting several LNK2005 errors when trying to build a project on VS2010, using code that has been shown to work elsewhere. I'm at my wits' end.

For example, I have the following three files: transposeGPU.h, transposeGPU.cu, and transposeCUDA.cu. transposeGPU.h can be summarized as follows:

void transposeGPU(float *d_dst, size_t dst_pitch,
    float *d_src, size_t src_pitch,
    unsigned int width, unsigned int height);

i.e., a single declaration without any includes. The definition of that function is found in transposeGPU.cu, which can be summarized as follows:

#include <stdio.h>
#include "../transposeGPU.h"
#include "../helper_funcs.h"

#include "transposeCUDA.cu"

void
transposeGPU(float *d_dst, size_t dst_pitch,
    float *d_src, size_t src_pitch,
    unsigned int width, unsigned int height)
{
    // execution configuration parameters
    dim3 threads(16, 16);
    dim3 grid(iDivUp(width, 16), iDivUp(height, 16));
    size_t shared_mem_size =
        (threads.x * threads.y + (threads.y - 1)) * sizeof(float);

    transposeCUDA<<<grid, threads, shared_mem_size>>>(
        d_dst, dst_pitch / sizeof(float),
        d_src, src_pitch / sizeof(float),
        width, height);
}

i.e., transposeGPU.cu includes its header file and transposeCUDA.cu, defines transposeGPU(), and calls transposeCUDA(), which is found in transposeCUDA.cu. Now, transposeCUDA.cu defines the function as expected:

#include "common_kernel.h"

__global__ void
transposeCUDA(
    float *g_dst, size_t s_dst_pitch,
    const float *g_src, size_t s_src_pitch,
    unsigned int img_width, unsigned int img_height)
{
// several lines of code...
}

It all looks in order, but I still get error LNK2005 for transposeGPU.obj: "void __cdecl __device_stub__Z13transposeCUDAPfjPKfjjj(float *,unsigned int,float const *,unsigned int,unsigned int,unsigned int)" (?__device_stub__Z13transposeCUDAPfjPKfjjj@@YAXPAMIPBMIII@Z) already defined in transposeCUDA.obj.

That and some twenty other similar linker errors. Why? There's no apparent redefinition occurring. Any help would be greatly appreciated.

梦幻之岛 2024-10-28 14:15:26

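If you are compiling both transposeCUDA.cu and transposeGPU.cu, then a redefinition does occur, because the definition appears in both translation units. The #include pastes the kernel into transposeGPU.cu, so transposeGPU.obj and transposeCUDA.obj each end up with their own copy of the kernel's host stub. You shouldn't both #include transposeCUDA.cu and run nvcc on that file.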
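One way to apply this advice is to stop including transposeCUDA.cu and give transposeGPU.cu only a declaration of the kernel, so that the definition is compiled exactly once. The sketch below is an illustration under assumptions, not the original project's code: transposeCUDA.h is a hypothetical header name, the iDivUp stand-in assumes ceiling division, and a naive kernel body replaces the elided original (so no shared memory is requested). It is written as a single file so that it builds on its own; the comment banners mark where each part would live. Whether a particular toolkit version links a kernel launched from another translation unit without extra options has not been checked here.

```cuda
// Sketch only. Shown as ONE file so it builds with `nvcc sketch.cu`;
// the banners mark where each part would live in a multi-file project.
#include <cstdio>
#include <cuda_runtime.h>

// ---- hypothetical transposeCUDA.h: declaration only, safe to include anywhere ----
__global__ void transposeCUDA(float *g_dst, size_t s_dst_pitch,
                              const float *g_src, size_t s_src_pitch,
                              unsigned int img_width, unsigned int img_height);

// ---- transposeGPU.cu: includes the header above, NOT transposeCUDA.cu ----
// Stand-in for the iDivUp helper from helper_funcs.h (assumed: ceiling division).
static unsigned int iDivUp(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}

void transposeGPU(float *d_dst, size_t dst_pitch,
                  float *d_src, size_t src_pitch,
                  unsigned int width, unsigned int height)
{
    dim3 threads(16, 16);
    dim3 grid(iDivUp(width, 16), iDivUp(height, 16));
    transposeCUDA<<<grid, threads>>>(d_dst, dst_pitch / sizeof(float),
                                     d_src, src_pitch / sizeof(float),
                                     width, height);
}

// ---- transposeCUDA.cu: the single definition of the kernel ----
// Naive body standing in for the elided original; pitches are in elements.
__global__ void transposeCUDA(float *g_dst, size_t s_dst_pitch,
                              const float *g_src, size_t s_src_pitch,
                              unsigned int img_width, unsigned int img_height)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < img_width && y < img_height)
        g_dst[x * s_dst_pitch + y] = g_src[y * s_src_pitch + x];
}

// Host-side check: transpose a 2x3 matrix and compare with the expected 3x2 result.
bool transposeDemoOk()
{
    const unsigned int width = 3, height = 2;
    const float src[height * width] = {1, 2, 3, 4, 5, 6};
    const float expected[width * height] = {1, 4, 2, 5, 3, 6};
    float dst[width * height] = {0};
    float *d_src = 0, *d_dst = 0;

    if (cudaMalloc((void **)&d_src, sizeof(src)) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d_dst, sizeof(dst)) != cudaSuccess) {
        cudaFree(d_src);
        return false;
    }
    cudaMemcpy(d_src, src, sizeof(src), cudaMemcpyHostToDevice);
    transposeGPU(d_dst, height * sizeof(float), d_src, width * sizeof(float),
                 width, height);
    cudaError_t launchErr = cudaGetLastError();
    cudaError_t copyErr = cudaMemcpy(dst, d_dst, sizeof(dst), cudaMemcpyDeviceToHost);
    cudaFree(d_src);
    cudaFree(d_dst);
    if (launchErr != cudaSuccess || copyErr != cudaSuccess) return false;

    for (unsigned int i = 0; i < width * height; ++i)
        if (dst[i] != expected[i]) return false;
    return true;
}

int main()
{
    bool ok = transposeDemoOk();
    printf("%s\n", ok ? "transpose ok" : "transpose failed");
    return ok ? 0 : 1;
}
```

The alternative is to keep the #include and exclude transposeCUDA.cu from the build, so that nvcc only ever sees it as part of transposeGPU.cu.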

我不吻晚风 2024-10-28 14:15:26

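To clarify: __device__ functions are inlined (at least before Fermi), but __global__ functions are not; after all, you cannot inline GPU code into a function that executes on the CPU. A global function can have its address taken; the only difference is that the address points into GPU memory (much as a pointer to data stored on the GPU looks just like an ordinary pointer).

As William Pursell said, if you compile the global function twice, you get two functions with the same definition, which results in the linker error.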
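The distinction can be made concrete with a small sketch. It assumes a helper that is meant to be shared through a header and a kernel that is compiled once; scaleValue and scaleKernel are hypothetical names, not part of the original project. Marking the helper static and __forceinline__ is one common way to keep it from becoming a shared external symbol, while the kernel has to exist as a real entry point (the __device_stub__ symbol in the error message is its host-side stub).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Header-style helper (hypothetical). `static` gives each translation unit its
// own copy and __forceinline__ asks the compiler to inline it, so including it
// from several .cu files does not produce a shared external symbol.
static __device__ __forceinline__ float scaleValue(float v, float factor)
{
    return v * factor;
}

// Kernel (hypothetical). It is launched from the host, so it cannot be inlined
// into its caller; define it in exactly one compiled .cu file.
__global__ void scaleKernel(float *data, float factor, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = scaleValue(data[i], factor);
}

// Host-side check: double four values on the GPU and compare with the expectation.
bool scaleDemoOk()
{
    const unsigned int n = 4;
    float host[n] = {1, 2, 3, 4};
    const float expected[n] = {2, 4, 6, 8};
    float *dev = 0;

    if (cudaMalloc((void **)&dev, sizeof(host)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    scaleKernel<<<1, 32>>>(dev, 2.0f, n);
    cudaError_t launchErr = cudaGetLastError();
    cudaError_t copyErr = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (launchErr != cudaSuccess || copyErr != cudaSuccess) return false;

    for (unsigned int i = 0; i < n; ++i)
        if (host[i] != expected[i]) return false;
    return true;
}

int main()
{
    bool ok = scaleDemoOk();
    printf("%s\n", ok ? "scale ok" : "scale failed");
    return ok ? 0 : 1;
}
```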
