How to separate CUDA code into multiple files
I am trying to separate a CUDA program into two separate .cu files in an effort to edge closer to writing a real app in C++. I have a simple little program that:
Allocates memory on the host and the device.
Initializes the host array to a series of numbers.
Copies the host array to a device array.
Finds the square of all the elements in the array using a device kernel.
Copies the device array back to the host array.
Prints the results.
This works great if I put it all in one .cu file and run it. When I split it into two separate files I start getting linking errors. Like all my recent questions, I know this is something small, but what is it?
KernelSupport.cu
#ifndef _KERNEL_SUPPORT_
#define _KERNEL_SUPPORT_

#include <iostream>
#include <MyKernel.cu>

int main( int argc, char** argv)
{
    int* hostArray;
    int* deviceArray;
    const int arrayLength = 16;
    const unsigned int memSize = sizeof(int) * arrayLength;

    hostArray = (int*)malloc(memSize);
    cudaMalloc((void**) &deviceArray, memSize);

    std::cout << "Before device\n";
    for(int i=0;i<arrayLength;i++)
    {
        hostArray[i] = i+1;
        std::cout << hostArray[i] << "\n";
    }
    std::cout << "\n";

    cudaMemcpy(deviceArray, hostArray, memSize, cudaMemcpyHostToDevice);
    TestDevice <<< 4, 4 >>> (deviceArray);
    cudaMemcpy(hostArray, deviceArray, memSize, cudaMemcpyDeviceToHost);

    std::cout << "After device\n";
    for(int i=0;i<arrayLength;i++)
    {
        std::cout << hostArray[i] << "\n";
    }

    cudaFree(deviceArray);
    free(hostArray);

    std::cout << "Done\n";
}

#endif
MyKernel.cu
#ifndef _MY_KERNEL_
#define _MY_KERNEL_

__global__ void TestDevice(int *deviceArray)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    deviceArray[idx] = deviceArray[idx]*deviceArray[idx];
}

#endif
Build Log:
1>------ Build started: Project: CUDASandbox, Configuration: Debug x64 ------
1>Compiling with CUDA Build Rule...
1>"C:\CUDA\bin64\nvcc.exe" -arch sm_10 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -maxrregcount=32 --compile -o "x64\Debug\KernelSupport.cu.obj" "d:\Stuff\Programming\Visual Studio 2008\Projects\CUDASandbox\CUDASandbox\KernelSupport.cu"
1>KernelSupport.cu
1>tmpxft_000016f4_00000000-3_KernelSupport.cudafe1.gpu
1>tmpxft_000016f4_00000000-8_KernelSupport.cudafe2.gpu
1>tmpxft_000016f4_00000000-3_KernelSupport.cudafe1.cpp
1>tmpxft_000016f4_00000000-12_KernelSupport.ii
1>Linking...
1>KernelSupport.cu.obj : error LNK2005: __device_stub__Z10TestDevicePi already defined in MyKernel.cu.obj
1>KernelSupport.cu.obj : error LNK2005: "void __cdecl TestDevice__entry(int *)" (?TestDevice__entry@@YAXPEAH@Z) already defined in MyKernel.cu.obj
1>D:\Stuff\Programming\Visual Studio 2008\Projects\CUDASandbox\x64\Debug\CUDASandbox.exe : fatal error LNK1169: one or more multiply defined symbols found
1>Build log was saved at "file://d:\Stuff\Programming\Visual Studio 2008\Projects\CUDASandbox\CUDASandbox\x64\Debug\BuildLog.htm"
1>CUDASandbox - 3 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
I am running Visual Studio 2008 on Windows 7 64bit.
Edit:
I think I need to elaborate on this a little bit. The end result I am looking for here is to have a normal C++ application with something like Main.cpp with the int main() entry point and have things run from there. At certain points in my .cpp code I want to be able to reference CUDA bits. So my thinking (and correct me if there is a more standard convention here) is that I will put the CUDA kernel code into their own .cu files, and then have a supporting .cu file that will take care of talking to the device and calling kernel functions and what not.
4 Answers
You are including mykernel.cu in kernelsupport.cu, so when you try to link, the compiler sees mykernel.cu twice. You'll have to create a header defining TestDevice and include that instead.
re comment:
Something like this should work
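The code block from the original answer is not preserved in this copy; a minimal sketch of the header it describes, assuming it is called MyKernel.h, would be:

// MyKernel.h (assumed name) - declaration only, no kernel body
#ifndef _MY_KERNEL_H_
#define _MY_KERNEL_H_

__global__ void TestDevice(int *deviceArray);

#endif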
and then change the including file to
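For the KernelSupport.cu from the question, that would mean (same assumed header name):

#include "MyKernel.h"   // instead of #include <MyKernel.cu>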
re your edit
As long as the header you use in C++ code doesn't have any CUDA-specific stuff (__global__, __device__, etc.) you should be fine linking C++ and CUDA code.
If you look at the CUDA SDK code examples, they have extern "C" declarations that reference functions compiled from .cu files. This way, the .cu files are compiled by nvcc and only linked into the main program, while the .cpp files are compiled normally.
For example, marchingCubes_kernel.cu has the function body:
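The SDK snippet itself is not reproduced in this copy; the pattern it illustrates, shown here with the question's kernel and a hypothetical wrapper name, is roughly:

// In the .cu file, compiled by nvcc
__global__ void TestDevice(int *deviceArray)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    deviceArray[idx] = deviceArray[idx] * deviceArray[idx];
}

// Host-callable wrapper with C linkage; this is what the .cpp code links against
extern "C" void launchTestDevice(int gridSize, int blockSize, int *deviceArray)
{
    TestDevice<<<gridSize, blockSize>>>(deviceArray);
}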
While marchingCubes.cpp (where main() resides) just has a declaration:
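Again a sketch rather than the actual SDK code, using the same hypothetical wrapper name:

// In the .cpp file, compiled by the host compiler
extern "C" void launchTestDevice(int gridSize, int blockSize, int *deviceArray);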
You can put these in a .h file too.
Getting the separation is actually quite simple; please check out this answer for how to set it up. Then you simply put your host code in .cpp files and your device code in .cu files, and the build rules tell Visual Studio how to link them together into the final executable.
The immediate problem in your code is that you are defining the __global__ TestDevice function twice, once when you #include MyKernel.cu and once when you compile MyKernel.cu independently.
You will need to put a wrapper into a .cu file too - at the moment you are calling TestDevice<<<>>> from your main function, but when you move this into a .cpp file it will be compiled with cl.exe, which doesn't understand the <<<>>> syntax. Therefore you would simply call TestDeviceWrapper(griddim, blockdim, params) in the .cpp file and provide this function in your .cu file.
If you want an example, the SobolQRNG sample in the SDK achieves nice separation, although it still uses cutil and I would always recommend avoiding cutil.
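A minimal sketch of such a wrapper, using the names mentioned above (the exact signature is an assumption):

// MyKernel.cu - compiled by nvcc
__global__ void TestDevice(int *deviceArray)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    deviceArray[idx] = deviceArray[idx] * deviceArray[idx];
}

// Plain C++ function; the .cpp code only needs a declaration of this
// (dim3 is visible to the host compiler via cuda_runtime.h)
void TestDeviceWrapper(dim3 griddim, dim3 blockdim, int *deviceArray)
{
    TestDevice<<<griddim, blockdim>>>(deviceArray);
}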
The simple solution is to turn off building of your MyKernel.cu file.
Properties -> General -> Excluded from build
The better solution imo is to split your kernel into a cu and a cuh file, and include that, for example:
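The example the answer refers to is not preserved in this copy; a minimal sketch of that split, reusing the question's kernel and assumed file names, might look like:

// MyKernel.cuh - declaration that other .cu files can include
#ifndef _MY_KERNEL_CUH_
#define _MY_KERNEL_CUH_

__global__ void TestDevice(int *deviceArray);

#endif

// MyKernel.cu - the definition, compiled once by nvcc
#include "MyKernel.cuh"

__global__ void TestDevice(int *deviceArray)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    deviceArray[idx] = deviceArray[idx] * deviceArray[idx];
}

// KernelSupport.cu would then #include "MyKernel.cuh" instead of the .cu file.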