How to separate CUDA code into multiple files
I am trying to separate a CUDA program into two separate .cu files in an effort to edge closer to writing a real app in C++. I have a simple little program that:
Allocates memory on the host and the device.
Initializes the host array to a series of numbers.
Copies the host array to a device array.
Finds the square of all the elements in the array using a device kernel.
Copies the device array back to the host array.
Prints the results.
This works great if I put it all in one .cu file and run it. When I split it into two separate files I start getting linking errors. Like all my recent questions, I know this is something small, but what is it?
KernelSupport.cu
#ifndef _KERNEL_SUPPORT_
#define _KERNEL_SUPPORT_

#include <iostream>
#include <MyKernel.cu>

int main( int argc, char** argv)
{
    int* hostArray;
    int* deviceArray;
    const int arrayLength = 16;
    const unsigned int memSize = sizeof(int) * arrayLength;

    hostArray = (int*)malloc(memSize);
    cudaMalloc((void**) &deviceArray, memSize);

    std::cout << "Before device\n";
    for(int i=0;i<arrayLength;i++)
    {
        hostArray[i] = i+1;
        std::cout << hostArray[i] << "\n";
    }
    std::cout << "\n";

    cudaMemcpy(deviceArray, hostArray, memSize, cudaMemcpyHostToDevice);
    TestDevice <<< 4, 4 >>> (deviceArray);
    cudaMemcpy(hostArray, deviceArray, memSize, cudaMemcpyDeviceToHost);

    std::cout << "After device\n";
    for(int i=0;i<arrayLength;i++)
    {
        std::cout << hostArray[i] << "\n";
    }

    cudaFree(deviceArray);
    free(hostArray);

    std::cout << "Done\n";
}

#endif
MyKernel.cu
#ifndef _MY_KERNEL_
#define _MY_KERNEL_

__global__ void TestDevice(int *deviceArray)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    deviceArray[idx] = deviceArray[idx]*deviceArray[idx];
}

#endif
Build Log:
1>------ Build started: Project: CUDASandbox, Configuration: Debug x64 ------
1>Compiling with CUDA Build Rule...
1>"C:\CUDA\bin64\nvcc.exe" -arch sm_10 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -maxrregcount=32 --compile -o "x64\Debug\KernelSupport.cu.obj" "d:\Stuff\Programming\Visual Studio 2008\Projects\CUDASandbox\CUDASandbox\KernelSupport.cu"
1>KernelSupport.cu
1>tmpxft_000016f4_00000000-3_KernelSupport.cudafe1.gpu
1>tmpxft_000016f4_00000000-8_KernelSupport.cudafe2.gpu
1>tmpxft_000016f4_00000000-3_KernelSupport.cudafe1.cpp
1>tmpxft_000016f4_00000000-12_KernelSupport.ii
1>Linking...
1>KernelSupport.cu.obj : error LNK2005: __device_stub__Z10TestDevicePi already defined in MyKernel.cu.obj
1>KernelSupport.cu.obj : error LNK2005: "void __cdecl TestDevice__entry(int *)" (?TestDevice__entry@@YAXPEAH@Z) already defined in MyKernel.cu.obj
1>D:\Stuff\Programming\Visual Studio 2008\Projects\CUDASandbox\x64\Debug\CUDASandbox.exe : fatal error LNK1169: one or more multiply defined symbols found
1>Build log was saved at "file://d:\Stuff\Programming\Visual Studio 2008\Projects\CUDASandbox\CUDASandbox\x64\Debug\BuildLog.htm"
1>CUDASandbox - 3 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
I am running Visual Studio 2008 on Windows 7 64bit.
Edit:
I think I need to elaborate on this a little bit. The end result I am looking for here is to have a normal C++ application with something like Main.cpp with the int main() entry point and have things run from there. At certain points in my .cpp code I want to be able to reference CUDA bits. So my thinking (and correct me if there is a more standard convention here) is that I will put the CUDA kernel code into their own .cu files, and then have a supporting .cu file that will take care of talking to the device and calling kernel functions and what not.
4 Answers
You are including mykernel.cu in kernelsupport.cu, so when you try to link, the compiler sees mykernel.cu twice. You'll have to create a header defining TestDevice and include that instead.
re comment:
Something like this should work
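The code block from the original answer is not preserved in this copy; a minimal sketch of the header it describes, assuming it is called MyKernel.h, would be:

// MyKernel.h (assumed name) - declaration only, no kernel body
#ifndef _MY_KERNEL_H_
#define _MY_KERNEL_H_

__global__ void TestDevice(int *deviceArray);

#endif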
and then change the including file to
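For the KernelSupport.cu from the question, that would mean (same assumed header name):

#include "MyKernel.h"   // instead of #include <MyKernel.cu>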
re your edit
As long as the header you use in C++ code doesn't have any CUDA-specific stuff (__global__, __device__, etc.) you should be fine linking C++ and CUDA code.
If you look at the CUDA SDK code examples, they have extern "C" declarations that reference functions compiled from .cu files. This way, the .cu files are compiled by nvcc and only linked into the main program, while the .cpp files are compiled normally.
For example, marchingCubes_kernel.cu has the function body:
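The SDK snippet itself is not reproduced in this copy; the pattern it illustrates, shown here with the question's kernel and a hypothetical wrapper name, is roughly:

// In the .cu file, compiled by nvcc
__global__ void TestDevice(int *deviceArray)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    deviceArray[idx] = deviceArray[idx] * deviceArray[idx];
}

// Host-callable wrapper with C linkage; this is what the .cpp code links against
extern "C" void launchTestDevice(int gridSize, int blockSize, int *deviceArray)
{
    TestDevice<<<gridSize, blockSize>>>(deviceArray);
}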
While marchingCubes.cpp (where main() resides) just has a declaration:
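Again a sketch rather than the actual SDK code, using the same hypothetical wrapper name:

// In the .cpp file, compiled by the host compiler
extern "C" void launchTestDevice(int gridSize, int blockSize, int *deviceArray);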
You can put these in a .h file too.
Getting the separation is actually quite simple; please check out this answer for how to set it up. Then you simply put your host code in .cpp files and your device code in .cu files, and the build rules tell Visual Studio how to link them together into the final executable.
The immediate problem in your code is that you are defining the __global__ TestDevice function twice, once when you #include MyKernel.cu and once when you compile MyKernel.cu independently.
You will need to put a wrapper into a .cu file too - at the moment you are calling TestDevice<<<>>> from your main function, but when you move this into a .cpp file it will be compiled with cl.exe, which doesn't understand the <<<>>> syntax. Therefore you would simply call TestDeviceWrapper(griddim, blockdim, params) in the .cpp file and provide this function in your .cu file.
If you want an example, the SobolQRNG sample in the SDK achieves nice separation, although it still uses cutil and I would always recommend avoiding cutil.
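A minimal sketch of such a wrapper, using the names mentioned above (the exact signature is an assumption):

// MyKernel.cu - compiled by nvcc
__global__ void TestDevice(int *deviceArray)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    deviceArray[idx] = deviceArray[idx] * deviceArray[idx];
}

// Plain C++ function; the .cpp code only needs a declaration of this
// (dim3 is visible to the host compiler via cuda_runtime.h)
void TestDeviceWrapper(dim3 griddim, dim3 blockdim, int *deviceArray)
{
    TestDevice<<<griddim, blockdim>>>(deviceArray);
}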
The simple solution is to turn off building of your MyKernel.cu file.
Properties -> General -> Excluded from build
The better solution imo is to split your kernel into a cu and a cuh file, and include that, for example:
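The example the answer refers to is not preserved in this copy; a minimal sketch of that split, reusing the question's kernel and assumed file names, might look like:

// MyKernel.cuh - declaration that other .cu files can include
#ifndef _MY_KERNEL_CUH_
#define _MY_KERNEL_CUH_

__global__ void TestDevice(int *deviceArray);

#endif

// MyKernel.cu - the definition, compiled once by nvcc
#include "MyKernel.cuh"

__global__ void TestDevice(int *deviceArray)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    deviceArray[idx] = deviceArray[idx] * deviceArray[idx];
}

// KernelSupport.cu would then #include "MyKernel.cuh" instead of the .cu file.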