CUDA and C++ question

Posted 2024-10-23 23:05:11

Hi, I have a CUDA program that runs successfully. Here is its code:

    #include <stdio.h>
    #include <cuda.h>

    // Kernel: square each element of the array in place
    __global__ void square_array(float *a, int N)
    {
      int idx = blockIdx.x * blockDim.x + threadIdx.x;
      if (idx < N)
        a[idx] = a[idx] * a[idx];
    }

    int main(void)
    {
      float *a_h, *a_d;                   // host and device pointers
      const int N = 10;
      size_t size = N * sizeof(float);
      a_h = (float *)malloc(size);        // allocate host array
      cudaMalloc((void **) &a_d, size);   // allocate device array
      for (int i = 0; i < N; i++) a_h[i] = (float)i;
      cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
      int block_size = 4;
      int n_blocks = N / block_size + (N % block_size == 0 ? 0 : 1);  // round up
      square_array <<< n_blocks, block_size >>> (a_d, N);

      cudaMemcpy(a_h, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost);
      // Print results
      for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);

      free(a_h);
      cudaFree(a_d);
      return 0;
    }

Now I want to split this code into two files: one with the C++ (or C) host code, and a .cu file with the kernel. I'm doing this just for learning, and I don't want to write the same kernel code again and again. Can anyone tell me how to do this? How do I split the code into two different files? How do I compile it? How do I write a Makefile for it?



Comments (2)

执手闯天涯 2024-10-30 23:05:11

Code that uses the CUDA C extensions has to be in a *.cu file; the rest can be in a C++ file.

So here your kernel code can be moved into a separate *.cu file.

To have the main function implemented in a C++ file, you need to wrap the kernel invocation (the code with square_array<<<...>>>(...);) in a C++ function whose implementation also lives in the *.cu file.

Functions such as cudaMalloc can stay in the C++ file as long as you include the proper CUDA headers.
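For illustration, the following host-only fragment is a minimal sketch (not from the answer above) of what compiles with an ordinary C++ compiler in a .cpp file once cuda_runtime.h is included and the program is linked against the CUDA runtime; only the <<< >>> launch itself has to stay behind a wrapper in the .cu file:

#include <cstdio>
#include <cuda_runtime.h>   // declares the runtime API: cudaMalloc, cudaMemcpy, cudaFree, ...

int main() {
  float *a_d = NULL;
  // Plain runtime-API calls like this compile fine in a .cpp file.
  if (cudaMalloc((void **)&a_d, 10 * sizeof(float)) != cudaSuccess) {
    std::printf("cudaMalloc failed\n");
    return 1;
  }
  // square_array<<<blocks, threads>>>(a_d, 10);  // the <<< >>> syntax would NOT compile here;
  //                                              // call a wrapper defined in the .cu file instead
  cudaFree(a_d);
  return 0;
}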


醉态萌生 2024-10-30 23:05:11

The biggest obstacle you will most likely run into is how to call your kernel from the .cpp file: C++ will not understand your <<< >>> syntax. There are three ways of doing it.

  • Just write a small wrapping host function in your .cu file.

  • Use the CUDA library functions (cudaConfigureCall, cudaFuncGetAttributes, cudaLaunch) --- see the "Execution Control" chapter of the CUDA Reference Manual for details. You can use those functions in plain C++ code as long as you link against the CUDA libraries.

  • Load the PTX at runtime. This is harder, but allows you to manipulate PTX code at runtime. This JIT approach is explained in the CUDA Programming Guide (section 3.3.2) and in the CUDA Reference Manual (Module Management chapter).


The wrapping function could look like this, for example:

mystuff.cu:

... //your device square_array function

void host_square_array(dim3 grid, dim3 block, float *deviceA, int N) {
  square_array <<< grid, block >>> (deviceA, N);
}

mystuff.h:

#include <cuda.h>
void host_square_array(dim3 grid, dim3 block, float *deviceA, int N);

mymain.cpp:

#include "mystuff.h"

int main() { ... //your normal host code
}
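As for compiling and the Makefile the question asks about: one common approach (a sketch, not the only way; it assumes a Linux system with the CUDA toolkit under /usr/local/cuda and the file names from the example above) is either to let nvcc compile everything, or to compile the .cu file with nvcc, the .cpp file with g++, and then link against the CUDA runtime:

# simplest: let nvcc drive compilation and linking of both files
nvcc mymain.cpp mystuff.cu -o myapp

# or compile separately and link explicitly against the CUDA runtime
nvcc -c mystuff.cu -o mystuff.o
g++  -c mymain.cpp -I/usr/local/cuda/include -o mymain.o
g++  mymain.o mystuff.o -o myapp -L/usr/local/cuda/lib64 -lcudart

A Makefile for this would just turn the separate-compilation commands into one rule per object file plus a final link rule.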
