CUDA and C++ question

Posted 2024-10-23 23:05:11

Hi, I have a CUDA program that runs successfully. Here is its code:

    #include <stdio.h>
    #include <cuda.h>

    // Kernel: square each element of the array in place
    __global__ void square_array(float *a, int N)
    {
      int idx = blockIdx.x * blockDim.x + threadIdx.x;
      if (idx < N)
        a[idx] = a[idx] * a[idx];
    }

    int main(void)
    {
      float *a_h, *a_d;                   // host and device pointers
      const int N = 10;
      size_t size = N * sizeof(float);
      a_h = (float *)malloc(size);        // allocate host array
      cudaMalloc((void **) &a_d, size);   // allocate device array
      for (int i = 0; i < N; i++) a_h[i] = (float)i;
      cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
      int block_size = 4;
      int n_blocks = N / block_size + (N % block_size == 0 ? 0 : 1);  // round up
      square_array <<< n_blocks, block_size >>> (a_d, N);

      cudaMemcpy(a_h, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost);
      // Print results
      for (int i = 0; i < N; i++) printf("%d %f\n", i, a_h[i]);

      free(a_h);
      cudaFree(a_d);
      return 0;
    }

Now I want to split this code into two files: one with the C++ (or C) host code, and a .cu file with the kernel. I'm doing this just for learning, and I don't want to write the same kernel code again and again. Can anyone tell me how to do this? How do I split the code into two different files? How do I compile it? How do I write a Makefile for it?



Comments (2)

执手闯天涯 2024-10-30 23:05:11

Code that uses the CUDA C extensions has to be in a *.cu file; the rest can be in a C++ file.

So here your kernel code can be moved into a separate *.cu file.

To have the main function implemented in a C++ file, you need to wrap the kernel invocation (the code with square_array<<<...>>>(...);) in a C++ function whose implementation also lives in the *.cu file.

Functions such as cudaMalloc can stay in the C++ file as long as you include the proper CUDA headers.
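For illustration, the following host-only fragment is a minimal sketch (not from the answer above) of what compiles with an ordinary C++ compiler in a .cpp file once cuda_runtime.h is included and the program is linked against the CUDA runtime; only the <<< >>> launch itself has to stay behind a wrapper in the .cu file:

#include <cstdio>
#include <cuda_runtime.h>   // declares the runtime API: cudaMalloc, cudaMemcpy, cudaFree, ...

int main() {
  float *a_d = NULL;
  // Plain runtime-API calls like this compile fine in a .cpp file.
  if (cudaMalloc((void **)&a_d, 10 * sizeof(float)) != cudaSuccess) {
    std::printf("cudaMalloc failed\n");
    return 1;
  }
  // square_array<<<blocks, threads>>>(a_d, 10);  // the <<< >>> syntax would NOT compile here;
  //                                              // call a wrapper defined in the .cu file instead
  cudaFree(a_d);
  return 0;
}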


醉态萌生 2024-10-30 23:05:11

The biggest obstacle you will most likely run into is how to call your kernel from the .cpp file: C++ will not understand your <<< >>> syntax. There are three ways of doing it.

  • Just write a small wrapping host function in your .cu file.

  • Use the CUDA library functions (cudaConfigureCall, cudaFuncGetAttributes, cudaLaunch) --- see the "Execution Control" chapter of the CUDA Reference Manual for details. You can use those functions in plain C++ code as long as you link against the CUDA libraries.

  • Load the PTX at runtime. This is harder, but allows you to manipulate PTX code at runtime. This JIT approach is explained in the CUDA Programming Guide (section 3.3.2) and in the CUDA Reference Manual (Module Management chapter).


The wrapping function could look like this, for example:

mystuff.cu:

... //your device square_array function

void host_square_array(dim3 grid, dim3 block, float *deviceA, int N) {
  square_array <<< grid, block >>> (deviceA, N);
}

mystuff.h:

#include <cuda.h>
void host_square_array(dim3 grid, dim3 block, float *deviceA, int N);

mymain.cpp:

#include "mystuff.h"

int main() { ... //your normal host code
}
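As for compiling and the Makefile the question asks about: one common approach (a sketch, not the only way; it assumes a Linux system with the CUDA toolkit under /usr/local/cuda and the file names from the example above) is either to let nvcc compile everything, or to compile the .cu file with nvcc, the .cpp file with g++, and then link against the CUDA runtime:

# simplest: let nvcc drive compilation and linking of both files
nvcc mymain.cpp mystuff.cu -o myapp

# or compile separately and link explicitly against the CUDA runtime
nvcc -c mystuff.cu -o mystuff.o
g++  -c mymain.cpp -I/usr/local/cuda/include -o mymain.o
g++  mymain.o mystuff.o -o myapp -L/usr/local/cuda/lib64 -lcudart

A Makefile for this would just turn the separate-compilation commands into one rule per object file plus a final link rule.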
