CUDA: global variables using device memory

Posted 2024-12-25 05:16:03

I have a question about how to organize variables in CUDA code. My program has a lot of arrays that need to be accessed from different functions. I want to avoid passing them around, so I would like to use global variables that point to 2D arrays allocated with cudaMallocPitch instead of flattened 1D arrays. I am thinking of something like this:

__device__ double * dataPtr;
__device__ size_t dataPitch;
....
int main()
{
  double * dataPtrLoc; size_t dataPitchLoc;
  cudaMallocPitch( (void**) &dataPtrLoc, &dataPitchLoc, width*sizeof(double), height);
  cudaMemcpyToSymbol(dataPtr, &dataPtrLoc, sizeof(dataPtrLoc));
  cudaMemcpyToSymbol(dataPitch, &dataPitchLoc, sizeof(dataPitchLoc));
  ...
}

Is this a good way to make 2D device data globally accessible? Any suggestions?

Edit: I wrote this program, and it compiles and runs fine:

#include <stdio.h>
__device__ int *d_gridPtr;
__device__ size_t d_gridPitch;

__device__ int valij(int ii, int jj)
{
  // d_gridPitch is in bytes, so step to row ii through a char* before casting back to int*
  int* row = (int*)((char*)d_gridPtr + ii * d_gridPitch);
  return (row[jj]);
}

__global__ void printval()
{
  int val0, val1, val2, val3;
  val0= valij(0,0);
  val1= valij(0,1);
  val2= valij(1,0);
  val3= valij(1,1);
  printf("%d %d %d %d \n", val0, val1, val2, val3);
}

int main()
{
  size_t d_gridPitchLoc;
  int * d_gridPtrLoc;
  cudaMallocPitch((void**)&d_gridPtrLoc, &d_gridPitchLoc, 2 * sizeof(int), 2);
  cudaMemcpyToSymbol(d_gridPtr, &d_gridPtrLoc, sizeof(d_gridPtrLoc));
  // Copy the whole size_t; sizeof(float) would copy only 4 of its 8 bytes on 64-bit platforms
  cudaMemcpyToSymbol(d_gridPitch, &d_gridPitchLoc, sizeof(d_gridPitchLoc));

  int h_mem[2*2]={0,1,100,4};
  size_t hostpitch = 2* sizeof(int);
  cudaMemcpy2D(d_gridPtrLoc,d_gridPitchLoc,h_mem,hostpitch,2*sizeof(int),2,cudaMemcpyHostToDevice );

  printval<<<1,1>>> ();
  cudaDeviceReset();  // destroys the context, which also flushes the device-side printf output
}
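One possible variation on the edited program, sketched under our own assumptions rather than taken from the question or the answer: the pointer and pitch are bundled into a single `__device__` struct, so one `cudaMemcpyToSymbol` call sets both and the size always matches, and every runtime call is checked. The type `PitchedInt2D` and the names `d_grid`, `readGrid`, `runPitchedDemo` and `CUDA_CHECK` are hypothetical. The kernel writes into a buffer instead of calling `printf` so the host can verify the values.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line and the CUDA error string on failure.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                   cudaGetErrorString(err_));                         \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

// Hypothetical descriptor: base pointer and pitch (in bytes) travel together.
struct PitchedInt2D {
  int* ptr;
  size_t pitch;
};

// One device-global descriptor instead of separate pointer and pitch symbols.
__device__ PitchedInt2D d_grid;

__device__ int valij(int ii, int jj) {
  // Pitch is in bytes: offset through a char* before casting back to int*.
  const int* row = (const int*)((const char*)d_grid.ptr + ii * d_grid.pitch);
  return row[jj];
}

// Copies the 2x2 grid into a dense output buffer the host can inspect.
__global__ void readGrid(int* out) {
  out[0] = valij(0, 0);
  out[1] = valij(0, 1);
  out[2] = valij(1, 0);
  out[3] = valij(1, 1);
}

// Uploads a 2x2 row-major grid and reads it back through the device global.
void runPitchedDemo(int result[4]) {
  PitchedInt2D h_grid;
  CUDA_CHECK(cudaMallocPitch((void**)&h_grid.ptr, &h_grid.pitch, 2 * sizeof(int), 2));
  // A single copy sets both members of the __device__ struct.
  CUDA_CHECK(cudaMemcpyToSymbol(d_grid, &h_grid, sizeof(h_grid)));

  int h_mem[2 * 2] = {0, 1, 100, 4};
  CUDA_CHECK(cudaMemcpy2D(h_grid.ptr, h_grid.pitch, h_mem, 2 * sizeof(int),
                          2 * sizeof(int), 2, cudaMemcpyHostToDevice));

  int* d_out;
  CUDA_CHECK(cudaMalloc((void**)&d_out, 4 * sizeof(int)));
  readGrid<<<1, 1>>>(d_out);
  CUDA_CHECK(cudaGetLastError());  // reports launch errors
  // cudaMemcpy waits for the kernel to finish before copying.
  CUDA_CHECK(cudaMemcpy(result, d_out, 4 * sizeof(int), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_out));
  CUDA_CHECK(cudaFree(h_grid.ptr));
}

int main() {
  int result[4];
  runPitchedDemo(result);
  std::printf("%d %d %d %d\n", result[0], result[1], result[2], result[3]);
  return 0;
}
```

On a working CUDA device this should print the same four values as the original program (`0 1 100 4`). Adding another global array then means adding one more descriptor variable, not two symbols.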


Comments (1)

晨敛清荷 2025-01-01 05:16:03

If all threads of a warp or block read the same read-only global memory address (for example, the same array index) at the same time, consider storing that data in a __constant__ memory array instead. Kernels cannot write to __constant__ memory, so it is not an option for data your device code modifies.
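A minimal sketch of the `__constant__` approach, assuming a small read-only table. The polynomial coefficients `c_coeffs` and the names `evalPoly` and `runConstantDemo` are illustrative, not from the answer. The host fills the table with `cudaMemcpyToSymbol`; the kernel only reads it, and in each loop iteration every thread reads the same element, which is the uniform access pattern described above. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical read-only table: coefficients of p(x) = c0 + c1*x + c2*x^2 + c3*x^3.
constexpr int kNumCoeffs = 4;
__constant__ float c_coeffs[kNumCoeffs];

// In each loop iteration all threads read c_coeffs[k] with the same k,
// the uniform access pattern that constant memory serves efficiently.
__global__ void evalPoly(const float* x, float* y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float acc = 0.0f;
    for (int k = kNumCoeffs - 1; k >= 0; --k) {  // Horner's rule
      acc = acc * x[i] + c_coeffs[k];
    }
    y[i] = acc;
  }
}

// Evaluates p(x) = 1 + 2x^2 at x = 0, 1, ..., n-1 and stores the results in h_y.
void runConstantDemo(float* h_y, int n) {
  const float h_coeffs[kNumCoeffs] = {1.0f, 0.0f, 2.0f, 0.0f};
  // __constant__ memory is written from the host; kernels can only read it.
  cudaMemcpyToSymbol(c_coeffs, h_coeffs, sizeof(h_coeffs));

  std::vector<float> h_x(n);
  for (int i = 0; i < n; ++i) h_x[i] = static_cast<float>(i);

  float *d_x, *d_y;
  cudaMalloc((void**)&d_x, n * sizeof(float));
  cudaMalloc((void**)&d_y, n * sizeof(float));
  cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

  int threads = 128;
  evalPoly<<<(n + threads - 1) / threads, threads>>>(d_x, d_y, n);
  // cudaMemcpy waits for the kernel to finish before copying.
  cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

  cudaFree(d_x);
  cudaFree(d_y);
}

int main() {
  const int n = 8;
  float y[n];
  runConstantDemo(y, n);
  for (int i = 0; i < n; ++i) std::printf("p(%d) = %g\n", i, y[i]);
  return 0;
}
```

Constant memory is limited to 64 KB, so this suits small lookup tables and parameters rather than large grids like the one in the question.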

If your array is read-only and your access pattern has strong 2D locality (within a warp and/or block), consider using textures instead.
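A minimal sketch of reading a `cudaMallocPitch` allocation through a texture object, assuming `float` data. The names `copyViaTexture` and `runTextureDemo` are hypothetical. The pitched buffer is described with `cudaResourceTypePitch2D`, and the kernel samples it with `tex2D<float>`; it simply copies each texel so the result is easy to check. A real kernel would gain from the texture cache only if neighboring threads read nearby elements. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Copies each texel of a width x height texture into a dense row-major array.
__global__ void copyViaTexture(cudaTextureObject_t tex, float* out, int width, int height) {
  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;
  if (x < width && y < height) {
    // Unnormalized coordinates: (x + 0.5, y + 0.5) is the center of texel (x, y).
    out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
  }
}

// Uploads 0, 1, 2, ... into pitched memory and reads it back through a texture object.
void runTextureDemo(float* h_out, int width, int height) {
  std::vector<float> h_in(width * height);
  for (int i = 0; i < width * height; ++i) h_in[i] = static_cast<float>(i);

  float* d_in;
  size_t pitch;
  cudaMallocPitch((void**)&d_in, &pitch, width * sizeof(float), height);
  cudaMemcpy2D(d_in, pitch, h_in.data(), width * sizeof(float),
               width * sizeof(float), height, cudaMemcpyHostToDevice);

  // Describe the pitched allocation as a 2D texture resource.
  cudaResourceDesc resDesc = {};
  resDesc.resType = cudaResourceTypePitch2D;
  resDesc.res.pitch2D.devPtr = d_in;
  resDesc.res.pitch2D.desc = cudaCreateChannelDesc<float>();
  resDesc.res.pitch2D.width = width;
  resDesc.res.pitch2D.height = height;
  resDesc.res.pitch2D.pitchInBytes = pitch;

  // Point sampling, unnormalized coordinates, clamp at the edges.
  cudaTextureDesc texDesc = {};
  texDesc.addressMode[0] = cudaAddressModeClamp;
  texDesc.addressMode[1] = cudaAddressModeClamp;
  texDesc.filterMode = cudaFilterModePoint;
  texDesc.readMode = cudaReadModeElementType;
  texDesc.normalizedCoords = 0;

  cudaTextureObject_t tex = 0;
  cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

  float* d_out;
  cudaMalloc((void**)&d_out, width * height * sizeof(float));
  dim3 block(16, 16);
  dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
  copyViaTexture<<<grid, block>>>(tex, d_out, width, height);
  // cudaMemcpy waits for the kernel to finish before copying.
  cudaMemcpy(h_out, d_out, width * height * sizeof(float), cudaMemcpyDeviceToHost);

  cudaDestroyTextureObject(tex);
  cudaFree(d_out);
  cudaFree(d_in);
}

int main() {
  const int width = 5, height = 3;
  float out[width * height];
  runTextureDemo(out, width, height);
  std::printf("out[7] = %g\n", out[7]);
  return 0;
}
```

The texture object is an ordinary kernel argument here; it could also be stored in a `__device__` variable with `cudaMemcpyToSymbol`, in the same way the question stores its pointer and pitch.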
