CUDA 4.0: using pointers within a kernel - error

Posted on 2024-11-28 11:49:43

My question is as follows:

I wish to use a kernel in two ways.

Either I use an array d_array that has been copied to the device using cudaMemcpy, i.e. through

cutilSafeCall(cudaMemcpy(d_array, array,  100*sizeof(double),
                         cudaMemcpyHostToDevice));

or I input a double mydouble directly, i.e. double mydouble = 3;

If I input the array I simply use (which works fine):

kernel<<<1, 100>>>(d_array, 100, output);

If I input a double I use (which does not work):

kernel<<<1, 100>>>(&mydouble, 1, output);

My kernel is listed below:

__global__ void kernel(double * d_array, int size_d_array, double * output)
{
  // Array input: each thread copies its own element.
  if (size_d_array == 100)
    {output[threadIdx.x] = d_array[threadIdx.x];}

  // Scalar input: every thread reads the single value.
  else
    {output[threadIdx.x] = d_array[0];}
}


夜清冷一曲。 2024-12-05 11:49:43

double aDouble = 3;
double *myDouble = &aDouble;

If you do the above in host code, then myDouble is a pointer to host memory. That is why you can't pass it directly to a device kernel (a pointer is a pointer, whether it points to an array or a scalar value!).
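One way around this, which the answer does not spell out, is to treat the scalar as a one-element array and copy it to device memory just like d_array in the question. The following is only a minimal sketch under that assumption; the names d_scalar, d_output and h_output are hypothetical, and error checking is reduced to the final copy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same shape as the kernel in the question.
__global__ void kernel(const double *d_array, int size_d_array, double *output)
{
    if (size_d_array == 100)
        output[threadIdx.x] = d_array[threadIdx.x];
    else
        output[threadIdx.x] = d_array[0];
}

int main()
{
    double mydouble = 3.0;      // lives in host memory
    double *d_scalar = NULL;    // hypothetical name: device copy of mydouble
    double *d_output = NULL;    // hypothetical name: device output buffer

    cudaMalloc((void **)&d_scalar, sizeof(double));
    cudaMalloc((void **)&d_output, 100 * sizeof(double));

    // Copy the single value to the device; &mydouble is only used on the host side.
    cudaMemcpy(d_scalar, &mydouble, sizeof(double), cudaMemcpyHostToDevice);

    // Pass the device pointer, not &mydouble.
    kernel<<<1, 100>>>(d_scalar, 1, d_output);

    double h_output[100];
    cudaError_t err = cudaMemcpy(h_output, d_output, sizeof(h_output),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("output[0] = %f, output[99] = %f\n", h_output[0], h_output[99]);

    cudaFree(d_scalar);
    cudaFree(d_output);
    return 0;
}
```

If the kernel only ever needs the value and not a pointer to it, passing the double by value as a kernel argument would also avoid the problem, at the cost of changing the kernel signature.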

However, in CUDA 4.0 you can call cudaHostRegister on the host pointer, and if your system supports unified virtual addressing you can then pass it to the kernel. If it does not, you can call cudaHostRegister with the appropriate flags and then cudaHostGetDevicePointer to get a pointer you can pass to the device kernel. See the CUDA documentation on
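The registration route might look roughly like the sketch below. It rests on several assumptions that are not in the answer: a POSIX host (for posix_memalign), a 4096-byte page size, a device that can map host memory, and the cudaHostRegisterMapped flag as the "appropriate flag". The buffer is page-aligned and one page long because early CUDA releases were documented as requiring that for cudaHostRegister. The names broadcast, h_value, d_value and d_output are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread reads the single mapped value.
__global__ void broadcast(const double *in, double *out)
{
    out[threadIdx.x] = in[0];
}

int main()
{
    // Enable mapped host memory; this has to happen before the context is created.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const size_t page = 4096;   // assumption: host page size
    void *raw = NULL;
    if (posix_memalign(&raw, page, page) != 0)
        return 1;
    double *h_value = (double *)raw;
    *h_value = 3.0;

    // Pin the host page and map it into the device address space.
    cudaError_t err = cudaHostRegister(h_value, page, cudaHostRegisterMapped);
    if (err != cudaSuccess) {
        printf("cudaHostRegister failed: %s\n", cudaGetErrorString(err));
        free(raw);
        return 1;
    }

    // Ask for the device-side pointer instead of assuming it equals h_value.
    double *d_value = NULL;
    cudaHostGetDevicePointer((void **)&d_value, h_value, 0);

    double *d_output = NULL;
    cudaMalloc((void **)&d_output, 100 * sizeof(double));
    broadcast<<<1, 100>>>(d_value, d_output);

    double h_output[100];
    err = cudaMemcpy(h_output, d_output, sizeof(h_output), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("output[0] = %f\n", h_output[0]);

    cudaHostUnregister(h_value);
    cudaFree(d_output);
    free(raw);
    return 0;
}
```

Calling cudaHostGetDevicePointer works whether or not unified virtual addressing is available, which is why the sketch does not pass h_value to the kernel directly. Every kernel read of mapped memory goes over the bus, so for a single value that is reused often, a plain device copy is likely the simpler choice.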
