Calling a hand-written CUDA kernel with Thrust

Posted on 2024-08-24 18:17:57

Since I needed to sort large arrays of numbers with CUDA, I started using Thrust. So far, so good... but what do I do when I want to call a "handwritten" kernel and my data is in a thrust::host_vector?

My approach was (the copy back to the host is missing):

int CUDA_CountAndAdd_Kernel(thrust::host_vector<float> *samples, thrust::host_vector<int> *counts, int n) {

 thrust::device_ptr<float> dSamples = thrust::device_malloc<float>(n);
 thrust::copy(samples->begin(), samples->end(), dSamples);

 thrust::device_ptr<int> dCounts = thrust::device_malloc<int>(n);
 thrust::copy(counts->begin(), counts->end(), dCounts);

 float *dSamples_raw = thrust::raw_pointer_cast(dSamples);
 int *dCounts_raw = thrust::raw_pointer_cast(dCounts);

 CUDA_CountAndAdd_Kernel<<<1, n>>>(dSamples_raw, dCounts_raw);

 thrust::device_free(dCounts);
 thrust::device_free(dSamples);
}

The kernel looks like:

__global__ void CUDA_CountAndAdd_Kernel_Device(float *samples, int *counts) 

But compilation fails with:

error: argument of type "float **" is incompatible with parameter of type "thrust::host_vector<float, std::allocator<float>> *"

Huh?! I thought I was passing raw float and int pointers? Or am I missing something?

Comments (1)

鹿港小镇 2024-08-31 18:17:57

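You are launching the kernel with the name of the function the call is in, not the name of the kernel. The launch therefore resolves to the host wrapper CUDA_CountAndAdd_Kernel, which expects thrust::host_vector pointers, hence the parameter mismatch.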

Change:

CUDA_CountAndAdd_Kernel<<<1, n>>>(dSamples_raw, dCounts_raw);

to

CUDA_CountAndAdd_Kernel_Device<<<1, n>>>(dSamples_raw, dCounts_raw);

and see what happens.
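
For reference, here is a minimal sketch of how the whole wrapper might look once the launch uses the kernel's name and the missing copy back to the host is added. Several things in it are assumptions and not part of the original post: the kernel body (only the signature was shown, so it simply increments counts[i] when samples[i] is positive), the extra n parameter used for a bounds check, the block size of 256, and the use of thrust::device_vector in place of device_malloc/device_free.

```cuda
#include <cstdio>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

// Hypothetical kernel body: the original post shows only the signature.
// Each thread increments counts[i] when samples[i] is positive.
// The parameter n is an added assumption so threads can bounds-check.
__global__ void CUDA_CountAndAdd_Kernel_Device(const float *samples, int *counts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && samples[i] > 0.0f) {
        counts[i] += 1;
    }
}

// Host wrapper. Returns 0 on success and 1 on a size mismatch or CUDA error.
int CUDA_CountAndAdd_Kernel(const thrust::host_vector<float> &samples,
                            thrust::host_vector<int> &counts)
{
    if (samples.size() != counts.size()) {
        return 1;
    }
    const int n = static_cast<int>(samples.size());
    if (n == 0) {
        return 0;  // nothing to do, and a zero-block launch would be invalid
    }

    // Constructing a device_vector from a host_vector copies host to device.
    thrust::device_vector<float> dSamples(samples);
    thrust::device_vector<int> dCounts(counts);

    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;  // round up to cover n

    // Launch the kernel by its own name, not the wrapper's name.
    CUDA_CountAndAdd_Kernel_Device<<<blocks, threads>>>(
        thrust::raw_pointer_cast(dSamples.data()),
        thrust::raw_pointer_cast(dCounts.data()),
        n);

    // Check for launch errors first, then for errors during execution.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The copy back that was missing in the question: device to host.
    thrust::copy(dCounts.begin(), dCounts.end(), counts.begin());
    return 0;
}
```

With this hypothetical kernel body, calling the wrapper on samples {1.5, -2.0, 0.0, 3.0} and counts initialised to 10 should leave counts as {11, 10, 10, 11}. Using thrust::device_vector means the device memory is released automatically when the wrapper returns, including on the early error return, and computing the grid size from n avoids the per-block thread limit that a <<<1, n>>> launch runs into for large arrays.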
