Calling a hand-written CUDA kernel with Thrust

Posted 2024-08-24 18:17:57

Since I needed to sort large arrays of numbers with CUDA, I ended up using Thrust. So far, so good... but what do I do when I want to call a "handwritten" kernel and my data is in a thrust::host_vector?

My approach was (the copy back to the host is missing):

int CUDA_CountAndAdd_Kernel(thrust::host_vector<float> *samples, thrust::host_vector<int> *counts, int n) {

 thrust::device_ptr<float> dSamples = thrust::device_malloc<float>(n);
 thrust::copy(samples->begin(), samples->end(), dSamples);

 thrust::device_ptr<int> dCounts = thrust::device_malloc<int>(n);
 thrust::copy(counts->begin(), counts->end(), dCounts);

 float *dSamples_raw = thrust::raw_pointer_cast(dSamples);
 int *dCounts_raw = thrust::raw_pointer_cast(dCounts);

 CUDA_CountAndAdd_Kernel<<<1, n>>>(dSamples_raw, dCounts_raw);

 thrust::device_free(dCounts);
 thrust::device_free(dSamples);
}

The kernel looks like:

__global__ void CUDA_CountAndAdd_Kernel_Device(float *samples, int *counts)

But compilation fails with:

error: argument of type "float **" is incompatible with parameter of type "thrust::host_vector> *"

Huh?! I thought I was passing raw float and int pointers? Or am I missing something?

鹿港小镇 2024-08-31 18:17:57

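You are launching with the name of the host function that contains the call, not the name of the kernel, hence the parameter mismatch: the compiler checks your raw pointers against the host function's thrust::host_vector parameters.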

Change:

CUDA_CountAndAdd_Kernel<<<1, n>>>(dSamples_raw, dCounts_raw);

to

CUDA_CountAndAdd_Kernel_Device<<<1, n>>>(dSamples_raw, dCounts_raw);

and see what happens.
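
For reference, here is a minimal sketch of how the whole wrapper might look with the launch fixed and the missing copy back to the host added. Several things in it are assumptions and not taken from the question: the kernel body (only its signature was posted), the extra parameter n used as a bounds check, the block size of 256, and the choice to return an error code. It also swaps thrust::device_malloc / thrust::device_free for thrust::device_vector, which frees its memory automatically, and takes the host vectors by reference instead of by pointer.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <cuda_runtime.h>

// Hypothetical kernel body: the original post only shows the signature.
// Each thread adds 1 to counts[i] when samples[i] exceeds 0.5f.
// The parameter n is an assumption, added to guard out-of-range threads.
__global__ void CUDA_CountAndAdd_Kernel_Device(const float *samples, int *counts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && samples[i] > 0.5f) {
        counts[i] += 1;
    }
}

// Host wrapper. Its name differs from the kernel's name, which is the
// point of the fix. Returns 0 on success, -1 on a size mismatch, or the
// CUDA error code if the launch or execution fails.
int CUDA_CountAndAdd_Kernel(const thrust::host_vector<float> &samples,
                            thrust::host_vector<int> &counts)
{
    if (samples.size() != counts.size()) return -1;
    const int n = static_cast<int>(samples.size());
    if (n == 0) return 0;  // nothing to do

    // Constructing a device_vector from a host_vector copies host -> device.
    thrust::device_vector<float> dSamples(samples);
    thrust::device_vector<int> dCounts(counts);

    // Launch enough blocks to cover n elements (block size is an assumption).
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    CUDA_CountAndAdd_Kernel_Device<<<blocks, threads>>>(
        thrust::raw_pointer_cast(dSamples.data()),
        thrust::raw_pointer_cast(dCounts.data()),
        n);

    // Check for launch errors, then for errors during execution.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return static_cast<int>(err);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) return static_cast<int>(err);

    // The copy back that was missing from the question: device -> host.
    thrust::copy(dCounts.begin(), dCounts.end(), counts.begin());
    return 0;
}
```

Calling it is then just CUDA_CountAndAdd_Kernel(samples, counts) with two host vectors of equal length. Launching with <<<1, n>>> as in the question also works, but only while n stays within the per-block thread limit of the device, which is why the sketch spreads the work over several blocks.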

About the author: 笑梦风尘
