How to reduce 2D data stored in a 1D array with Thrust
I'm new to CUDA and the Thrust library. I'm learning and trying to implement a function that calls a Thrust algorithm inside a for loop. Is there a way to replace this loop with another Thrust function? Or should I use a CUDA kernel to achieve this?
I have come up with the following code:
#include <cstdio>
#include <thrust/count.h>
#include <thrust/device_vector.h>
#include <thrust/extrema.h>

// thrust functor: true if the argument is greater than x
struct GreaterthanX
{
    const float _x;
    GreaterthanX(float x) : _x(x) {}
    __host__ __device__ bool operator()(const float &a) const
    {
        return a > _x;
    }
};
int main(void)
{
    // fill a device_vector with
    // 3 2 4 5
    // 0 -2 3 1
    // 9 8 7 6
    int row = 3;
    int col = 4;
    thrust::device_vector<int> vec(row * col);
    thrust::device_vector<int> count(row);
    vec[0] = 3;
    vec[1] = 2;
    vec[2] = 4;
    vec[3] = 5;
    vec[4] = 0;
    vec[5] = -2;
    vec[6] = 3;
    vec[7] = 1;
    vec[8] = 9;
    vec[9] = 8;
    vec[10] = 7;
    vec[11] = 6;
    // Goal: For each row, count the number of elements greater than 2.
    // And then find the row with the max count
    // count the elements greater than 2 in each row of vec
    for (int i = 0; i < row; i++)
    {
        // row i occupies the index range [i * col, (i + 1) * col)
        count[i] = thrust::count_if(vec.begin() + i * col, vec.begin() + (i + 1) * col, GreaterthanX(2));
    }
    thrust::device_vector<int>::iterator result = thrust::max_element(count.begin(), count.end());
    int max_val = *result;
    unsigned int position = result - count.begin();
    printf("result = %d at position %u\r\n", max_val, position);
    // result = 4 at position 2
    return 0;
}
My goal is to find the row that has the most elements greater than 2. I'm struggling with how to do this without a loop. Any suggestions would be much appreciated. Thanks.
Solution using Thrust
Here is an implementation using thrust::reduce_by_key in conjunction with multiple "fancy iterators". I also took the liberty of sprinkling in some const, auto and lambdas for elegance and readability. Because of the lambdas, you will need to pass the -extended-lambda flag to nvcc.
Bonus solution using MatX
As mentioned in the comments, NVIDIA has released a new high-level C++17 library called MatX, which targets problems involving (dense) multi-dimensional data (i.e. tensors). The library tries to unify multiple low-level libraries such as cuFFT, cuSOLVER and CUTLASS behind one Python/MATLAB-like interface. At the time of writing (v0.2.2) the library is still in initial development and therefore probably does not guarantee a stable API. For this reason, and because its performance is not as optimized as that of the more mature Thrust library and its documentation and samples are not yet exhaustive, MatX should not be used in production code yet. While constructing this solution I actually stumbled upon a bug, which was fixed instantly. As a result, this code only works on the main branch and not with the current release v0.2.2, and some of the features used might not appear in the documentation yet.
A solution using MatX looks like this:
As MatX employs deferred execution operators, matx::as_int(tensor > 2) is effectively fused into the kernel, achieving the same effect as a thrust::transform_iterator in Thrust.
Because MatX knows about the regularity of the problem while Thrust does not, the MatX solution could potentially be more performant than the Thrust solution. It certainly is more elegant. It is also possible to construct tensors in already allocated memory, so one can mix the libraries, e.g. by constructing a tensor in the memory of a thrust::device_vector named vec via passing thrust::raw_pointer_cast(vec.data()) to the constructor of the tensor.