Working with interleaved data in Thrust
What is the best way to work with interleaved data in Thrust? Say I want to sum the values in groups, with an interleave length equal to 3. For example:
[1, 2, 3, 4, 5, 6]
would give
[6, 15]
Or I want to deinterleave the data, so that
[1, 2, 3, 4, 5, 6, 7, 8, 9]
would give
[1, 4, 7, 2, 5, 8, 3, 6, 9]
Thanks.
Comments (1)
There are two questions here. The first asks how to perform a structured reduction on a data set, and the second asks how to reorder a data set given a mapping.
The first problem can be solved by logically partitioning the data set into a collection of regularly-sized subsets, and then performing a reduction on each subset. In Thrust, this can be done by combining reduce_by_key with a transformed counting_iterator. The idea is to "key" each datum with the index of its subset. reduce_by_key sums every contiguous datum with equal key.
The second problem can be solved by permuting the order of the data set. You can do this with a call to gather. Here, a transformed counting_iterator can communicate the mapping of indices from the original array into the permuted array.
You can also fuse such a gather operation with other algorithms (such as transform) using a permutation_iterator. Check the example program for ideas on how to do so.
That said, permuting an array is costly on a GPU due to memory coalescing issues, so you should do so sparingly.
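As a rough illustration of the techniques described above, here is a minimal sketch of how they might look in Thrust. It is not the program from the answer: the functor and function names (GroupKey, DeinterleaveIndex, sum_groups, deinterleave, deinterleave_negated) are hypothetical, and it assumes the input length is a multiple of the interleave length.

```cuda
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/gather.h>
#include <thrust/reduce.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>

// Hypothetical functor: maps element index i to the index of its group.
struct GroupKey {
  int width;  // interleave length
  __host__ __device__ int operator()(int i) const { return i / width; }
};

// Hypothetical functor: maps output index i to the input index to read
// when splitting `width` interleaved channels of `count` elements each.
struct DeinterleaveIndex {
  int width;
  int count;
  __host__ __device__ int operator()(int i) const {
    return (i % count) * width + i / count;
  }
};

// Structured reduction: sums each run of `width` consecutive values.
// Assumes in.size() is a multiple of width.
thrust::device_vector<int> sum_groups(const thrust::device_vector<int>& in,
                                      int width) {
  // Keys 0,0,0,1,1,1,... are generated on the fly, never stored.
  auto keys = thrust::make_transform_iterator(
      thrust::make_counting_iterator(0), GroupKey{width});
  thrust::device_vector<int> sums(in.size() / width);
  thrust::reduce_by_key(keys, keys + in.size(), in.begin(),
                        thrust::make_discard_iterator(),  // keys not needed
                        sums.begin());
  return sums;
}

// Reordering: out[i] = in[map[i]], with the map computed on the fly.
thrust::device_vector<int> deinterleave(const thrust::device_vector<int>& in,
                                        int width) {
  int count = static_cast<int>(in.size()) / width;
  auto map = thrust::make_transform_iterator(
      thrust::make_counting_iterator(0), DeinterleaveIndex{width, count});
  thrust::device_vector<int> out(in.size());
  thrust::gather(map, map + in.size(), in.begin(), out.begin());
  return out;
}

// Fused variant: reads through a permutation_iterator so the reordering
// and a transform (here, negation) happen in a single pass.
thrust::device_vector<int> deinterleave_negated(
    const thrust::device_vector<int>& in, int width) {
  int count = static_cast<int>(in.size()) / width;
  auto map = thrust::make_transform_iterator(
      thrust::make_counting_iterator(0), DeinterleaveIndex{width, count});
  auto permuted = thrust::make_permutation_iterator(in.begin(), map);
  thrust::device_vector<int> out(in.size());
  thrust::transform(permuted, permuted + in.size(), out.begin(),
                    thrust::negate<int>());
  return out;
}
```

With the inputs from the question, sum_groups on [1, 2, 3, 4, 5, 6] with width 3 should yield [6, 15], and deinterleave on [1, 2, 3, 4, 5, 6, 7, 8, 9] with width 3 should yield [1, 4, 7, 2, 5, 8, 3, 6, 9]. Functors are used instead of device lambdas so that no extra compiler flags are assumed.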
Here's the full program solving your two problems:
And the output: