Processing interleaved data in Thrust

Posted 2024-12-21 00:21:08

What is the best way to work with interleaved data in Thrust? Say I want to sum the values in groups of 3 (an interleave length of 3). For example:

[1, 2, 3, 4, 5, 6]

would give

[6, 15]

Or, to deinterleave the data:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

would give

[1, 4, 7, 2, 5, 8, 3, 6, 9]

Thanks.

二手情话 2024-12-28 00:21:08

There are two questions here. The first asks how to perform a structured reduction on a data set, and the second asks how to reorder a data set given a mapping.

The first problem can be solved by logically partitioning the data set into a collection of regularly-sized subsets, and then performing a reduction on each subset. In Thrust, this can be done by combining reduce_by_key with a transformed counting_iterator. The idea is to "key" each datum with the index of its subset (here, its position divided by 3). reduce_by_key then sums each run of consecutive elements that share the same key.

The second problem can be solved by permuting the order of the data set. You can do this with a call to gather. Here, a transformed counting_iterator supplies the index map: for each position of the permuted (output) array, it gives the index to read from the original array. You can also fuse such a gather with other algorithms (such as transform) by wrapping the original array and the same index map in a permutation_iterator; the example program below uses a plain gather.

That said, permuting an array is costly on a GPU: neighboring threads read non-adjacent addresses, so the loads cannot be coalesced into a few wide memory transactions. You should permute sparingly.

Here's the full program solving your two problems:

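#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/gather.h>
#include <thrust/functional.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <iostream>
#include <iterator>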

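// Key functor for example one: element i belongs to group i / 3.
// (thrust::unary_function is deprecated in recent Thrust; transform_iterator
// deduces the result type from operator() without it.)
struct divide_by_three
{
  __host__ __device__
  unsigned int operator()(unsigned int i) const
  {
    return i / 3;
  }
};

// Index map for example two: output position i reads input position
// (i / 3) + 3 * (i % 3). This is only valid for 3 records of 3 fields
// (9 elements), where the record length and the record count are both 3.
struct deinterleave_index
{
  __host__ __device__
  unsigned int operator()(unsigned int i) const
  {
    return (i/3) + 3 * (i%3);
  }
};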

int main()
{
  using namespace thrust;

  device_vector<int> example_one(6);
  example_one[0] = 1; example_one[1] = 2; example_one[2] = 3;
  example_one[3] = 4; example_one[4] = 5; example_one[5] = 6;

  // the result will have size two
  device_vector<int> example_one_result(2);

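  // Key each element by its index divided by three (elements 0-2 get key 0,
  // elements 3-5 get key 1); reduce_by_key then sums each run of equal keys.
  // The output keys are not needed, so they are written to a discard_iterator.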
  reduce_by_key(make_transform_iterator(make_counting_iterator(0u), divide_by_three()),
                make_transform_iterator(make_counting_iterator(6u), divide_by_three()),
                example_one.begin(),
                thrust::make_discard_iterator(),
                example_one_result.begin());

  std::cout << "example one input:  [ ";
  thrust::copy(example_one.begin(), example_one.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << "]" << std::endl;

  std::cout << "example one result: [ ";
  thrust::copy(example_one_result.begin(), example_one_result.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << "]" << std::endl;


  device_vector<int> example_two(9);
  example_two[0] = 1; example_two[1] = 2; example_two[2] = 3;
  example_two[3] = 4; example_two[4] = 5; example_two[5] = 6;
  example_two[6] = 7; example_two[7] = 8; example_two[8] = 9;

  // the result will be the same size
  device_vector<int> example_two_result(9);

  // gather using the mapping defined by deinterleave_index
  gather(make_transform_iterator(make_counting_iterator(0u), deinterleave_index()),
         make_transform_iterator(make_counting_iterator(9u), deinterleave_index()),
         example_two.begin(),
         example_two_result.begin());

  std::cout << "example two input:  [ ";
  thrust::copy(example_two.begin(), example_two.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << "]" << std::endl;

  std::cout << "example two result: [ ";
  thrust::copy(example_two_result.begin(), example_two_result.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << "]" << std::endl;

  return 0;
}

And the output:

$ nvcc test.cu -run
example one input:  [ 1 2 3 4 5 6 ]
example one result: [ 6 15 ]
example two input:  [ 1 2 3 4 5 6 7 8 9 ]
example two result: [ 1 4 7 2 5 8 3 6 9 ]
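
Note that deinterleave_index uses the literal 3 in two roles, the record length and the number of records, which coincide only for the 9-element example (with 6 elements it would return 6 for i = 2, past the end of the array). A serial host-side reference in plain C++, with hypothetical names, is one way to check a general map before handing it to gather:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for output position i, return the input position to
// read when deinterleaving `size` elements made of records of `stride` fields.
std::size_t deinterleave_source(std::size_t i, std::size_t size, std::size_t stride)
{
  const std::size_t num_records = size / stride;
  const std::size_t field  = i / num_records;  // which output block
  const std::size_t record = i % num_records;  // offset inside that block
  return record * stride + field;
}

// Hypothetical serial reference for the gather: out[i] = in[map(i)].
std::vector<int> deinterleave_reference(const std::vector<int>& in, std::size_t stride)
{
  assert(stride > 0 && in.size() % stride == 0);
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = in[deinterleave_source(i, in.size(), stride)];
  return out;
}
```

For the 9-element case, deinterleave_source(i, 9, 3) equals (i / 3) + 3 * (i % 3) for every i, so it agrees with the program's functor there while also handling other sizes.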
