Mixing custom memory management and Thrust in CUDA
In my project, I have implemented a custom memory allocator to avoid unnecessary calls to cudaMalloc once the application has "warmed up". Moreover, I use custom kernels for basic array filling, arithmetic operations between arrays, etc., and would like to simplify my code by using Thrust and getting rid of these kernels. Every array on the device is created and accessed through raw pointers (for now) and I'd like to use device_vector and Thrust's methods on these objects, but I find myself converting between raw pointers and device_ptr<> all the time, somewhat cluttering up my code.

My rather vague question: How would/do you organize the usage of custom memory management, Thrust's array methods, and calls to custom kernels in the most readable way?
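(For concreteness, the conversion clutter described above looks roughly like this; my_kernel and example are illustrative names, not code from the actual project:)

    #include <thrust/device_ptr.h>
    #include <thrust/fill.h>

    // Hypothetical custom kernel operating on a raw device pointer.
    __global__ void my_kernel(float* data, int n);

    // d_buf is assumed to come from the custom allocator as a raw pointer.
    void example(float* d_buf, int n)
    {
        // Wrap the raw pointer so Thrust algorithms accept it...
        thrust::device_ptr<float> p = thrust::device_pointer_cast(d_buf);
        thrust::fill(p, p + n, 0.0f);

        // ...then unwrap it again to launch the custom kernel.
        my_kernel<<<(n + 255) / 256, 256>>>(thrust::raw_pointer_cast(p), n);
    }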
1 Answer
Like all standard C++ containers, you can customize how thrust::device_vector allocates storage by providing it with your own "allocator". By default, thrust::device_vector's allocator is thrust::device_malloc_allocator, which allocates (deallocates) storage with cudaMalloc (cudaFree) when Thrust's backend system is CUDA.

Occasionally, it is desirable to customize the way device_vector allocates memory, such as in the OP's case, who would like to sub-allocate storage within a single large allocation performed at program initialization. This can avoid the overhead which may be incurred by many individual calls to the underlying allocation scheme, in this case cudaMalloc (a sketch of such a pool allocator appears at the end of this answer).

A simple way to provide device_vector with a custom allocator is to inherit from device_malloc_allocator. One could in principle author an entire allocator from scratch, but with an inheritance approach, only the allocate and deallocate member functions need to be provided. Once the custom allocator is defined, it can be provided to device_vector as its second template parameter.

This example code demonstrates how to provide a custom allocator which prints a message upon allocation and deallocation:
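(The code listing itself did not survive extraction; the following is a reconstruction consistent with the surrounding description. The message strings and the Before/After markers are assumptions:)

    #include <thrust/device_malloc_allocator.h>
    #include <thrust/device_vector.h>
    #include <iostream>

    template<typename T>
    struct my_allocator : thrust::device_malloc_allocator<T>
    {
        typedef thrust::device_malloc_allocator<T> super_t;
        typedef typename super_t::pointer   pointer;
        typedef typename super_t::size_type size_type;

        // Announce the allocation, then defer to the base class (cudaMalloc).
        pointer allocate(size_type n)
        {
            std::cout << "my_allocator::allocate(): Hello, world!" << std::endl;
            return super_t::allocate(n);
        }

        // Announce the deallocation, then defer to the base class (cudaFree).
        void deallocate(pointer p, size_type n)
        {
            std::cout << "my_allocator::deallocate(): Hello, world!" << std::endl;
            super_t::deallocate(p, n);
        }
    };

    int main()
    {
        // The custom allocator goes in as device_vector's second template parameter.
        thrust::device_vector<int, my_allocator<int> > vec;

        std::cout << "Before vector resize" << std::endl;
        vec.resize(10, 13);
        std::cout << "After vector resize" << std::endl;

        return 0; // vec is destroyed here, invoking deallocate() once
    }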
Here's the output:
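(The output listing was also lost in extraction; the reconstruction above prints:)

    Before vector resize
    my_allocator::allocate(): Hello, world!
    After vector resize
    my_allocator::deallocate(): Hello, world!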
In this example, note that we hear from my_allocator::allocate() once upon vec.resize(10,13). my_allocator::deallocate() is invoked once when vec goes out of scope as it destroys its elements.
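Extending the same inheritance approach to the sub-allocation scenario from the question, a bump-pointer pool could look roughly like this (a sketch under stated assumptions: deallocate is a no-op and the whole pool is freed at once, which suits a "warm up once, then reuse" workload; device_pool and pool_allocator are illustrative names, not Thrust API):

    #include <thrust/device_malloc_allocator.h>
    #include <thrust/device_vector.h>
    #include <cuda_runtime.h>
    #include <cstddef>
    #include <new>

    // One big cudaMalloc at startup; suballocate() bumps an offset into it.
    struct device_pool
    {
        char*       base;
        std::size_t offset;
        std::size_t size;

        void init(std::size_t bytes)
        {
            cudaMalloc(reinterpret_cast<void**>(&base), bytes);
            size = bytes;
            offset = 0;
        }

        void* suballocate(std::size_t bytes)
        {
            // Keep 256-byte alignment, matching cudaMalloc's guarantee.
            std::size_t aligned = (bytes + 255) & ~std::size_t(255);
            if (offset + aligned > size) throw std::bad_alloc();
            void* p = base + offset;
            offset += aligned;
            return p;
        }

        void release() { cudaFree(base); base = 0; offset = size = 0; }
    };

    device_pool g_pool; // assumed to be initialized during program warm-up

    template<typename T>
    struct pool_allocator : thrust::device_malloc_allocator<T>
    {
        typedef thrust::device_malloc_allocator<T> super_t;
        typedef typename super_t::pointer   pointer;
        typedef typename super_t::size_type size_type;

        // Carve storage out of the pool instead of calling cudaMalloc.
        pointer allocate(size_type n)
        {
            return pointer(static_cast<T*>(g_pool.suballocate(n * sizeof(T))));
        }

        // Individual frees are no-ops; the pool is released wholesale.
        void deallocate(pointer, size_type) {}
    };

    int main()
    {
        g_pool.init(1 << 20); // 1 MiB pool, sized for the application
        {
            // No cudaMalloc here beyond the pool's single upfront call.
            thrust::device_vector<float, pool_allocator<float> > v(1000, 1.0f);
        }
        g_pool.release();
        return 0;
    }

One caveat worth stating: a bump pointer never reuses freed space, so this only pays off when allocations are long-lived or the pool is reset between phases; a production version would need a real free list.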