Is there any thrust::device_vector equivalent library that can be used inside CUDA kernels?
The automatic memory management of thrust::device_vector is really useful; the only drawback is that it cannot be used from within kernel code.

I've looked on the Internet and only found vector libraries such as Thrust, which manage device memory from host code. Does any vector library for kernels exist? If not, is it a bad idea to have such a library?
2 Answers
It is possible to write such a library, but it would be very inefficient.

Indeed, thrust::device_vector differs from thrust::host_vector or std::vector only in that it allocates memory on the device instead of the host. The resizing algorithm is the same, and it runs on the host.

The resize logic is quite simple, but it involves allocating/freeing memory and copying the data. In a multi-threaded setting, you would have to lock the whole vector each time a thread resizes it, which can take a long time because of the copy.

In the case of a kernel that appends elements to a vector, the synchronization mechanism would effectively serialize the work, since only one thread at a time is allowed to resize. Your code would therefore run at the speed of a single device processor, minus the (quite large) synchronization overhead. This would probably be considerably slower than a CPU implementation.
Thrust cannot be used within a kernel; however, a thrust::device_vector can be used up to the interface with the kernel. At that point, a pointer to the underlying data can be passed to the kernel. Depending on your situation, this may still mean the Thrust library is useful even when implementing your own kernels.
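A minimal sketch of this pattern, using the real Thrust helper thrust::raw_pointer_cast to obtain the device pointer; the `scale` kernel and the launch configuration are illustrative, not part of the original answer:

```cuda
#include <cstdio>
#include <thrust/device_vector.h>

// Illustrative hand-written kernel: doubles each element in place.
__global__ void scale(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

int main()
{
    const int n = 256;

    // Thrust handles allocation (and later deallocation) on the device.
    thrust::device_vector<int> d_vec(n, 1);

    // At the kernel interface, extract the raw device pointer
    // and pass it to the custom kernel.
    int *raw = thrust::raw_pointer_cast(d_vec.data());
    scale<<<(n + 127) / 128, 128>>>(raw, n);
    cudaDeviceSynchronize();

    // The device_vector is still usable from host code afterwards;
    // element access triggers a device-to-host copy.
    int first = d_vec[0];
    std::printf("d_vec[0] = %d\n", first);
    return 0;
}
```

Compile with nvcc (e.g. `nvcc example.cu`); Thrust ships with the CUDA Toolkit, so no extra dependency is needed.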