Which is faster: raw pointers or Thrust vectors?
I am a beginner in CUDA, and I have a simple question that I could not find a clear answer to.
I know that we can define an array in device memory using a raw pointer:
int *raw_ptr;
cudaMalloc((void **) &raw_ptr, N * sizeof(int));
We can also use Thrust to define a vector and push_back our items:
thrust::device_vector<int> D;
I need a large amount of memory (around 500M int values) and will run many kernels on it in parallel. In terms of memory access from kernels, is using raw pointers faster than thrust::device_vector, and if so, when?
Comments (1)
The data in thrust::device_vector is ordinary global memory, so there is no difference in access speed. Note, however, that the two alternatives you present are not equivalent. cudaMalloc returns uninitialized memory, whereas the memory in a thrust::device_vector is initialized: after allocation, it launches a kernel to initialize its elements, followed by cudaDeviceSynchronize. This could slow down the code, so you need to benchmark it.
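To make the point about identical access speed concrete, here is a minimal sketch of one way to compare the two. It assumes a hypothetical kernel `increment`, a hypothetical helper `time_increment`, and an illustrative array size well below the 500M in the question. The kernel receives a plain `int *` in both cases: for the Thrust vector, the pointer is obtained with `thrust::raw_pointer_cast`, so the kernel cannot tell the two buffers apart.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>

// Hypothetical kernel: adds 1 to each element through a plain pointer.
__global__ void increment(int *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: times one launch of `increment` with CUDA events
// and returns the elapsed time in milliseconds.
float time_increment(int *data, size_t n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);

    cudaEventRecord(start);
    increment<<<blocks, threads>>>(data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t N = 1 << 24;  // assumed size for illustration only

    // Raw allocation: the memory is uninitialized, so zero it explicitly.
    int *raw_ptr = nullptr;
    cudaMalloc((void **)&raw_ptr, N * sizeof(int));
    cudaMemset(raw_ptr, 0, N * sizeof(int));

    // Thrust allocation: the constructor value-initializes all N elements.
    thrust::device_vector<int> D(N);
    int *d_ptr = thrust::raw_pointer_cast(D.data());

    // Warm-up launches so the timed runs exclude one-time startup costs.
    time_increment(raw_ptr, N);
    time_increment(d_ptr, N);

    printf("raw pointer:   %.3f ms\n", time_increment(raw_ptr, N));
    printf("device_vector: %.3f ms\n", time_increment(d_ptr, N));

    cudaFree(raw_ptr);  // D releases its own memory when it goes out of scope
    return 0;
}
```

This only times kernel access. To measure the initialization cost mentioned above, the allocation itself (`cudaMalloc` versus constructing the `device_vector`) would need to be timed separately, for example with a host-side timer around each one.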