CUDA: changing a single value in an array
I have a vector called d_index, computed in CUDA device memory, and I want to change just one value, like this:
d_index[columnsA - rowsA] = columnsA;
How can I do this without copying the vector to system memory and then back to device memory?
Comments (3)
You could either launch a kernel on a <<<1,1>>> grid that changes only the desired element, or use a host-side call such as cudaMemcpy() that writes just that one element.
If you only do this once, it makes little difference which version you use. If you call this code often, consider folding the array modification into another kernel to avoid the extra invocation overhead.
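The two options can be sketched roughly as follows. This is only an illustration: the names set_element, set_with_kernel and set_with_memcpy are hypothetical and not from the original thread, and the array is assumed to hold int values.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: writes one value into one slot of a device array.
__global__ void set_element(int *arr, int idx, int val)
{
    arr[idx] = val;
}

// Option 1: launch a single block with a single thread to do the write.
cudaError_t set_with_kernel(int *d_arr, int idx, int val)
{
    set_element<<<1, 1>>>(d_arr, idx, val);
    cudaError_t err = cudaGetLastError();   // reports launch-time errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // wait so execution errors surface here
}

// Option 2: copy one int from host memory to the address of element idx.
// Only sizeof(int) bytes cross the bus; the rest of the array stays on the device.
cudaError_t set_with_memcpy(int *d_arr, int idx, int val)
{
    return cudaMemcpy(d_arr + idx, &val, sizeof(int), cudaMemcpyHostToDevice);
}
```

With the names from the question, the second option would be called as set_with_memcpy(d_index, columnsA - rowsA, columnsA).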
Host (CPU) code cannot directly access device memory, so you have two choices:
1. Launch a single-thread kernel (e.g. update_array<<<1,1>>>(index, value))
2. Use cudaMemcpy() to copy the value to that location
Of course, updating a single value in an array this way is very inefficient. Hopefully you've considered whether it is necessary, or whether it could be avoided. For example, could you update the array as part of the GPU code?
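One way to update the array as part of the GPU code is to let the kernel that already fills d_index apply the override itself. The sketch below assumes each thread writes exactly one element; compute_index, fill_index and the placeholder formula i * 2 are hypothetical stand-ins for whatever the real kernel computes.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel that produces d_index.
// The thread that owns override_idx stores override_val instead of its
// computed value, so no second launch or copy is needed and no race occurs.
__global__ void compute_index(int *d_index, int n, int override_idx, int override_val)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int computed = i * 2;   // placeholder for the real per-element computation
    d_index[i] = (i == override_idx) ? override_val : computed;
}

// Host wrapper; assumes n > 0 so the grid has at least one block.
cudaError_t fill_index(int *d_index, int n, int override_idx, int override_val)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    compute_index<<<blocks, threads>>>(d_index, n, override_idx, override_val);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}
```

Because only the owning thread touches the overridden element, the result does not depend on thread scheduling.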
I think that since the d_index array is in device memory, every thread can access it directly.
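Direct access here means access from device code: a kernel can dereference d_index, while host code cannot. If the write is added to a kernel that runs many threads, it is usually guarded so that exactly one thread performs it. A minimal sketch, where scale_and_patch, run_scale_and_patch and the data array are hypothetical:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel doing unrelated per-element work (doubling data[i]),
// plus a guarded single-element write to d_index.
__global__ void scale_and_patch(float *data, int n, int *d_index, int idx, int val)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i == 0) d_index[idx] = val;   // only the first thread of the grid writes
    if (i < n) data[i] *= 2.0f;       // every in-range thread does its own work
}

// Host wrapper; assumes n > 0 so that thread 0 exists.
cudaError_t run_scale_and_patch(float *d_data, int n, int *d_index, int idx, int val)
{
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    scale_and_patch<<<blocks, threads>>>(d_data, n, d_index, idx, val);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}
```

Other threads in the same launch should not rely on reading the new value of d_index[idx], since there is no ordering between them and the writing thread; kernels launched afterwards on the same stream will see it.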