CUDA: atomic operations on unsigned chars
I'm a CUDA beginner. I have a pixel buffer of unsigned chars in global memory that can be, and is, updated by any and all threads. To avoid weirdness in the pixel values, I want to perform an atomicExch whenever a thread updates one. But the programming guide says that this function only works on 32- or 64-bit words, whereas I just want to atomically exchange one 8-bit byte. Is there a way to do this?
Thanks.
Comments (3)
I just ran into this problem recently. In theory, atomic operations / optimistic retries are supposed to be faster than locks/mutexes, so the "hack" solutions that use atomic operations on other data types seem better to me than using critical sections.
Here are some implementations based on the threads for how to implement atomicMin for char and atomicAdd for short.
I've tested all of these, and my tests seem to show that they work fine so far.
Version 1 of atomicAdd for char
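A minimal sketch of how such a byte-wide add can be built, not necessarily the code this answer originally showed: run the 32-bit atomicCAS on the aligned word that contains the byte, and use __byte_perm to pull the byte out and put the sum back. The names atomicAddByte, addKernel and runAtomicAddByteDemo are made up for illustration, the sketch uses unsigned char rather than char, and it assumes the buffer size is padded to a multiple of 4 bytes so the enclosing word always lies inside the allocation.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA built-in): atomically adds `val` to one byte
// and returns the byte's previous value.
__device__ unsigned char atomicAddByte(unsigned char* address, unsigned char val)
{
    // Position of the byte inside its enclosing 4-byte-aligned word.
    const unsigned int offset =
        static_cast<unsigned int>(reinterpret_cast<size_t>(address) & 3);
    unsigned int* base = reinterpret_cast<unsigned int*>(address - offset);

    // Each selector keeps three bytes of the first operand and takes the byte
    // at position `offset` from byte 0 of the second operand.
    const unsigned int insertSelectors[4] = {0x3214, 0x3240, 0x3410, 0x4210};
    const unsigned int insertSel = insertSelectors[offset];
    // Moves byte `offset` of the first operand to byte 0 and zeroes the other
    // bytes (they are taken from the second operand, which is 0).
    const unsigned int extractSel = 0x4440 | offset;

    unsigned int old = *base;
    unsigned int assumed;
    do {
        assumed = old;
        const unsigned int sum = __byte_perm(assumed, 0, extractSel) + val;
        // Only byte 0 of `sum` is inserted, so a carry out of the byte is dropped.
        const unsigned int replacement = __byte_perm(assumed, sum, insertSel);
        old = atomicCAS(base, assumed, replacement);
    } while (old != assumed);  // another thread changed the word: retry

    return static_cast<unsigned char>(__byte_perm(old, 0, extractSel));
}

// Demo kernel: thread i adds 1 to pixel (i % 4), so all four bytes of a single
// word are updated concurrently.
__global__ void addKernel(unsigned char* pixels)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    atomicAddByte(&pixels[tid % 4], 1);
}

// Host driver: returns true if every pixel received exactly 100 increments.
bool runAtomicAddByteDemo()
{
    unsigned char* d = nullptr;
    unsigned char h[4] = {0, 0, 0, 0};
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemset(d, 0, sizeof(h));
    addKernel<<<4, 100>>>(d);  // 400 threads, 100 adds per byte
    const bool ok =
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    return ok && h[0] == 100 && h[1] == 100 && h[2] == 100 && h[3] == 100;
}
```

Because __byte_perm inserts exactly one byte, the neighbouring pixels in the same word are written back unchanged, and the retry loop covers the case where one of them was modified in the meantime.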
atomicCAS for char
Version 2 of atomicAdd for char (uses bit shifts instead of __byte_perm and has to handle overflow as a result)
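A minimal sketch of the bit-shift variant, again an illustration rather than the answer's original code. The idea is the same CAS loop on the enclosing 32-bit word, but the byte is extracted and reinserted with shifts and masks, so the sum has to be masked to 8 bits by hand or a carry would spill into the neighbouring pixel. The names atomicAddByteShift, addShiftKernel and runAtomicAddByteShiftDemo are made up, and the buffer is assumed to be padded to a multiple of 4 bytes.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA built-in): byte-wide atomic add using
// shifts and masks. Returns the byte's previous value.
__device__ unsigned char atomicAddByteShift(unsigned char* address, unsigned char val)
{
    const unsigned int offset =
        static_cast<unsigned int>(reinterpret_cast<size_t>(address) & 3);
    unsigned int* base = reinterpret_cast<unsigned int*>(address - offset);
    const unsigned int shift = offset * 8;     // bit position of the byte
    const unsigned int mask = 0xFFu << shift;  // bits that belong to the byte

    unsigned int old = *base;
    unsigned int assumed;
    do {
        assumed = old;
        // Mask the sum to 8 bits: overflow wraps around inside the byte
        // instead of carrying into the next pixel.
        const unsigned int sum = (((assumed >> shift) & 0xFFu) + val) & 0xFFu;
        const unsigned int replacement = (assumed & ~mask) | (sum << shift);
        old = atomicCAS(base, assumed, replacement);
    } while (old != assumed);  // word changed underneath us: retry

    return static_cast<unsigned char>((old >> shift) & 0xFFu);
}

// Demo kernel: every thread adds 1 to pixel 0.
__global__ void addShiftKernel(unsigned char* pixels)
{
    atomicAddByteShift(&pixels[0], 1);
}

// Host driver: pixel 0 starts at 250 and gets 10 increments, so it should
// wrap to 4 (260 mod 256) while pixels 1..3 stay at 0.
bool runAtomicAddByteShiftDemo()
{
    unsigned char* d = nullptr;
    unsigned char h[4] = {250, 0, 0, 0};
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    addShiftKernel<<<1, 10>>>(d);
    const bool ok =
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    return ok && h[0] == 4 && h[1] == 0 && h[2] == 0 && h[3] == 0;
}
```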
For atomicMin, please check this thread.
You might implement a critical section using a mutex variable.
So something like
http://forums.nvidia.com/index.php?showtopic=185809
Implementing a critical section in CUDA: https://stackoverflow.com/questions/2021019/how-to-implement-a-ritic-section-in-cuda
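A rough sketch of what a mutex-protected update could look like, not taken from either link. It assumes a single global lock word (pixelLock, a made-up name) where 0 means free and 1 means held. Be aware that spin locks contended by threads of the same warp can hang on GPUs without independent thread scheduling; the loop below does the work inside the branch that won the lock to reduce that risk, but treat it as a sketch rather than a guarantee.

```cuda
#include <cuda_runtime.h>

// Hypothetical global lock word: 0 = free, 1 = held.
__device__ int pixelLock = 0;

// Each thread increments pixel 0 inside a critical section.
__global__ void lockedIncrementKernel(volatile unsigned char* pixels)
{
    bool done = false;
    while (!done) {
        // Try to take the lock; only the thread that sees 0 gets in.
        if (atomicCAS(&pixelLock, 0, 1) == 0) {
            __threadfence();  // see writes made by the previous lock holder
            // Critical section: an ordinary read-modify-write on the byte.
            pixels[0] = static_cast<unsigned char>(pixels[0] + 1);
            __threadfence();  // make the write visible before releasing
            atomicExch(&pixelLock, 0);  // release the lock
            done = true;
        }
    }
}

// Host driver: 64 threads each add 1, so pixel 0 should end at 64.
bool runLockedIncrementDemo()
{
    unsigned char* d = nullptr;
    unsigned char h[4] = {0, 0, 0, 0};
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemset(d, 0, sizeof(h));
    lockedIncrementKernel<<<2, 32>>>(d);
    const bool ok =
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    return ok && h[0] == 64;
}
```

A single lock serializes every pixel update, so this is likely to be much slower than a CAS loop on the enclosing word when many threads write at once.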
The other answer has a bug in its implementation of atomicCAS(). This version works for me:
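A minimal sketch of a byte-wide compare-and-swap, offered as an illustration and not as this answer's original code. It relies on two points: the swap must only be attempted while the target byte equals compare, and a failed 32-bit atomicCAS has to be retried when only a neighbouring byte changed. The names atomicCASByte, claimKernel and runAtomicCASByteDemo are made up, and the buffer is assumed to be padded to a multiple of 4 bytes.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA built-in): if *address == compare, store
// `val`; in all cases return the value the byte held before the call.
__device__ unsigned char atomicCASByte(unsigned char* address,
                                       unsigned char compare,
                                       unsigned char val)
{
    const unsigned int offset =
        static_cast<unsigned int>(reinterpret_cast<size_t>(address) & 3);
    unsigned int* base = reinterpret_cast<unsigned int*>(address - offset);
    const unsigned int shift = offset * 8;
    const unsigned int mask = 0xFFu << shift;

    unsigned int old = *base;
    unsigned int assumed;
    do {
        const unsigned char current =
            static_cast<unsigned char>((old >> shift) & 0xFFu);
        // The byte does not match: the comparison fails and nothing is written.
        if (current != compare) return current;
        assumed = old;
        const unsigned int replacement =
            (assumed & ~mask) | (static_cast<unsigned int>(val) << shift);
        old = atomicCAS(base, assumed, replacement);
        // If the word-level CAS failed, loop and re-check the byte: the failure
        // may have been caused by a neighbouring byte only.
    } while (old != assumed);

    return compare;  // the swap happened, so the previous value was `compare`
}

// Demo kernel: every thread tries to claim pixel 1 by swapping 0 for its own
// nonzero id; threads that succeed are counted.
__global__ void claimKernel(unsigned char* pixels, int* winners)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned char id = static_cast<unsigned char>(tid + 1);
    if (atomicCASByte(&pixels[1], 0, id) == 0) atomicAdd(winners, 1);
}

// Host driver: returns the number of threads whose swap succeeded,
// or -1 if a CUDA call failed.
int runAtomicCASByteDemo()
{
    unsigned char* dPixels = nullptr;
    int* dWinners = nullptr;
    int winners = -1;
    if (cudaMalloc(&dPixels, 4) != cudaSuccess) return -1;
    if (cudaMalloc(&dWinners, sizeof(int)) != cudaSuccess) {
        cudaFree(dPixels);
        return -1;
    }
    cudaMemset(dPixels, 0, 4);
    cudaMemset(dWinners, 0, sizeof(int));
    claimKernel<<<2, 32>>>(dPixels, dWinners);  // 64 threads, ids 1..64
    if (cudaMemcpy(&winners, dWinners, sizeof(int), cudaMemcpyDeviceToHost)
        != cudaSuccess) winners = -1;
    cudaFree(dPixels);
    cudaFree(dWinners);
    return winners;
}
```

In the demo only one thread should ever see the byte as 0, so exactly one swap is expected to succeed no matter how the threads are scheduled.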