CUDA: questions about active warps (active blocks) and how to choose the block size
Suppose a CUDA GPU can have 48 simultaneously active warps on one multiprocessor, that is, 48 blocks of 1 warp, or 24 blocks of 2 warps, and so on. Since all the active warps from multiple blocks are scheduled for execution, it seems that the block size is not important for the occupancy of the GPU (it should of course be a multiple of 32), so whether it is 32, 64, or 128 makes no difference, right? So is the block size determined only by the computation task and the resource limits (shared memory or registers)?
Comments (2)
You are omitting several factors worth considering.
No.
The block size does matter.
With a block size of 32 threads, occupancy is very low, because the number of blocks that can be resident on a multiprocessor at the same time is also limited.
With a block size of 256, occupancy is high: all 256 threads of the block are concurrently active.
Going beyond 256 threads per block rarely makes much difference.
Because the architecture involved is complex, testing with your own software is always the best approach.