How are 2D / 3D CUDA blocks divided into warps?
If I start my kernel with a grid whose blocks have dimensions:
dim3 block_dims(16,16);
How are the grid blocks now split into warps? Do the first two rows of such a block form one warp, or the first two columns, or is this arbitrarily-ordered?
Assume a GPU Compute Capability of 2.0.
Comments (2)
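Threads are numbered in order within blocks so that threadIdx.x varies the fastest, then threadIdx.y the second fastest, and threadIdx.z the slowest. This is functionally the same as column-major ordering in multidimensional arrays. Warps are constructed sequentially from threads in this ordering. So for a 2D block, the linear thread index is threadIdx.x + threadIdx.y * blockDim.x, and the warp index is that value divided by warpSize (integer division). For a 16x16 block with a warp size of 32, each warp therefore covers two consecutive rows: all 16 values of threadIdx.x for two adjacent values of threadIdx.y. This is covered in both the programming guide and the PTX guide.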
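The index arithmetic described in this answer can be sketched on the host side in plain C++. This is only an illustration: the helper names `linear_thread_id`, `warp_of` and `lane_of` and the constant `kWarpSize` are hypothetical, and the warp size of 32 is an assumption (device code would read the built-in `warpSize`). In a .cu file the same functions could be marked `__host__ __device__` and fed `threadIdx` and `blockDim` directly.

```cpp
#include <cassert>

// Assumed warp size for this sketch; a kernel would use the built-in warpSize.
constexpr unsigned kWarpSize = 32;

// Linear index of a thread within its block: x varies fastest, then y, then z.
// tx/ty/tz stand in for threadIdx.{x,y,z}; dim_x/dim_y for blockDim.{x,y}.
inline unsigned linear_thread_id(unsigned tx, unsigned ty, unsigned tz,
                                 unsigned dim_x, unsigned dim_y) {
    assert(tx < dim_x && ty < dim_y);  // coordinates must lie inside the block
    return tx + ty * dim_x + tz * dim_x * dim_y;
}

// Index of the warp (within the block) that the thread belongs to.
inline unsigned warp_of(unsigned tx, unsigned ty, unsigned tz,
                        unsigned dim_x, unsigned dim_y) {
    return linear_thread_id(tx, ty, tz, dim_x, dim_y) / kWarpSize;
}

// Position of the thread inside its warp (its lane).
inline unsigned lane_of(unsigned tx, unsigned ty, unsigned tz,
                        unsigned dim_x, unsigned dim_y) {
    return linear_thread_id(tx, ty, tz, dim_x, dim_y) % kWarpSize;
}
```

With `dim3 block_dims(16,16)`, `warp_of(15, 1, 0, 16, 16)` evaluates to 0 and `warp_of(0, 2, 0, 16, 16)` evaluates to 1, so the first two rows form warp 0 and the third row starts warp 1.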
To illustrate @talonmies's answer through the 'Visual Studio WarpWatch' window for two consecutive warps (dim3 block_dims(16,16); and WarpSize = 32):
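As a rough textual stand-in for such a view, the thread coordinates that make up a given warp can be enumerated on the host by inverting the linear index. This is a sketch under assumptions, not output from the debugger: `ThreadCoord` and `threads_in_warp` are hypothetical names, and the default warp size of 32 is assumed.

```cpp
#include <cassert>
#include <vector>

// (x, y, z) coordinates of one thread inside its block.
struct ThreadCoord {
    unsigned x, y, z;
};

// Lists the threads of warp number `warp` in lane order for a block of
// dim_x * dim_y * dim_z threads. A partially filled last warp yields fewer
// than warp_size entries; a warp index past the end yields none.
std::vector<ThreadCoord> threads_in_warp(unsigned warp,
                                         unsigned dim_x, unsigned dim_y,
                                         unsigned dim_z = 1,
                                         unsigned warp_size = 32) {
    assert(dim_x > 0 && dim_y > 0 && dim_z > 0 && warp_size > 0);
    const unsigned block_size = dim_x * dim_y * dim_z;
    std::vector<ThreadCoord> out;
    for (unsigned lane = 0; lane < warp_size; ++lane) {
        const unsigned tid = warp * warp_size + lane;  // linear thread index
        if (tid >= block_size) break;                  // past the end of the block
        // Invert tid = x + y * dim_x + z * dim_x * dim_y.
        out.push_back({tid % dim_x, (tid / dim_x) % dim_y, tid / (dim_x * dim_y)});
    }
    return out;
}
```

For a 16x16 block, `threads_in_warp(0, 16, 16)` runs from (0,0) to (15,1) and `threads_in_warp(1, 16, 16)` runs from (0,2) to (15,3), i.e. each of the two consecutive warps spans two full rows of the block.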