Determining the block ID from a given 2D offset in CUDA
I've been trying to calculate blockIdx.x and blockIdx.y from a given offset in CUDA, but I'm totally stuck. The idea is to read data from shared memory when possible and from global memory otherwise.
For example, if I have a 1D array of 64 elements and I configure a kernel with 16x1 threads per block (4 blocks in total), each thread can access a position using:
int idx = blockDim.x * blockIdx.x + threadIdx.x;
and I can easily get the blockIdx.x of a given index value from idx as
int blockNumber = idx / blockDim.x;
but in a 2D scenario with 8x8 elements and a kernel configuration of 4x4 threads per block (2x2 blocks in total), each thread accesses a position using:
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;
int pitch = blockDim.x * gridDim.x;
int idx = x + y * pitch;
int sharedMemIndex = threadIdx.x + threadIdx.y * BLOCK_DIM_X;  // row-major index within the block's tile
__shared_block[sharedMemIndex] = fromGlobalMemory[idx];
__syncthreads();
// ... some operations
int unknow_index = __shared_block[sharedMemIndex];
if ( unknow_index within this block? )
// ... read from shared memory
else
// ... read from global memory
How can I know the block ID.x and ID.y for a given idx? I.e., indices 34 and 35 are in block (0, 1) and index 36 is in block (1, 1). So, if a thread in block (0, 1) reads the value at index 35, that thread will know that the value is within its block and will read it from shared memory. The value at index 35 will be stored in position 11 of the shared memory of block (0, 1).
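(Working through that example under the stated layout, with pitch = 8 and 4x4 blocks: idx 35 gives x = 35 % 8 = 3 and y = 35 / 8 = 4, so the owning block is (3 / 4, 4 / 4) = (0, 1); idx 36 gives x = 4, y = 4, hence block (4 / 4, 4 / 4) = (1, 1), matching the statement above.)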
Thanks in advance!
Comments (3)
In practice, I really can't think of a good reason why this is ever necessary, but for an arbitrary index value idx (assuming column-ordered indexing) you can compute the block coordinates of the index in bidx and bidy as sketched below.
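A minimal sketch of that computation, written here for the idx = x + y * pitch layout used in the question rather than column-ordered indexing (bidx and bidy are just local variable names; swap the roles of x and y for column ordering):
int y    = idx / pitch;        // row of the element in the full array
int x    = idx % pitch;        // column of the element in the full array
int bidx = x / blockDim.x;     // blockIdx.x of the block covering this element
int bidy = y / blockDim.y;     // blockIdx.y of the block covering this element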
There's no need to apply math to Idx to find the X and Y blocks, or to work backwards from Idx to the block index. For every thread (Idx) you can find the Y and X blocks simply by reading blockIdx.x and blockIdx.y at any point in the kernel:
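In a kernel this is just a direct read of the built-in variables (bx and by are illustrative local names):
int bx = blockIdx.x;   // X block of the currently running thread
int by = blockIdx.y;   // Y block of the currently running thread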
Update:
If you're dead set on the reverse operation, you need to know the values of the pitch and the block dimensions:
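A sketch of that reverse mapping, assuming the same pitch = blockDim.x * gridDim.x and idx = x + y * pitch layout as in the question (blockFromIdx is a hypothetical helper name):
__device__ void blockFromIdx(int idx, int pitch, int &bidx, int &bidy)
{
    int x = idx % pitch;       // column of idx in the full 2D array
    int y = idx / pitch;       // row of idx in the full 2D array
    bidx = x / blockDim.x;     // blockIdx.x of the block that covers this element
    bidy = y / blockDim.y;     // blockIdx.y of the block that covers this element
}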
You are performing unnecessary calculations.
You can just use blockIdx.x instead of the convoluted calculation. The same is true for 2D grids (blockIdx.x and blockIdx.y).