Determining the block ID from a given 2D offset in CUDA
I've been trying to calculate blockIdx.x and blockIdx.y from a given offset in CUDA, but I'm totally stuck. The idea is to read data from shared memory when possible and from global memory otherwise.
For example, if I have a 1D array of 64 elements and I configure a kernel with 16x1 threads per block (4 blocks in total), each thread can access a position using:
int idx = blockDim.x * blockIdx.x + threadIdx.x;
and I can easily get the blockIdx.x of a given index value from idx as
int blockNumber = idx / blockDim.x;
but in a 2D scenario with 8x8 elements and a kernel configuration of 4x4 threads per block (2x2 blocks in total), each thread accesses a position using:
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;
int pitch = blockDim.x * gridDim.x;
int idx = x + y * pitch;
int sharedMemIndex = threadIdx.x + threadIdx.y * BLOCK_DIM_X;  // row-major index within the block's tile
__shared_block[sharedMemIndex] = fromGlobalMemory[idx];
__syncthreads();
// ... some operations
int unknow_index = __shared_block[sharedMemIndex];
if ( unknow_index within this block? )
// ... read from shared memory
else
// ... read from global memory
How can I know the block ID.x and ID.y for a given idx? I.e., indices 34 and 35 are in block (0, 1) and index 36 is in block (1, 1). So, if a thread in block (0, 1) reads the value at index 35, that thread will know that the value is within its block and will read it from shared memory. The value at index 35 will be stored in position 11 of the shared memory of block (0, 1).
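(Working through that example under the stated layout, with pitch = 8 and 4x4 blocks: idx 35 gives x = 35 % 8 = 3 and y = 35 / 8 = 4, so the owning block is (3 / 4, 4 / 4) = (0, 1); idx 36 gives x = 4, y = 4, hence block (4 / 4, 4 / 4) = (1, 1), matching the statement above.)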
Thanks in advance!
Comments (3)
In practice, I really can't think of a good reason why this is ever necessary, but for an arbitrary index value idx (assuming column-ordered indexing) you can compute the block coordinates of the index in bidx and bidy as sketched below.
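A minimal sketch of that computation, written here for the idx = x + y * pitch layout used in the question rather than column-ordered indexing (bidx and bidy are just local variable names; swap the roles of x and y for column ordering):
int y    = idx / pitch;        // row of the element in the full array
int x    = idx % pitch;        // column of the element in the full array
int bidx = x / blockDim.x;     // blockIdx.x of the block covering this element
int bidy = y / blockDim.y;     // blockIdx.y of the block covering this element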
There's no need to apply math to Idx to find the X and Y blocks, or to work backwards from Idx to the block index. For every thread (Idx) you can find the Y and X blocks simply by reading blockIdx.x and blockIdx.y at any point in the kernel:
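In a kernel this is just a direct read of the built-in variables (bx and by are illustrative local names):
int bx = blockIdx.x;   // X block of the currently running thread
int by = blockIdx.y;   // Y block of the currently running thread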
Update:
If you're dead set on the reverse operation, you need to know the values of the pitch and the block dimensions:
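A sketch of that reverse mapping, assuming the same pitch = blockDim.x * gridDim.x and idx = x + y * pitch layout as in the question (blockFromIdx is a hypothetical helper name):
__device__ void blockFromIdx(int idx, int pitch, int &bidx, int &bidy)
{
    int x = idx % pitch;       // column of idx in the full 2D array
    int y = idx / pitch;       // row of idx in the full 2D array
    bidx = x / blockDim.x;     // blockIdx.x of the block that covers this element
    bidy = y / blockDim.y;     // blockIdx.y of the block that covers this element
}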
You are performing unnecessary calculations.
You can just use blockIdx.x instead of the convoluted calculation. The same is true for 2D grids (blockIdx.x and blockIdx.y).