3D image indexing
I have an image of size 512 x 512 x 512.
I need to process all the voxels individually.
How can I get the thread id to do this?
If I use a 1D thread ID, the number of blocks will exceed the 65535 limit.
int id = blockIdx.x*blockDim.x + threadIdx.x;
Note: my card doesn't support 3D grids.
Comments (5)
You can use 3D indices with CUDA 4.0 or later on devices of compute capability 2.0+.
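A minimal sketch of 3D indexing on a compute capability 2.0+ device. The kernel name `process_voxels_3d`, the helper `div_up`, the 8 x 8 x 8 block shape, the x-fastest memory layout, and the placeholder `+= 1.0f` operation are all assumptions made for illustration, not taken from the answer.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel: one thread per voxel of an nx * ny * nz volume.
// Assumes x varies fastest in memory.
__global__ void process_voxels_3d(float *vol, int nx, int ny, int nz)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;  // needs a 3D grid
    if (x >= nx || y >= ny || z >= nz) return;      // guard partial blocks

    size_t i = ((size_t)z * ny + y) * nx + x;
    vol[i] += 1.0f;                                 // placeholder operation
}

// Number of blocks of size `block` needed to cover n elements.
inline unsigned int div_up(unsigned int n, unsigned int block)
{
    return (n + block - 1) / block;
}

// Launch over the whole volume; d_vol must already be a device pointer.
void launch_3d(float *d_vol, int nx, int ny, int nz)
{
    dim3 block(8, 8, 8);  // 512 threads per block
    dim3 grid(div_up(nx, block.x), div_up(ny, block.y), div_up(nz, block.z));
    process_voxels_3d<<<grid, block>>>(d_vol, nx, ny, nz);
}
```

For a 512 x 512 x 512 image this launches a 64 x 64 x 64 grid, well inside the per-dimension limit.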
If your device doesn't support compute capability 2.0, there is a workaround.
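One possible workaround, which may differ from the one the answer had in mind, is to fold the z block index into the y dimension of a 2D grid and unfold it inside the kernel. The names `unfold_block_y`, `process_voxels_folded`, and `launch_folded` are hypothetical, as are the block shape and the x-fastest layout.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Recover the (y, z) block indices from a folded grid y index,
// where folded = bz * blocks_in_y + by.
__host__ __device__ inline void unfold_block_y(unsigned int folded,
                                               unsigned int blocks_in_y,
                                               unsigned int *by,
                                               unsigned int *bz)
{
    *bz = folded / blocks_in_y;
    *by = folded % blocks_in_y;
}

// Hypothetical kernel: one thread per voxel, launched on a 2D grid.
__global__ void process_voxels_folded(float *vol, int nx, int ny, int nz,
                                      unsigned int blocks_in_y)
{
    unsigned int by, bz;
    unfold_block_y(blockIdx.y, blocks_in_y, &by, &bz);

    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = by * blockDim.y + threadIdx.y;
    int z = bz * blockDim.z + threadIdx.z;
    if (x >= nx || y >= ny || z >= nz) return;

    vol[((size_t)z * ny + y) * nx + x] += 1.0f;  // placeholder operation
}

void launch_folded(float *d_vol, int nx, int ny, int nz)
{
    dim3 block(8, 8, 8);  // 3D blocks are allowed even without 3D grids
    unsigned int bx = (nx + block.x - 1) / block.x;
    unsigned int by = (ny + block.y - 1) / block.y;
    unsigned int bz = (nz + block.z - 1) / block.z;
    dim3 grid(bx, by * bz, 1);  // z blocks folded into grid.y
    process_voxels_folded<<<grid, block>>>(d_vol, nx, ny, nz, by);
}
```

For the 512 x 512 x 512 case the grid is 64 x 4096, so both dimensions stay below 65535.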
You could use a multi-dimensional grid of blocks. That gives you far more indices to work with.
Note that your PC's memory is not 3D. The third dimension is only a matter of how you visualize the data, so you can store your 3D image behind a single pointer.
Now pass Array2 to CUDA and work in a single dimension.
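A minimal sketch of the flattening idea, assuming the image is already held in one contiguous host buffer (the role Array2 plays in the text above) with z varying fastest. The names `flat_index` and `upload_volume` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Linear offset of voxel (x, y, z) in an nx * ny * nz volume, z fastest.
__host__ __device__ inline size_t flat_index(int x, int y, int z,
                                             int ny, int nz)
{
    return ((size_t)x * ny + y) * nz + z;
}

// Copy a contiguous host volume to the device as one linear buffer.
// Returns NULL if the allocation or the copy fails.
float *upload_volume(const float *h_vol, int nx, int ny, int nz)
{
    size_t bytes = (size_t)nx * ny * nz * sizeof(float);
    float *d_vol = NULL;
    if (cudaMalloc((void **)&d_vol, bytes) != cudaSuccess) return NULL;
    if (cudaMemcpy(d_vol, h_vol, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_vol);
        return NULL;
    }
    return d_vol;
}
```

Flattening by itself does not remove the block-count limit: a kernel that visits this buffer with one thread per voxel still needs either a multi-dimensional grid or more than one voxel per thread.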
If you need a larger grid, CUDA supports 2D grids on all hardware, and the most recent versions of the CUDA toolkit also support 3D grids on current Fermi hardware.
However, it isn't strictly necessary to have such large grids. If each voxel operation is independent, then why not just use a 1D grid, but have each thread process more than one voxel? Not only would such a scheme not need larger 2D or 3D grids, it might well be more efficient because the fixed costs associated with scheduling and initialization of a block can be amortized over multiple voxel calculations.
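The scheme described above is often written as a grid-stride loop. The following is a sketch under assumptions: the names `process_voxels_strided`, `capped_blocks`, and `launch_strided`, the 256-thread block size, and the placeholder operation are illustrative only.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel: each thread walks the volume in steps of the
// total thread count, so any grid size covers all n voxels.
__global__ void process_voxels_strided(float *vol, size_t n)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < n; i += stride) {
        vol[i] += 1.0f;  // placeholder operation
    }
}

// Blocks needed to cover n elements, capped at max_blocks.
inline unsigned int capped_blocks(size_t n, unsigned int threads,
                                  unsigned int max_blocks)
{
    size_t needed = (n + threads - 1) / threads;
    return needed < max_blocks ? (unsigned int)needed : max_blocks;
}

void launch_strided(float *d_vol, size_t n)
{
    if (n == 0) return;  // a zero-block launch is invalid
    const unsigned int threads = 256;
    unsigned int blocks = capped_blocks(n, threads, 65535u);
    process_voxels_strided<<<blocks, threads>>>(d_vol, n);
}
```

With 512 x 512 x 512 voxels the grid is capped at 65535 blocks of 256 threads, so each thread handles about eight voxels.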
I used something like this:
In the code define your grid:
dim3 altgrid,altthreads;
altgrid.x=lx;
altgrid.y=ly;
altgrid.z=1;
altthreads.x=lz;
altthreads.y=1;
altthreads.z=1;
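and in the kernel, idx and idy come from the block indices and idz from the thread index (given the launch configuration above, presumably blockIdx.x, blockIdx.y, and threadIdx.x).
Since the array on the device is only 1D, you retrieve the [idx][idy][idz] element of a matrix A as A[ind], where ind=idz+lz*(idy+ly*idx);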
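A sketch of a kernel matching this launch configuration. It assumes idx, idy, and idz map to blockIdx.x, blockIdx.y, and threadIdx.x, which follows from the grid definition but is not stated in the answer. The names `voxel_offset`, `process_by_block`, and `launch_by_block` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Offset of element [idx][idy][idz] in the flat array, as in the
// formula above: ind = idz + lz * (idy + ly * idx).
__host__ __device__ inline int voxel_offset(int idx, int idy, int idz,
                                            int ly, int lz)
{
    return idz + lz * (idy + ly * idx);
}

// One block per (idx, idy) pair, one thread per idz.
__global__ void process_by_block(float *A, int ly, int lz)
{
    int idx = blockIdx.x;
    int idy = blockIdx.y;
    int idz = threadIdx.x;
    int ind = voxel_offset(idx, idy, idz, ly, lz);
    A[ind] += 1.0f;  // placeholder operation
}

void launch_by_block(float *d_A, int lx, int ly, int lz)
{
    dim3 altgrid(lx, ly, 1);    // lx and ly must each be at most 65535
    dim3 altthreads(lz, 1, 1);  // lz must fit the per-block thread limit
    process_by_block<<<altgrid, altthreads>>>(d_A, ly, lz);
}
```

With lx = ly = lz = 512 this needs 512 threads per block, which is the per-block maximum on devices of compute capability 1.x.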
I hope it helps