3D image indexing

Posted on 2024-12-02 21:53:07

I have an image of size 512 x 512 x 512.
I need to process all the voxels individually.
How can I get the thread ID to do this?
If I use a 1D thread ID, the number of blocks will exceed 65536.

    int id = blockIdx.x*blockDim.x + threadIdx.x;

Note: my card doesn't support 3D grids.

∞琼窗梦回ˉ 2024-12-09 21:53:07

You can use 3D indices with CUDA 4.0 and compute capability 2.0+. Example code:

    // Round up so the grid covers the whole volume even when a
    // dimension is not a multiple of the block size (8).
    int blocksInX = (nx+8-1)/8;
    int blocksInY = (ny+8-1)/8;
    int blocksInZ = (nz+8-1)/8;

    dim3 Dg(blocksInX, blocksInY, blocksInZ);
    dim3 Db(8, 8, 8);
    foo_kernel<<<Dg, Db>>>(R, nx, ny, nz);

    ...

    __global__ void foo_kernel( float* R, const int nx, const int ny, const int nz )
    {
      unsigned int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
      unsigned int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
      unsigned int zIndex = blockDim.z * blockIdx.z + threadIdx.z;

      // Threads in the rounded-up part of the grid fall outside the volume.
      if ( (xIndex < nx) && (yIndex < ny) && (zIndex < nz) )
      {
        unsigned int index_out = xIndex + nx*yIndex + nx*ny*zIndex;
        ...
        R[index_out] = ...;
      }
    }

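If your device doesn't support compute capability 2.0, there is a trick: fold the y and z block indices into the second dimension of a 2D grid and unfold them inside the kernel.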

    int threadsInX = 16;
    int threadsInY = 4;
    int threadsInZ = 4;

    int blocksInX = (nx+threadsInX-1)/threadsInX;
    int blocksInY = (ny+threadsInY-1)/threadsInY;
    int blocksInZ = (nz+threadsInZ-1)/threadsInZ;

    // 2D grid: the y dimension carries both the y and z block indices.
    dim3 Dg = dim3(blocksInX, blocksInY*blocksInZ);
    dim3 Db = dim3(threadsInX, threadsInY, threadsInZ);

    foo_kernel<<<Dg, Db>>>(R, nx, ny, nz, blocksInY, 1.0f/(float)blocksInY);

    __global__ void foo_kernel(float *R, const int nx, const int ny, const int nz,
                               unsigned int blocksInY, float invBlocksInY)
    {
        // Recover the z and y block indices from the folded blockIdx.y.
        unsigned int blockIdxz = __float2uint_rd(blockIdx.y * invBlocksInY);
        unsigned int blockIdxy = blockIdx.y - __umul24(blockIdxz, blocksInY);
        unsigned int xIndex = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;
        unsigned int yIndex = __umul24(blockIdxy, blockDim.y) + threadIdx.y;
        unsigned int zIndex = __umul24(blockIdxz, blockDim.z) + threadIdx.z;

        if ( (xIndex < nx) && (yIndex < ny) && (zIndex < nz) )
        {
            unsigned int index = xIndex + nx*yIndex + nx*ny*zIndex;
            ...
            R[index] = ...;
        }
    }
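The float reciprocal in the kernel above presumably avoids an integer division, but float rounding may not give the exact quotient for every value of blocksInY. As a minimal sketch of the same folding with plain `/` and `%`, the code below writes each voxel's linear index and checks on the host that every voxel was reached exactly where expected. The names `fill_index_kernel` and `check_folded_grid` are hypothetical, and the `int` output buffer is an assumption made so the check is exact.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical kernel: stores each voxel's own linear index so that
// coverage of the volume can be verified on the host.
__global__ void fill_index_kernel(int *out, int nx, int ny, int nz,
                                  unsigned int blocksInY)
{
    // Unfold the y and z block indices that the host packed into blockIdx.y.
    unsigned int blockIdxz = blockIdx.y / blocksInY;
    unsigned int blockIdxy = blockIdx.y % blocksInY;

    unsigned int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int yIndex = blockIdxy * blockDim.y + threadIdx.y;
    unsigned int zIndex = blockIdxz * blockDim.z + threadIdx.z;

    if (xIndex < (unsigned int)nx && yIndex < (unsigned int)ny &&
        zIndex < (unsigned int)nz) {
        unsigned int index = xIndex + nx * yIndex + nx * ny * zIndex;
        out[index] = (int)index;
    }
}

// Hypothetical host check: returns true if every voxel holds its own index.
bool check_folded_grid(int nx, int ny, int nz)
{
    const dim3 Db(16, 4, 4);
    const unsigned int blocksInX = (nx + Db.x - 1) / Db.x;
    const unsigned int blocksInY = (ny + Db.y - 1) / Db.y;
    const unsigned int blocksInZ = (nz + Db.z - 1) / Db.z;
    const dim3 Dg(blocksInX, blocksInY * blocksInZ);  // 2D grid only

    const size_t n = (size_t)nx * ny * nz;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemset(d_out, 0xFF, n * sizeof(int));  // every int becomes -1

    fill_index_kernel<<<Dg, Db>>>(d_out, nx, ny, nz, blocksInY);

    std::vector<int> h_out(n);
    cudaError_t err = cudaGetLastError();  // catches a bad launch configuration
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < n; ++i)
        if (h_out[i] != (int)i) return false;
    return true;
}
```

For the 512 x 512 x 512 volume in the question, this block shape gives a 32 x 16384 grid, which stays under the 65535 limit in both dimensions.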

等待我真够勒 2024-12-09 21:53:07

You can use a grid. It gives you more indices.

夜还是长夜 2024-12-09 21:53:07

Note that the memory of your PC is not 3D. The three dimensions are just a matter of visualization, so you can convert your 3D image into a single pointer.

    Array[i][j][z] is the same as Array2[i*cols + j + rows*cols*z];

Now feed Array2 to CUDA and work in a single dimension.
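The mapping can be wrapped in a pair of small helpers, sketched here under the assumption that i is the row, j the column and z the slice, as in the formula above. The names `flatten`, `unflatten` and `Voxel` are hypothetical. Marking the functions `__host__ __device__` lets the same code run in a kernel and on the CPU.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: linear offset of voxel (i, j, z), following
// Array2[i*cols + j + rows*cols*z].
__host__ __device__ inline size_t flatten(int i, int j, int z, int rows, int cols)
{
    return (size_t)i * cols + j + (size_t)rows * cols * z;
}

// Hypothetical container for the three coordinates of one voxel.
struct Voxel { int i, j, z; };

// Inverse mapping: recover (i, j, z) from a linear offset.
__host__ __device__ inline Voxel unflatten(size_t id, int rows, int cols)
{
    const size_t slice = (size_t)rows * cols;  // voxels per z-slice
    const size_t rem = id % slice;             // offset inside the slice
    Voxel v;
    v.z = (int)(id / slice);
    v.i = (int)(rem / cols);
    v.j = (int)(rem % cols);
    return v;
}
```

A kernel that works on a 1D thread ID can call `unflatten` when it needs the coordinates, for example to look up neighbouring voxels.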

混浊又暗下来 2024-12-09 21:53:07

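If you need a larger grid, CUDA supports 2D grids on all hardware, and the latest versions of the CUDA toolkit also support 3D grids on current Fermi hardware.

However, such a large grid is not strictly necessary. If each voxel operation is independent, why not use a 1D grid and have each thread process more than one voxel? Such a scheme not only removes the need for a larger 2D or 3D grid, it may also be more efficient, because the fixed costs of scheduling and initializing blocks are amortized over several voxel computations.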
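One common way to let each thread handle several voxels is a grid-stride loop, sketched below. The per-voxel operation (multiplying by a factor), the names `scale_voxels_kernel` and `scale_volume`, and the block size of 256 are assumptions for illustration, not part of the original answer.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical per-voxel operation: multiply every voxel by `factor`.
__global__ void scale_voxels_kernel(float *data, size_t n, float factor)
{
    // Total number of threads in the 1D grid; each thread jumps by this much.
    const size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t id = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         id < n; id += stride) {
        data[id] *= factor;
    }
}

// Hypothetical host wrapper. maxBlocks caps the grid size; 65535 is the
// per-dimension grid limit on pre-Fermi hardware.
bool scale_volume(std::vector<float> &voxels, float factor, int maxBlocks = 65535)
{
    const size_t n = voxels.size();
    if (n == 0) return true;

    const int threads = 256;
    const size_t wanted = (n + threads - 1) / threads;
    const int blocks = (int)(wanted < (size_t)maxBlocks ? wanted : (size_t)maxBlocks);

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return false;
    cudaError_t err = cudaMemcpy(d_data, voxels.data(), n * sizeof(float),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scale_voxels_kernel<<<blocks, threads>>>(d_data, n, factor);
        err = cudaGetLastError();  // catches a bad launch configuration
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(voxels.data(), d_data, n * sizeof(float),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess;
}
```

With 512 x 512 x 512 voxels and these settings, the grid is capped at 65535 blocks and each thread processes about eight voxels. The kernel is correct for any grid size, so the cap can be tuned freely.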

纸伞微斜 2024-12-09 21:53:07

I used something like this.

In the code, define your grid:

    dim3 altgrid, altthreads;
    altgrid.x = lx;
    altgrid.y = ly;
    altgrid.z = 1;
    altthreads.x = lz;
    altthreads.y = 1;
    altthreads.z = 1;

and in the kernel:

    int idx = blockIdx.x;   // 0 .. lx-1
    int idy = blockIdx.y;   // 0 .. ly-1
    int idz = threadIdx.x;  // 0 .. lz-1

Since the array on the device is only 1D, you retrieve the [idx][idy][idz] element of a matrix A as A[ind], where ind = idz + lz*(idy + ly*idx). This layout needs lz to fit within the maximum number of threads per block.

I hope it helps.
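A sketch of this layout follows, with one block per (idx, idy) pair and the threads of a block walking along z. The thread index is mapped to idz, the fastest-varying coordinate in ind, so the offset stays in range when lx, ly and lz differ. The loop over z is an addition that lets lz exceed the block size. The names `tag_voxels_kernel` and `tag_volume` are hypothetical, and lx and ly are assumed to fit the per-dimension grid limit.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical kernel: stores each voxel's own linear index in A.
__global__ void tag_voxels_kernel(int *A, int lx, int ly, int lz)
{
    const int idx = blockIdx.x;  // 0 .. lx-1
    const int idy = blockIdx.y;  // 0 .. ly-1
    // Consecutive threads touch consecutive addresses along z.
    for (int idz = threadIdx.x; idz < lz; idz += blockDim.x) {
        const int ind = idz + lz * (idy + ly * idx);
        A[ind] = ind;
    }
}

// Hypothetical host check: returns true if every voxel holds its own index.
bool tag_volume(int lx, int ly, int lz, int threadsPerBlock = 256)
{
    const dim3 altgrid(lx, ly, 1);
    const dim3 altthreads(threadsPerBlock, 1, 1);

    const size_t n = (size_t)lx * ly * lz;
    int *d_A = nullptr;
    if (cudaMalloc(&d_A, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemset(d_A, 0xFF, n * sizeof(int));  // every int becomes -1

    tag_voxels_kernel<<<altgrid, altthreads>>>(d_A, lx, ly, lz);

    std::vector<int> h_A(n);
    cudaError_t err = cudaGetLastError();  // catches a bad launch configuration
    if (err == cudaSuccess)
        err = cudaMemcpy(h_A.data(), d_A, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_A);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < n; ++i)
        if (h_A[i] != (int)i) return false;
    return true;
}
```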
