Convolution in CUDA: array with a filter

Posted 2024-09-25 14:38:45


I'm trying to take the convolution of an array of data, 256x256, with a filter, 3x3, on a GPU using shared memory. I understand that I'm to break the array up into blocks and then apply the filter within each block. This ultimately means that the blocks will overlap along the edges, and some padding will be needed around the edges where there is no data so that the filter works properly.

int grid = (256/(16+3-1))*(256/(16+3-1))
where 256 is the length or width of my array, 16 is the length or width of my block in shared memory, 3 is the length or width of my filter, and I subtract one so that it comes out even.

int thread = (16+3-1)*(16+3-1)

Now I call my kernel: kernel<<<grid, thread>>>(output, input, 256)
input and output are arrays of size 256*256
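
For reference, since the kernel below indexes blockIdx.y and threadIdx.y, the launch configuration would normally be two-dimensional. A minimal sketch using the question's own numbers, with d_output and d_input as placeholder device pointers:

// Sketch only: a dim3 launch matching the dimensions described above.
dim3 thread(16 + 3 - 1, 16 + 3 - 1);                 // 18 x 18 threads per block
dim3 grid(256 / (16 + 3 - 1), 256 / (16 + 3 - 1));   // 14 x 14 blocks with integer division

kernel<<<grid, thread>>>(d_output, d_input, 256);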

__global__ void kernel(float *input, float *output, int size)
{
    __shared__ float tile[16+3-1][16+3-1];

    const int N = 3;                  // filter width
    const int width  = 16+3-1-1;      // last column index of the shared tile
    const int height = 16+3-1-1;      // last row index of the shared tile

    int bIdx = blockIdx.x;
    int bIdy = blockIdx.y;
    int tIdx = threadIdx.x;
    int tIdy = threadIdx.y;

    //i is for input
    unsigned int iX = bIdx * 3 + tIdx;
    unsigned int iY = bIdy * 3 + tIdy;

    if (tIdx == 0 || tIdx == width || tIdy == 0 || tIdy == height)
    {
        //this will pad the outside edges
        tile[tIdy][tIdx] = 0;
    }
    else
    {
        //this will fill in the tile with real data
        unsigned int iin = iY * size + iX;
        tile[tIdy][tIdx] = input[iin];
    }

    __syncthreads();

    //I believe the above is correct; below, where I do the convolution, I feel is wrong
    float result = 0;
    for(int fX=-N/2; fX<=N/2; fX++){
        for(int fY=-N/2; fY<=N/2; fY++){
            if(iY+fX>=0 && iY+fX<size && iX+fY>=0 && iX+fY<size)
                result += tile[tIdx+fX][tIdy+fY];
        }
    }
    output[iY*size+iX] = result/(3*3);
}

When I run the code with the convolution part included, I get a kernel error. Any insights or suggestions?


Comments (1)

半步萧音过轻尘 2024-10-02 14:38:45


Check out the sobelFilter SDK sample.

It uses texture to deal with the edge cases, overfetches blocks slightly (but the texture cache makes that more efficient), and uses shared memory for the processing.
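
As a rough illustration of the texture part (this is not the sample's actual code), a texture object with clamped addressing lets a kernel read past the image border without any explicit bounds checks. h_input and the 256x256 dimensions below are placeholders matching the question:

// Sketch: put the 256x256 float image in a CUDA array and wrap it in a
// texture object whose addressing mode clamps out-of-range coordinates.
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaArray_t imgArray;
cudaMallocArray(&imgArray, &desc, 256, 256);
cudaMemcpy2DToArray(imgArray, 0, 0, h_input, 256 * sizeof(float),
                    256 * sizeof(float), 256, cudaMemcpyHostToDevice);

cudaResourceDesc resDesc = {};
resDesc.resType = cudaResourceTypeArray;
resDesc.res.array.array = imgArray;

cudaTextureDesc texDesc = {};
texDesc.addressMode[0] = cudaAddressModeClamp;   // clamp handles the borders
texDesc.addressMode[1] = cudaAddressModeClamp;
texDesc.filterMode     = cudaFilterModePoint;
texDesc.readMode       = cudaReadModeElementType;

cudaTextureObject_t tex = 0;
cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

// Inside a kernel, each thread can then fetch a neighbour with no bounds check:
//   float v = tex2D<float>(tex, x + fX, y + fY);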

The subtle thing about the shared memory is that you get 4-way bank conflicts if you read adjacent bytes. One way to get around this, illustrated in the sobelFilter sample, is to unroll your loop 4x and access every fourth byte.
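
For comparison with the kernel in the question, here is a minimal sketch of the tile-plus-halo pattern it describes, using zero padding at the image border instead of a texture fetch. TILE, FILTER, the boxFilter3x3 name, and the d_in/d_out pointers in the launch comment are assumptions chosen to match the 16x16 tile and 3x3 filter above:

#define TILE   16                    // output elements per block, per dimension
#define FILTER 3                     // filter width
#define RADIUS (FILTER / 2)          // 1
#define SHARED (TILE + FILTER - 1)   // 18: tile plus halo

__global__ void boxFilter3x3(const float *input, float *output, int size)
{
    __shared__ float tile[SHARED][SHARED];

    // Global coordinates of the element this thread loads; the outer ring of
    // threads loads the halo and may fall outside the image.
    int gx = blockIdx.x * TILE + threadIdx.x - RADIUS;
    int gy = blockIdx.y * TILE + threadIdx.y - RADIUS;

    // One load per thread, zero-padding anything outside the image.
    if (gx >= 0 && gx < size && gy >= 0 && gy < size)
        tile[threadIdx.y][threadIdx.x] = input[gy * size + gx];
    else
        tile[threadIdx.y][threadIdx.x] = 0.0f;

    __syncthreads();

    // Only the interior 16x16 threads compute an output value; the halo
    // threads exist just to help fill shared memory.
    if (threadIdx.x < RADIUS || threadIdx.x >= SHARED - RADIUS ||
        threadIdx.y < RADIUS || threadIdx.y >= SHARED - RADIUS)
        return;

    float result = 0.0f;
    for (int fy = -RADIUS; fy <= RADIUS; ++fy)
        for (int fx = -RADIUS; fx <= RADIUS; ++fx)
            result += tile[threadIdx.y + fy][threadIdx.x + fx];

    output[gy * size + gx] = result / (FILTER * FILTER);
}

// Launched for a 256x256 image as, e.g.:
//   boxFilter3x3<<<dim3(256 / TILE, 256 / TILE), dim3(SHARED, SHARED)>>>(d_in, d_out, 256);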
