CUFFT - padding/initialization question

Posted 2024-10-28 03:48:22

I am looking at the convolution FFT example (for large kernels) in the Nvidia SDK. I know the theory behind Fourier transforms and their FFT implementations (the basics, at least), but I can't figure out what the following code does:

const int    fftH = snapTransformSize(dataH + kernelH - 1);
const int    fftW = snapTransformSize(dataW + kernelW - 1);

....//gpu initialization code

printf("...creating R2C & C2R FFT plans for %i x %i\n", fftH, fftW);
cufftSafeCall( cufftPlan2d(&fftPlanFwd, fftH, fftW, CUFFT_R2C) );
cufftSafeCall( cufftPlan2d(&fftPlanInv, fftH, fftW, CUFFT_C2R) );

printf("...uploading to GPU and padding convolution kernel and input data\n");
cutilSafeCall( cudaMemcpy(d_Kernel, h_Kernel, kernelH * kernelW * sizeof(float), cudaMemcpyHostToDevice) );
cutilSafeCall( cudaMemcpy(d_Data,   h_Data,   dataH   * dataW   * sizeof(float), cudaMemcpyHostToDevice) );
cutilSafeCall( cudaMemset(d_PaddedKernel, 0, fftH * fftW * sizeof(float)) );
cutilSafeCall( cudaMemset(d_PaddedData,   0, fftH * fftW * sizeof(float)) );

padKernel(
    d_PaddedKernel,
    d_Kernel,
    fftH,
    fftW,
    kernelH,
    kernelW,
    kernelY,
    kernelX
);

padDataClampToBorder(
    d_PaddedData,
    d_Data,
    fftH,
    fftW,
    dataH,
    dataW,
    kernelH,
    kernelW,
    kernelY,
    kernelX
);

I've never used the CUFFT library before, so I don't know what snapTransformSize does

(here's the code)

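int snapTransformSize(int dataSize){
    int hiBit;
    unsigned int lowPOT, hiPOT;

    // Round the requested size up to a multiple of 16
    dataSize = iAlignUp(dataSize, 16);

    // Find the index of the highest set bit
    for(hiBit = 31; hiBit >= 0; hiBit--)
        if(dataSize & (1U << hiBit)) break;

    // Largest power of two not exceeding dataSize
    lowPOT = 1U << hiBit;
    if(lowPOT == (unsigned int)dataSize)
        return dataSize;    // already a power of two

    // Next power of two above dataSize
    hiPOT = 1U << (hiBit + 1);
    if(hiPOT <= 1024)
        return hiPOT;
    else
        return iAlignUp(dataSize, 512);    // large sizes: next multiple of 512
}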

nor why the complex plane is initialized this way.

Can you provide explanations or links, please?

Comments (3)

长发绾君心 2024-11-04 03:48:22

It appears to round each FFT dimension up to the next power of 2 (after first aligning it to a multiple of 16), unless that power of 2 would exceed 1024, in which case the dimension is rounded up to the next multiple of 512.
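To make the rounding rule concrete, here is a minimal host-side sketch in plain C that restates the logic of snapTransformSize from the question. The body of iAlignUp is not shown in the question, so the version below (round up to the next multiple) is an assumption.

```c
#include <assert.h>

/* Assumed definition: round a up to the next multiple of b.
   The SDK's own iAlignUp is not shown in the question. */
int iAlignUp(int a, int b) {
    return (a % b != 0) ? (a - a % b + b) : a;
}

/* Restatement of snapTransformSize from the question. */
int snapTransformSize(int dataSize) {
    assert(dataSize > 0);

    /* Step 1: align to a multiple of 16. */
    dataSize = iAlignUp(dataSize, 16);

    /* Step 2: find the smallest power of two that is >= dataSize. */
    unsigned int pot = 1;
    while (pot < (unsigned int)dataSize) pot <<= 1;

    /* Already a power of two: keep it, whatever its size. */
    if (pot == (unsigned int)dataSize) return dataSize;

    /* Small sizes snap to the next power of two ... */
    if (pot <= 1024) return (int)pot;

    /* ... larger sizes snap to the next multiple of 512 instead. */
    return iAlignUp(dataSize, 512);
}
```

Under that assumption, a requested size of 100 becomes 128, 513 becomes 1024, 1025 becomes 1536, and 3000 becomes 3072. For large sizes, rounding to a multiple of 512 wastes much less padding than jumping to the next power of two would.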

Having rounded up the FFT size, you then of course need to pad your data with zeroes to make it the correct size for the FFT.
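The padding step can be sketched on the host as follows. zeroPad2D is a hypothetical helper, not an SDK function. It only illustrates plain zero padding of a row-major matrix, whereas the SDK's padKernel and padDataClampToBorder take extra kernelY/kernelX arguments and, judging by their names and parameters, appear to do more than this (for example, replicating border values for the data).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the SDK): copy a srcH x srcW row-major
   matrix into the top-left corner of a dstH x dstW buffer, zeroing the rest. */
void zeroPad2D(float *dst, int dstH, int dstW,
               const float *src, int srcH, int srcW) {
    assert(srcH <= dstH && srcW <= dstW);

    /* All-zero bytes represent 0.0f for IEEE 754 floats. */
    memset(dst, 0, (size_t)dstH * dstW * sizeof(float));

    /* Copy row by row, because the source and destination strides differ. */
    for (int y = 0; y < srcH; y++) {
        memcpy(dst + (size_t)y * dstW,
               src + (size_t)y * srcW,
               (size_t)srcW * sizeof(float));
    }
}
```

In the question's code, the cudaMemset calls play the role of the memset here, and the pad kernels then write the real samples into the zeroed fftH x fftW buffers.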

Note that we typically need to round up and pad for convolution because each FFT dimension needs to be at least image_dimension + kernel_dimension - 1, which is not normally a convenient number such as a power of 2.
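The size requirement comes from the fact that multiplying two DFTs pointwise yields a circular convolution, not a linear one. The sketch below is a direct (non-FFT) circular convolution in plain C, used only to show the wrap-around effect; the function name is illustrative and not part of CUFFT or the SDK.

```c
#include <assert.h>

/* Circular convolution of two length-n signals. This is the result a
   pointwise product of two length-n DFTs corresponds to. */
void circularConvolve(const float *a, const float *b, float *out, int n) {
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        float sum = 0.0f;
        for (int j = 0; j < n; j++) {
            /* The index wraps modulo n: this is the source of aliasing. */
            sum += a[j] * b[(i - j + n) % n];
        }
        out[i] = sum;
    }
}
```

Take data {1, 2, 3, 4} and kernel {1, 1, 1}. Zero-padding both to length 4 + 3 - 1 = 6 and calling circularConvolve gives {1, 3, 6, 9, 7, 4}, which is exactly the linear convolution. With length 4 instead (kernel padded to {1, 1, 1, 0}), the tail wraps onto the head and the result is {8, 7, 6, 9}.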

阪姬 2024-11-04 03:48:22

What @Paul R says is correct. It does that because the Fast Fourier Transform runs fastest when the transform size is a power of two. See the Cooley-Tukey algorithm.

Just make sure that you declare a matrix whose dimensions are powers of two, and you should not need that generic safe implementation.

万人眼中万个我 2024-11-04 03:48:22

It rounds the FFT dimensions up to the next power of 2 as long as that power does not exceed 1024; beyond that, it rounds up to the next multiple of 512. You should pad the data with zeroes to make it the correct size for the FFT.
