FFT 卷积 - 如何应用 Kernel

发布于 2024-11-29 01:00:12 字数 1927 浏览 5 评论 0原文

我对图像处理还很陌生，发现 FFT 卷积可以大大加快大内核尺寸的卷积速度。

我的问题是，使用 KissFFT 时如何将内核应用于频率空间中的图像？

我已经执行了以下操作：

//I have an image with RGB pixels and given width/height

const int dim[2] = {height, width}; // dimensions of fft
const int dimcount = 2; // number of dimensions. here 2
kiss_fftnd_cfg stf = kiss_fftnd_alloc(dim, dimcount, 0, 0, 0); // forward 2d
kiss_fftnd_cfg sti = kiss_fftnd_alloc(dim, dimcount, 1, 0, 0); // inverse 2d

kiss_fft_cpx *a = new kiss_fft_cpx[width * height];
kiss_fft_cpx *r = new kiss_fft_cpx[width * height];
kiss_fft_cpx *g = new kiss_fft_cpx[width * height];
kiss_fft_cpx *b = new kiss_fft_cpx[width * height];
kiss_fft_cpx *mask = new kiss_fft_cpx[width * height];

kiss_fft_cpx *outa = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outr = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outg = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outb = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outmask = new kiss_fft_cpx[width * height];

for(unsigned int i=0; i<height; i++) {
    for(unsigned int l=0; l<width; l++) {
        float red = intToFloat((int)Input(i,l)->Red);
        float green = intToFloat((int)Input(i,l)->Green);
        float blue = intToFloat((int)Input(i,l)->Blue);

        int index = i * height + l;

        a[index].r = 1.0;
        r[index].r = red;
        g[index].r = green;
        b[index].r = blue;
    }
}

kiss_fftnd(stf, a, outa);
kiss_fftnd(stf, r, outr);
kiss_fftnd(stf, g, outg);
kiss_fftnd(stf, b, outb);
kiss_fftnd(stf, mask, outmask);


kiss_fftnd(sti, outa, a);
kiss_fftnd(sti, outr, r);
kiss_fftnd(sti, outg, g);

当我在图像上再次设置 RGB 值时，我确实恢复了原始图像。所以这个转变是有效的。例如，如果我想应用内核，我现在应该做什么 9x9 框模糊（1/9、1/9、... 1/9）。

我读过一些有关快速卷积的内容，但它们都是不同的，具体取决于 FFT 的实现。在应用过滤器之前，是否有一种“列表”我必须关心的事情？

我的想法是：

图像大小必须是2的幂；我必须创建一个与图像大小相同的内核。将中间的 9 个值设为 1/9，其余的设为 0，然后将该核变换到频域，与源图像相乘，然后将源图像变换回来。但这实际上不起作用：DD

原文

I'm pretty new to Image Processing and found out that the FFT convolution speeds up the convolution with large kernel sizes a lot.

My question is, how can I apply a kernel to a image in frequency space when using kissFFT?

I already did the following:

//I have an image with RGB pixels and given width/height

const int dim[2] = {height, width}; // dimensions of fft
const int dimcount = 2; // number of dimensions. here 2
kiss_fftnd_cfg stf = kiss_fftnd_alloc(dim, dimcount, 0, 0, 0); // forward 2d
kiss_fftnd_cfg sti = kiss_fftnd_alloc(dim, dimcount, 1, 0, 0); // inverse 2d

kiss_fft_cpx *a = new kiss_fft_cpx[width * height];
kiss_fft_cpx *r = new kiss_fft_cpx[width * height];
kiss_fft_cpx *g = new kiss_fft_cpx[width * height];
kiss_fft_cpx *b = new kiss_fft_cpx[width * height];
kiss_fft_cpx *mask = new kiss_fft_cpx[width * height];

kiss_fft_cpx *outa = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outr = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outg = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outb = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outmask = new kiss_fft_cpx[width * height];

for(unsigned int i=0; i<height; i++) {
    for(unsigned int l=0; l<width; l++) {
        float red = intToFloat((int)Input(i,l)->Red);
        float green = intToFloat((int)Input(i,l)->Green);
        float blue = intToFloat((int)Input(i,l)->Blue);

        int index = i * height + l;

        a[index].r = 1.0;
        r[index].r = red;
        g[index].r = green;
        b[index].r = blue;
    }
}

kiss_fftnd(stf, a, outa);
kiss_fftnd(stf, r, outr);
kiss_fftnd(stf, g, outg);
kiss_fftnd(stf, b, outb);
kiss_fftnd(stf, mask, outmask);


kiss_fftnd(sti, outa, a);
kiss_fftnd(sti, outr, r);
kiss_fftnd(sti, outg, g);

When I set the rgb values again on an image I do get the original image back. So the transformation works.
What should I do now if I want to apply a kernel, for example
a 9x9 box blur (1/9, 1/9, ... 1/9).

I have read some things about Fast convolution, but they're all different, depending on the implementation of the FFT . Is there a kind of "list" what things I have to care before applying a filter ?

The way I think:

The imagesize must be a power of 2;
I must create a kernel, the same size as the image. Put the 9 middle values to 1/9, the rest to 0 and then transform this kernel into frequency domain, multiply the source image with it, then transform the source image back. But that doesn't really work :DD

分享到QQ

分享到微博