FFT 卷积 - 如何应用 Kernel

发布于 2024-11-29 01:00:12 字数 1927 浏览 5 评论 0原文

我对图像处理还很陌生,发现 FFT 卷积可以大大加快大内核尺寸的卷积速度。

我的问题是,使用 KissFFT 时如何将内核应用于频率空间中的图像?

我已经执行了以下操作:

//I have an image with RGB pixels and given width/height

const int dim[2] = {height, width}; // dimensions of fft
const int dimcount = 2; // number of dimensions. here 2
kiss_fftnd_cfg stf = kiss_fftnd_alloc(dim, dimcount, 0, 0, 0); // forward 2d
kiss_fftnd_cfg sti = kiss_fftnd_alloc(dim, dimcount, 1, 0, 0); // inverse 2d

kiss_fft_cpx *a = new kiss_fft_cpx[width * height];
kiss_fft_cpx *r = new kiss_fft_cpx[width * height];
kiss_fft_cpx *g = new kiss_fft_cpx[width * height];
kiss_fft_cpx *b = new kiss_fft_cpx[width * height];
kiss_fft_cpx *mask = new kiss_fft_cpx[width * height];

kiss_fft_cpx *outa = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outr = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outg = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outb = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outmask = new kiss_fft_cpx[width * height];

for(unsigned int i=0; i<height; i++) {
    for(unsigned int l=0; l<width; l++) {
        float red = intToFloat((int)Input(i,l)->Red);
        float green = intToFloat((int)Input(i,l)->Green);
        float blue = intToFloat((int)Input(i,l)->Blue);

        int index = i * height + l;

        a[index].r = 1.0;
        r[index].r = red;
        g[index].r = green;
        b[index].r = blue;
    }
}

kiss_fftnd(stf, a, outa);
kiss_fftnd(stf, r, outr);
kiss_fftnd(stf, g, outg);
kiss_fftnd(stf, b, outb);
kiss_fftnd(stf, mask, outmask);


kiss_fftnd(sti, outa, a);
kiss_fftnd(sti, outr, r);
kiss_fftnd(sti, outg, g);

当我在图像上再次设置 RGB 值时,我确实恢复了原始图像。所以这个转变是有效的。 例如,如果我想应用内核,我现在应该做什么 9x9 框模糊(1/9、1/9、... 1/9)。

我读过一些有关快速卷积的内容,但它们都是不同的,具体取决于 FFT 的实现。在应用过滤器之前,是否有一种“列表”我必须关心的事情?

我的想法是:

图像大小必须是2的幂; 我必须创建一个与图像大小相同的内核。将中间的 9 个值设为 1/9,其余的设为 0,然后将该核变换到频域,与源图像相乘,然后将源图像变换回来。但这实际上不起作用:DD

I'm pretty new to Image Processing and found out that the FFT convolution speeds up the convolution with large kernel sizes a lot.

My question is, how can I apply a kernel to a image in frequency space when using kissFFT?

I already did the following:

//I have an image with RGB pixels and given width/height

const int dim[2] = {height, width}; // dimensions of fft
const int dimcount = 2; // number of dimensions. here 2
kiss_fftnd_cfg stf = kiss_fftnd_alloc(dim, dimcount, 0, 0, 0); // forward 2d
kiss_fftnd_cfg sti = kiss_fftnd_alloc(dim, dimcount, 1, 0, 0); // inverse 2d

kiss_fft_cpx *a = new kiss_fft_cpx[width * height];
kiss_fft_cpx *r = new kiss_fft_cpx[width * height];
kiss_fft_cpx *g = new kiss_fft_cpx[width * height];
kiss_fft_cpx *b = new kiss_fft_cpx[width * height];
kiss_fft_cpx *mask = new kiss_fft_cpx[width * height];

kiss_fft_cpx *outa = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outr = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outg = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outb = new kiss_fft_cpx[width * height];
kiss_fft_cpx *outmask = new kiss_fft_cpx[width * height];

for(unsigned int i=0; i<height; i++) {
    for(unsigned int l=0; l<width; l++) {
        float red = intToFloat((int)Input(i,l)->Red);
        float green = intToFloat((int)Input(i,l)->Green);
        float blue = intToFloat((int)Input(i,l)->Blue);

        int index = i * height + l;

        a[index].r = 1.0;
        r[index].r = red;
        g[index].r = green;
        b[index].r = blue;
    }
}

kiss_fftnd(stf, a, outa);
kiss_fftnd(stf, r, outr);
kiss_fftnd(stf, g, outg);
kiss_fftnd(stf, b, outb);
kiss_fftnd(stf, mask, outmask);


kiss_fftnd(sti, outa, a);
kiss_fftnd(sti, outr, r);
kiss_fftnd(sti, outg, g);

When I set the rgb values again on an image I do get the original image back. So the transformation works.
What should I do now if I want to apply a kernel, for example
a 9x9 box blur (1/9, 1/9, ... 1/9).

I have read some things about Fast convolution, but they're all different, depending on the implementation of the FFT . Is there a kind of "list" what things I have to care before applying a filter ?

The way I think:

The imagesize must be a power of 2;
I must create a kernel, the same size as the image. Put the 9 middle values to 1/9, the rest to 0 and then transform this kernel into frequency domain, multiply the source image with it, then transform the source image back. But that doesn't really work :DD

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

紫罗兰の梦幻 2024-12-06 01:00:12

在频域中执行的卷积实际上是循环卷积。因此,当内核的非零元素到达图片边缘时,它会环绕并包含图片另一侧的像素,这可能不是您想要的。为了处理这个问题,只需用与内核中非零元素一样多的元素对输入进行零填充(实际上少一个即可)。对于 3x3 内核,您需要在每个维度添加 3-1=2 个零像素。

The convolution performed in the frequency domain is really a circular convolution. So when your non-zero elements of the kernel reach the edge of the picture it wraps around and includes the pixels from the other side of the picture, which is probably not what you want. To deal with this just zero pad the input with as many elements as you have non-zero elements in the kernel (actually one less will do). With a 3x3 kernel you need to add 3-1=2 zero pixels in each dimension.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文