高斯模糊与 FFT 问题
我目前有一个使用常规卷积的高斯模糊实现。对于小内核来说它足够高效,但是一旦内核尺寸变大,性能就会受到影响。所以,我正在考虑使用FFT来实现卷积。我从来没有任何与 FFT 相关的图像处理经验,所以我有几个问题。
基于 2D FFT 的卷积是否也可分为两个 1D 卷积?
- 如果为真,它是否会像这样 - 对每行进行 1D FFT,然后对每列进行 1D FFT,然后与 2D 内核相乘,然后对每列进行逆变换,对每行进行逆变换?或者我是否必须在每次 1D FFT 变换后乘以 1D 内核?
现在我明白内核大小应该与图像大小相同(一维情况下的行)。但它会如何影响边缘呢?我必须用零填充图像边缘吗?如果是这样,内核大小应该等于填充之前或之后的图像大小?
另外,这是一个 C++ 项目,我计划使用 KissFFT,因为这是一个商业项目。欢迎您提出更好的替代方案。谢谢。
编辑:感谢您的回复,但我还有几个问题。
我看到输入图像的虚部将全为零。但输出虚部也会为零吗?我是否必须将高斯核乘以实部和虚部?
我有相同图像的实例在不同的比例下被模糊,即相同的图像被缩放到不同的大小并在不同的内核大小下模糊。每次缩放图像时是否都必须执行 FFT,还是可以使用相同的 FFT?
最后,如果我想可视化 FFT,我知道必须对 FFT 应用对数滤波器。但我真的不知道应该使用哪一部分来可视化 FFT?实部或虚部。
另外,对于尺寸为 512x512 的图像,实部和虚部的尺寸是多少。它们的长度相同吗?
再次感谢您的详细回复。
I have a current implementation of Gaussian Blur using regular convolution. It is efficient enough for small kernels, but once the kernels size gets a little bigger, the performance takes a hit. So, I am thinking to implement the convolution using FFT. I've never had any experience with FFT related image processing so I have a few questions.
Is a 2D FFT based convolution also separable into two 1D convolutions ?
- If true, does it go like this - 1D FFT on every row, and then 1D FFT on every column, then multiply with the 2D kernel and then inverse transform of every column and the inverse transform of every row? Or do I have to multiply with a 1D kernel after each 1D FFT Transform?
Now I understand that the kernel size should be the same size as the image (row in case of 1D). But how will it affect the edges? Do I have to pad the image edges with zeros? If so the kernel size should be equal to the image size before or after padding?
Also, this is a C++ project, and I plan on using kissFFT, since this is a commercial project. You are welcome to suggest any better alternatives. Thank you.
EDIT: Thanks for the responses, but I have a few more questions.
I see that the imaginary part of the input image will be all zeros. But will the output imaginary part will also be zeros? Do I have to multiply the Gaussian kernel to both real and imaginary parts?
I have instances of the same image to be blurred at different scales, i.e. the same image is scaled to different sizes and blurred at different kernel sizes. Do I have to perform a FFT every time I scale the image or can I use the same FFT?
Lastly, If I wanted to visualize the FFT, I understand that a log filter has to be applied to the FFT. But I am really lost on which part should be used to visualize FFT? The real part or the imaginary part.
Also for an image of size 512x512, what will be the size of real and imaginary parts. Will they be the same length?
Thank you again for your detailed replies.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
2-D FFT 是可分离的,并且您的执行方式是正确的,只是您必须乘以 2D 内核的 2-D FFT。如果您使用 Kissfft,执行 2-D FFT 的更简单方法是仅使用 Kissfft 包的工具目录中的
kiss_fftnd
。这将执行多维 FFT。内核大小不必是任何特定大小。如果内核小于图像,则只需在执行 2-D FFT 之前将零填充到图像大小。您还应该对图像边缘进行零填充,因为通过频域中的乘法执行的卷积实际上是循环卷积,并且结果在边缘处环绕。
总结一下(假设图像大小为 M x N):
如果您在不同的图像上多次执行相同的过滤器,则不必每次都执行 1-3。
注意:内核大小必须相当大,才能比直接计算卷积更快。另外,您是否利用二维高斯滤波器是可分离的这一事实来实现直接卷积(参见此 几段进入“力学”部分)?也就是说,您可以先对行执行 2-D 卷积,然后再对列执行 1-D 卷积。我发现这比大多数基于 FFT 的方法更快,除非内核非常大。
对编辑的响应
如果输入是真实的,除了极少数情况外,输出仍然会很复杂。高斯核的 FFT 也很复杂,因此乘法必须是复数乘法。当您执行逆 FFT 时,输出应该是真实的,因为您的输入图像和内核是真实的。输出将以复数数组形式返回,但虚部应该为零或非常小(浮点错误)并且可以被丢弃。
如果您使用相同的图像,您可以重复使用图像 FFT,但您需要根据最大内核大小进行零填充。您必须计算所有不同内核的 FFT。
为了可视化,应使用复数输出的大小。当较大的组件在线性比例中淹没它们时,对数比例仅有助于可视化输出的较小组件。经常使用 分贝 标度,由
20*log10(abs(x ))
或10*log10(x*x')
是等效的。 (x
是复数 fft 输出,x'
是x
的复共轭)。FFT 的输入和输出大小相同。此外,实部和虚部将具有相同的大小,因为一个实部和一个虚部形成单个样本。
The 2-D FFT is seperable and you are correct in how to perform it except that you must multiply by the 2-D FFT of the 2D kernel. If you are using kissfft, an easier way to perform the 2-D FFT is to just use
kiss_fftnd
in the tools directory of the kissfft package. This will do multi-dimensional FFTs.The kernel size does not have to be any particular size. If the kernel is smaller than the image, you just need to zero-pad up to the image size before performing the 2-D FFT. You should also zero pad the image edges since the convoulution being performed by multiplication in the frequency domain is actually circular convolution and results wrap around at the edges.
So to summarize (given that the image size is M x N):
If you are performing the same filter multiple times on different images, you don't have to perform 1-3 every time.
Note: The kernel size will have to be rather large for this to be faster than direct computation of convolution. Also, did you implement your direct convolution taking advantage of the fact that a 2-D gaussian filter is separable (see this a few paragraphs into the "Mechanics" section)? That is, you can perform the 2-D convolution as 1-D convolutions on the rows and then the columns. I have found this to be faster than most FFT-based approaches unless the kernels are quite large.
Response to Edit
If the input is real, the output will still be complex except for rare circumstances. The FFT of your gaussian kernel will also be complex, so the multiply must be a complex multiplication. When you perform the inverse FFT, the output should be real since your input image and kernel are real. The output will be returned in a complex array, but the imaginary components should be zero or very small (floating point error) and can be discarded.
If you are using the same image, you can reuse the image FFT, but you will need to zero pad based on your biggest kernel size. You will have to compute the FFTs of all of the different kernels.
For visualization, the magnitude of the complex output should be used. The log scale just helps to visualize smaller components of the output when larger components would drown them out in a linear scale. The Decibel scale is often used and is given by either
20*log10(abs(x))
or10*log10(x*x')
which are equivalent. (x
is the complex fft output andx'
is the complex conjugate ofx
).The input and output of the FFT will be the same size. Also the real and imaginary parts will be the same size since one real and one imaginary value form a single sample.
请记住,空间中的卷积相当于频域中的乘法。这意味着一旦对图像和掩模(内核)执行 FFT,您只需进行逐点乘法,然后对结果进行 IFFT。话虽如此,这里有几句警告。
您可能知道,在数字信号处理中,我们经常使用循环卷积 ,而不是线性卷积。发生这种情况是因为好奇的周期性。简单来说,这意味着 DFT(以及 FFT 是其计算效率较高的变体)假设您的信号是周期性的,并且当您以这种方式过滤信号时 - 假设您的图像是 N x M 个像素——将 (1,m) 处的像素转移到 (N, m< /em>) 对于某些m<M。您的信号实际上会缠绕到自身上。这意味着您的高斯掩模将平均最右侧的像素与最左侧的像素,顶部和底部也是如此。这可能是所希望的,也可能不是所希望的,但总的来说,无论如何我们都必须处理边缘伪影。然而,在处理 FFT 乘法时更容易忘记这个问题,因为问题不再明显。有很多方法可以解决这个问题。最好的方法是简单地用零填充图像,然后删除多余的像素。
在频域中使用高斯滤波器的一个非常巧妙的事情是,您永远不需要真正进行 FFT。众所周知,高斯的傅立叶变换是高斯(一些技术细节此处)。然后您所要做的就是用零(顶部和底部)填充图像,在频域中生成高斯,将它们相乘并进行 IFFT。然后你就完成了。
希望这有帮助。
Remember that convolution in space is equivalent to multiplication in frequency domain. This means that once you perform FFT of both image and mask (kernel), you only have to do point-by-point multiplication, and then IFFT of the result. Having said that, here are a few words of caution.
You probably know that in digital signal processing, we often use circular convolution, not linear convolution. This happens because of curious periodicity. What this means in simple terms is that DFT (and FFT which is its computationally efficient variant) assumes that you signal is periodic, and when you filter your signal in such manner -- suppose your image is N x M pixels -- that it takes pixel at (1,m) to the the neighbor or pixel at (N, m) for some m<M. You signal virtually wraps around onto itself. This means that your Gaussian mask will be averaging pixels on the far right with pixels on the far left, and same goes for top and bottom. This might or might not be desired, but in general one has to deal with edging artifacts anyway. It is however much easier to forget about this issue when dealing with FFT multiplication because the problem stops being apparent. There are many ways to take care of this problem. The best way is to simply pad your image with zeros and remove the extra pixels later.
A very neat thing about using a Gaussian filter in frequency domain is that you never really have to take its FFT. It si a well-know fact that Fourier transform of a Gaussian is a Gaussian (some technical details here). All you would have to do then is pad you image with zeros (both top and bottom), generate a Gaussian in the frequency domain, multiply them together and take IFFT. Then you're done.
Hope this helps.