高斯模糊与 FFT 问题

发布于 2024-11-30 08:22:17 字数 820 浏览 6 评论 0原文

我目前有一个使用常规卷积的高斯模糊实现。对于小内核来说它足够高效，但是一旦内核尺寸变大，性能就会受到影响。所以，我正在考虑使用FFT来实现卷积。我从来没有任何与 FFT 相关的图像处理经验，所以我有几个问题。

基于 2D FFT 的卷积是否也可分为两个 1D 卷积？
- 如果为真，它是否会像这样 - 对每行进行 1D FFT，然后对每列进行 1D FFT，然后与 2D 内核相乘，然后对每列进行逆变换，对每行进行逆变换？或者我是否必须在每次 1D FFT 变换后乘以 1D 内核？
现在我明白内核大小应该与图像大小相同（一维情况下的行）。但它会如何影响边缘呢？我必须用零填充图像边缘吗？如果是这样，内核大小应该等于填充之前或之后的图像大小？

另外，这是一个 C++ 项目，我计划使用 KissFFT，因为这是一个商业项目。欢迎您提出更好的替代方案。谢谢。

编辑：感谢您的回复，但我还有几个问题。

我看到输入图像的虚部将全为零。但输出虚部也会为零吗？我是否必须将高斯核乘以实部和虚部？
我有相同图像的实例在不同的比例下被模糊，即相同的图像被缩放到不同的大小并在不同的内核大小下模糊。每次缩放图像时是否都必须执行 FFT，还是可以使用相同的 FFT？
最后，如果我想可视化 FFT，我知道必须对 FFT 应用对数滤波器。但我真的不知道应该使用哪一部分来可视化 FFT？实部或虚部。
另外，对于尺寸为 512x512 的图像，实部和虚部的尺寸是多少。它们的长度相同吗？

再次感谢您的详细回复。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

如果没有你 2024-12-07 08:22:17

2-D FFT 是可分离的，并且您的执行方式是正确的，只是您必须乘以 2D 内核的 2-D FFT。如果您使用 Kissfft，执行 2-D FFT 的更简单方法是仅使用 Kissfft 包的工具目录中的 kiss_fftnd 。这将执行多维 FFT。
内核大小不必是任何特定大小。如果内核小于图像，则只需在执行 2-D FFT 之前将零填充到图像大小。您还应该对图像边缘进行零填充，因为通过频域中的乘法执行的卷积实际上是循环卷积，并且结果在边缘处环绕。

总结一下（假设图像大小为 M x N）：

提出任意大小 (U x V) 的 2-D 内核，
将内核零填充到 (M+U-1) x (N+V) -1)
取内核零填充图像的 2-D FFT
，直到 (M+U-1) x (N+V-1)
取图像的 2-D FFT
将内核的 FFT 乘以图像
采用逆二维结果的 FFT
修剪掉边缘的垃圾

如果您在不同的图像上多次执行相同的过滤器，则不必每次都执行 1-3。

注意：内核大小必须相当大，才能比直接计算卷积更快。另外，您是否利用二维高斯滤波器是可分离的这一事实来实现直接卷积（参见此几段进入“力学”部分）？也就是说，您可以先对行执行 2-D 卷积，然后再对列执行 1-D 卷积。我发现这比大多数基于 FFT 的方法更快，除非内核非常大。

对编辑的响应

如果输入是真实的，除了极少数情况外，输出仍然会很复杂。高斯核的 FFT 也很复杂，因此乘法必须是复数乘法。当您执行逆 FFT 时，输出应该是真实的，因为您的输入图像和内核是真实的。输出将以复数数组形式返回，但虚部应该为零或非常小（浮点错误）并且可以被丢弃。
如果您使用相同的图像，您可以重复使用图像 FFT，但您需要根据最大内核大小进行零填充。您必须计算所有不同内核的 FFT。
为了可视化，应使用复数输出的大小。当较大的组件在线性比例中淹没它们时，对数比例仅有助于可视化输出的较小组件。经常使用分贝标度，由 20*log10(abs(x )) 或 10*log10(x*x') 是等效的。（x 是复数 fft 输出，x' 是 x 的复共轭）。
FFT 的输入和输出大小相同。此外，实部和虚部将具有相同的大小，因为一个实部和一个虚部形成单个样本。

The 2-D FFT is seperable and you are correct in how to perform it except that you must multiply by the 2-D FFT of the 2D kernel. If you are using kissfft, an easier way to perform the 2-D FFT is to just use kiss_fftnd in the tools directory of the kissfft package. This will do multi-dimensional FFTs.
The kernel size does not have to be any particular size. If the kernel is smaller than the image, you just need to zero-pad up to the image size before performing the 2-D FFT. You should also zero pad the image edges since the convoulution being performed by multiplication in the frequency domain is actually circular convolution and results wrap around at the edges.

So to summarize (given that the image size is M x N):

come up with a 2-D kernel of any size (U x V)
zero-pad the kernel up to (M+U-1) x (N+V-1)
take the 2-D fft of the kernel
zero-pad the image up to (M+U-1) x (N+V-1)
take the 2-D FFT of the image
multiply FFT of kernel by FFT of image
take inverse 2-D FFT of result
trim off garbage at edges

If you are performing the same filter multiple times on different images, you don't have to perform 1-3 every time.

Note: The kernel size will have to be rather large for this to be faster than direct computation of convolution. Also, did you implement your direct convolution taking advantage of the fact that a 2-D gaussian filter is separable (see this a few paragraphs into the "Mechanics" section)? That is, you can perform the 2-D convolution as 1-D convolutions on the rows and then the columns. I have found this to be faster than most FFT-based approaches unless the kernels are quite large.

Response to Edit

If the input is real, the output will still be complex except for rare circumstances. The FFT of your gaussian kernel will also be complex, so the multiply must be a complex multiplication. When you perform the inverse FFT, the output should be real since your input image and kernel are real. The output will be returned in a complex array, but the imaginary components should be zero or very small (floating point error) and can be discarded.
If you are using the same image, you can reuse the image FFT, but you will need to zero pad based on your biggest kernel size. You will have to compute the FFTs of all of the different kernels.
For visualization, the magnitude of the complex output should be used. The log scale just helps to visualize smaller components of the output when larger components would drown them out in a linear scale. The Decibel scale is often used and is given by either 20*log10(abs(x)) or 10*log10(x*x') which are equivalent. (x is the complex fft output and x' is the complex conjugate of x).
The input and output of the FFT will be the same size. Also the real and imaginary parts will be the same size since one real and one imaginary value form a single sample.

回复收藏 0 原文

丧 2024-12-07 08:22:17

请记住，空间中的卷积相当于频域中的乘法。这意味着一旦对图像和掩模（内核）执行 FFT，您只需进行逐点乘法，然后对结果进行 IFFT。话虽如此，这里有几句警告。

您可能知道，在数字信号处理中，我们经常使用循环卷积，而不是线性卷积。发生这种情况是因为好奇的周期性。简单来说，这意味着 DFT（以及 FFT 是其计算效率较高的变体）假设您的信号是周期性的，并且当您以这种方式过滤信号时 - 假设您的图像是 N x M 个像素——将 (1,m) 处的像素转移到 (N, m< /em>) 对于某些m<M。您的信号实际上会缠绕到自身上。这意味着您的高斯掩模将平均最右侧的像素与最左侧的像素，顶部和底部也是如此。这可能是所希望的，也可能不是所希望的，但总的来说，无论如何我们都必须处理边缘伪影。然而，在处理 FFT 乘法时更容易忘记这个问题，因为问题不再明显。有很多方法可以解决这个问题。最好的方法是简单地用零填充图像，然后删除多余的像素。

在频域中使用高斯滤波器的一个非常巧妙的事情是，您永远不需要真正进行 FFT。众所周知，高斯的傅立叶变换是高斯（一些技术细节此处）。然后您所要做的就是用零（顶部和底部）填充图像，在频域中生成高斯，将它们相乘并进行 IFFT。然后你就完成了。

希望这有帮助。