高斯模糊和 FFT
我正在尝试为学校项目实施高斯模糊。 我需要同时实现 CPU 和 GPU 来比较性能。
我不太确定我是否理解高斯模糊的工作原理。所以我的问题之一是 如果我理解正确的话?
这就是我现在所做的: 我使用维基百科 http://en.wikipedia.org/wiki/Gaussian_blur 中的方程来计算 过滤器。 对于 2d,我采用图像中每个像素的 RGB 并通过以下方式对其应用滤镜 将像素和周围像素的 RGB 与关联的滤镜位置相乘。 然后将这些值相加得到新的像素 RGB 值。 对于 1d,我首先水平应用过滤器,然后垂直应用过滤器,这应该给出 如果我理解正确的话,结果是一样的。 这个结果与应用 2d 滤波器时的结果完全相同吗?
我的另一个问题是如何优化算法。 我读到快速傅里叶变换适用于高斯模糊。 但我不知道如何将其联系起来。 有人能给我正确方向的提示吗?
谢谢。
I´m trying to make an implementation of Gaussian blur for a school project.
I need to make both a CPU and a GPU implementation to compare performance.
I am not quite sure that I understand how Gaussian blur works. So one of my questions is
if I have understood it correctly?
Heres what I do now:
I use the equation from wikipedia http://en.wikipedia.org/wiki/Gaussian_blur to calculate
the filter.
For 2d I take RGB of each pixel in the image and apply the filter to it by
multiplying RGB of the pixel and the surrounding pixels with the associated filter position.
These are then summed to be the new pixel RGB values.
For 1d I apply the filter first horizontally and then vetically, which should give
the same result if I understand things correctly.
Is this result exactly the same result as when the 2d filter is applied?
Another question I have is about how the algorithm can be optimized.
I have read that the Fast Fourier Transform is applicable to Gaussian blur.
But I can't figure out how to relate it.
Can someone give me a hint in the right direction?
Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
是的,2D 高斯核是可分离的,所以你可以只需将其作为两个一维内核应用即可。请注意,您无法“就地”应用这些操作 - 您至少需要一个临时缓冲区来存储第一个一维传递的结果。
当您拥有大内核时,基于 FFT 的卷积是一种有用的优化 - 这适用于任何类型的滤波器,而不仅仅是高斯滤波器。 “大”到底有多大取决于您的架构,但您可能不想担心对小于 49x49 内核的任何东西使用基于 FFT 的方法。一般方法是:
请注意,如果您应用的是使用相同的过滤器处理多个图像,那么您只需要对填充的内核进行 FFT 一次。不过,每个图像仍然至少需要执行两次 FFT(一次正向和一次反向),这就是为什么这种技术仅成为大型内核的计算优势。
Yes, the 2D Gaussian kernel is separable so you can just apply it as two 1D kernels. Note that you can't apply these operations "in place" however - you need at least one temporary buffer to store the result of the first 1D pass.
FFT-based convolution is a useful optimisation when you have large kernels - this applies to any kind of filter, not just Gaussian. Just how big "large" is depends on your architecture, but you probably don't want to worry about using an FFT-based approach for anything smaller than, say, a 49x49 kernel. The general approach is:
Note that if you're applying the same filter to more than one image then you only need to FFT the padded kernel once. You still have at least two FFTs to perform per image though (one forward and one inverse), which is why this technique only becomes a computational win for large-ish kernels.
我非常快速地编写了一个详尽的算法,并对其进行了优化,以将真正的高斯模糊应用于图像,不同的方法(OpenCV gaussian、pocketfft1D、pocketfft2D、pffft)经过实验并可用于测试,所有方法都可以相互进行 1:1 比较,我是两年后目前变得更加优雅。 Blur_algorithms
假设:
图像被视为 2D,但可以按图块逐行处理对于三个通道中的每一个,行、转置、然后逐行再次进行。这种方法确保了内存和缓存友好的操作以及同时独立使用多个线程的可能性
2D 高斯内核可以分解为 2x1D 维度,这将允许分别将其应用于行和列
频域中的卷积只是图像的图块和图块之间的复数乘法(实部和虚部)高斯核的一维维度
这里有一件好事:核居中很重要,这将避免循环环绕问题,但不仅如此,居中内核的虚部为 0!这将大大简化之前提到的“复数乘法”,因为虚部为 0,将变得简单
dft_a_real = dft_a_real * dft_b_real * 缩放器;
dft_a_imaginary = dft_a_imaginary * dft_b_real * 缩放器;
填充图像的边框进行“反射”以避免边缘出现暗模糊很重要,这是通过对每个图像使用 std::reverse_iterator 来完成的平铺
最快、最简单的可移植库,仅标头,我使用过Pommier 的 pffft,pocketfft 也非常快,并且具有 1D 和 2D 方法(后者更容易实现,但不太内存友好,并且在并行计算方面似乎没有那么优化)
有关更多技术解释和更清晰的文档,请参阅我的存储库 Blur_algorithms,我认为这对于学术目的非常有用,不仅如此,我目前正在 WebAsssembly 中重写一个更容易运行和测试的代码,也许还有一些插件。
基准 3 通道图像、M3 pro 12 核、尺寸不断增加、OpenCV、vs pocketfft1D、pocketfft2D 与 pffft(仅限1D)
I wrote an exhaustive algorithm very fast and optimized for apply a true Gaussian Blur to an image, different approaches (OpenCV gaussian, pocketfft1D, pocketfft2D, pffft) are experimented and are available for testing, all comparable 1:1 against each other, I am currently making more elegant after two years. Blur_algorithms
Assuming that:
An image is considered 2D but it can be processed for tiles, row by row, transposed, and again row by row, for each of the three channels. This approach ensure a memory and cache friendly operations and the possibility to use many threads at same time indipendently
A gaussian kernel 2D can be decomposed in 2x1D dimensions, this will allow to apply it for rows and for columns separately
The convolution in the frequency domain is just the complex multiplication (real and imaginary parts) between the tile of the image and the 1D dimension of the gaussian kernel
Here a nice thing: is important that the kernel is centered, this will avoid the circular-wrap issue, but not only, the imaginary part of a centered kernel is 0! this will simplify a lot the "complex multiplication" mentioned before since the imaginary part is 0, will be come a simple
dft_a_real = dft_a_real * dft_b_real * scaler;
dft_a_imaginary = dft_a_imaginary * dft_b_real * scaler;
Is important to padd the borders of the image doing a "reflection" to avoid dark blur on the edges, this is done using std::reverse_iterator for each tile
The fastest and easiest portable library, header only, that I used if pffft by Pommier, also pocketfft is very fast and has both 1D and 2D approaches (the latter is easier to implement but not so memory friendly and doesn't seems so optimized in terms of parallel computing)
For more technical explanation and kinda more clear documentation see my repository Blur_algorithms, I think is very useful for academic purposes and not only, I am currently rewriting in WebAsssembly a more easier code to run and test and perhaps some plugin around.
benchmark 3 channel images, M3 pro 12 cores, at increasing sizes, OpenCV, vs pocketfft1D, pocketfft2D vs pffft(only 1D)