I found Quasimondo : Incubator : Processing : Fast Gaussian Blur. This method contains a lot of approximations, like using integers and look-up tables instead of floats and floating-point divisions. I don't know how much speedup that gives in modern Java code.
Fast Shadows on Rectangles has an approximating algorithm using B-splines.
Fast Gaussian Blur Algorithm in C# claims to have some cool optimizations.
Also, Fast Gaussian Blur (PDF) by David Everly has a fast method for Gaussian blur processing.
I would try out the various methods, benchmark them and post the results here.
For my purposes, I have copied and implemented the basic (process X-Y axes independently) method and David Everly's Fast Gaussian Blur method from the Internet. They differ in parameters, so I couldn't compare them directly. However, the latter goes through far fewer iterations for a large blur radius. Also, the latter is an approximate algorithm.
Answering this old question with new libraries that are now available (as of 2016), because there have been many advancements in GPU technology for Java.
As suggested in a few other answers, CUDA is an alternative. But Java has CUDA support now.
IBM CUDA4J library: provides a Java API for managing and accessing GPU devices, libraries, kernels, and memory. Using these new APIs it is possible to write Java programs that manage GPU device characteristics and offload work to the GPU with the convenience of the Java memory model, exceptions, and automatic resource management.
Jcuda: Java bindings for NVIDIA CUDA and related libraries. With JCuda it is possible to interact with the CUDA runtime and driver API from Java programs.
Aparapi: allows Java developers to take advantage of the compute power of GPU and APU devices by executing data parallel code fragments on the GPU rather than being confined to the local CPU.
Some Java OpenCL binding libraries
https://github.com/ochafik/JavaCL : Java bindings for OpenCL: An object-oriented OpenCL library, based on auto-generated low-level bindings
http://jogamp.org/jocl/www/ : Java bindings for OpenCL: An object-oriented OpenCL library, based on auto-generated low-level bindings
http://www.lwjgl.org/ : Java bindings for OpenCL: Auto-generated low-level bindings and object-oriented convenience classes
http://jocl.org/ : Java bindings for OpenCL: Low-level bindings that are a 1:1 mapping of the original OpenCL API
All of the above libraries can help implement Gaussian blur faster than a pure-CPU implementation in Java.
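To make the offloading concrete, here is a minimal sketch of a naive horizontal box-blur pass written as an Aparapi kernel; the com.aparapi package names are assumed (older releases use com.amd.aparapi), the image is treated as a single grey channel, and the class and field names are mine, so treat it as a starting point rather than a tested kernel.
import com.aparapi.Kernel;
import com.aparapi.Range;

public class BoxBlurRowKernel extends Kernel {
    final int[] src, dst;
    final int width, height, radius;

    BoxBlurRowKernel(int[] src, int[] dst, int width, int height, int radius) {
        this.src = src; this.dst = dst;
        this.width = width; this.height = height; this.radius = radius;
    }

    @Override
    public void run() {
        int idx = getGlobalId();              // one work item per pixel
        int x = idx % width;
        int y = idx / width;
        int sum = 0, count = 0;
        for (int dx = -radius; dx <= radius; dx++) {
            int xx = x + dx;
            if (xx >= 0 && xx < width) {      // skip samples outside the row
                sum += src[y * width + xx] & 0xFF;
                count++;
            }
        }
        dst[idx] = sum / count;               // grey value, not a packed ARGB pixel
    }
}

// Possible usage: new BoxBlurRowKernel(src, dst, w, h, r).execute(Range.create(w * h));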
For larger blur radiuses, try applying a box blur three times. This will approximate a Gaussian blur very well, and be much faster than a true Gaussian blur.
I have converted Ivan Kuckir's implementation of a fast Gaussian blur, which uses three passes with linear-time box blurs, to Java. The resulting process is O(n), as he states on his own blog. If you would like to learn more about why three box-blur passes approximate a Gaussian blur (to within about 3%), you may check out box blur and Gaussian blur.
Here is the Java implementation.
// Requires: import java.awt.image.BufferedImage; import java.util.ArrayList;
@Override
public BufferedImage ProcessImage(BufferedImage image) {
    int width = image.getWidth();
    int height = image.getHeight();
    int[] pixels = image.getRGB(0, 0, width, height, null, 0, width);
    int[] changedPixels = new int[pixels.length];
    FastGaussianBlur(pixels, changedPixels, width, height, 12);   // hard-coded blur strength; passed to CreateGaussianBoxes as sigma
    BufferedImage newImage = new BufferedImage(width, height, image.getType());
    newImage.setRGB(0, 0, width, height, changedPixels, 0, width);
    return newImage;
}

// Three successive box blurs with suitably chosen sizes approximate a Gaussian blur.
private void FastGaussianBlur(int[] source, int[] output, int width, int height, int radius) {
    ArrayList<Integer> gaussianBoxes = CreateGaussianBoxes(radius, 3);
    BoxBlur(source, output, width, height, (gaussianBoxes.get(0) - 1) / 2);
    BoxBlur(output, source, width, height, (gaussianBoxes.get(1) - 1) / 2);
    BoxBlur(source, output, width, height, (gaussianBoxes.get(2) - 1) / 2);
}

// Computes the n (odd) box widths whose repeated application best matches a Gaussian of the given sigma.
private ArrayList<Integer> CreateGaussianBoxes(double sigma, int n) {
    double idealFilterWidth = Math.sqrt((12 * sigma * sigma / n) + 1);
    int filterWidth = (int) Math.floor(idealFilterWidth);
    if (filterWidth % 2 == 0) {
        filterWidth--;
    }
    int filterWidthU = filterWidth + 2;
    double mIdeal = (12 * sigma * sigma - n * filterWidth * filterWidth - 4 * n * filterWidth - 3 * n) / (-4 * filterWidth - 4);
    double m = Math.round(mIdeal);
    ArrayList<Integer> result = new ArrayList<>();
    for (int i = 0; i < n; i++) {
        result.add(i < m ? filterWidth : filterWidthU);
    }
    return result;
}

// One box-blur pass: a horizontal pass followed by a vertical pass, each O(1) per pixel.
private void BoxBlur(int[] source, int[] output, int width, int height, int radius) {
    System.arraycopy(source, 0, output, 0, source.length);
    BoxBlurHorizontal(output, source, width, height, radius);
    BoxBlurVertical(source, output, width, height, radius);
}

// Sliding-window box blur along each row. Only the lowest byte of each pixel is read, and the result
// is written back as an opaque grey ARGB value, so the blur is effectively single-channel.
private void BoxBlurHorizontal(int[] sourcePixels, int[] outputPixels, int width, int height, int radius) {
    int resultingColorPixel;
    float iarr = 1f / (radius + radius + 1);
    for (int i = 0; i < height; i++) {
        int outputIndex = i * width;
        int li = outputIndex;
        int sourceIndex = outputIndex + radius;
        int fv = Byte.toUnsignedInt((byte) sourcePixels[outputIndex]);              // first pixel of the row
        int lv = Byte.toUnsignedInt((byte) sourcePixels[outputIndex + width - 1]);  // last pixel of the row
        float val = (radius + 1) * fv;
        for (int j = 0; j < radius; j++) {
            val += Byte.toUnsignedInt((byte) (sourcePixels[outputIndex + j]));
        }
        for (int j = 0; j <= radius; j++) {                                         // left edge: clamp to the first pixel
            val += Byte.toUnsignedInt((byte) sourcePixels[sourceIndex++]) - fv;
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex++] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
        }
        for (int j = (radius + 1); j < (width - radius); j++) {                     // interior: slide the window
            val += Byte.toUnsignedInt((byte) sourcePixels[sourceIndex++]) - Byte.toUnsignedInt((byte) sourcePixels[li++]);
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex++] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
        }
        for (int j = (width - radius); j < width; j++) {                            // right edge: clamp to the last pixel
            val += lv - Byte.toUnsignedInt((byte) sourcePixels[li++]);
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex++] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
        }
    }
}

// Sliding-window box blur along each column, mirroring the horizontal pass.
private void BoxBlurVertical(int[] sourcePixels, int[] outputPixels, int width, int height, int radius) {
    int resultingColorPixel;
    float iarr = 1f / (radius + radius + 1);
    for (int i = 0; i < width; i++) {
        int outputIndex = i;
        int li = outputIndex;
        int sourceIndex = outputIndex + radius * width;
        int fv = Byte.toUnsignedInt((byte) sourcePixels[outputIndex]);                         // first pixel of the column
        int lv = Byte.toUnsignedInt((byte) sourcePixels[outputIndex + width * (height - 1)]);  // last pixel of the column
        float val = (radius + 1) * fv;
        for (int j = 0; j < radius; j++) {
            val += Byte.toUnsignedInt((byte) sourcePixels[outputIndex + j * width]);
        }
        for (int j = 0; j <= radius; j++) {                                                    // top edge
            val += Byte.toUnsignedInt((byte) sourcePixels[sourceIndex]) - fv;
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
            sourceIndex += width;
            outputIndex += width;
        }
        for (int j = radius + 1; j < (height - radius); j++) {                                 // interior
            val += Byte.toUnsignedInt((byte) sourcePixels[sourceIndex]) - Byte.toUnsignedInt((byte) sourcePixels[li]);
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
            li += width;
            sourceIndex += width;
            outputIndex += width;
        }
        for (int j = (height - radius); j < height; j++) {                                     // bottom edge
            val += lv - Byte.toUnsignedInt((byte) sourcePixels[li]);
            resultingColorPixel = Byte.toUnsignedInt(((Integer) Math.round(val * iarr)).byteValue());
            outputPixels[outputIndex] = (0xFF << 24) | (resultingColorPixel << 16) | (resultingColorPixel << 8) | (resultingColorPixel);
            li += width;
            outputIndex += width;
        }
    }
}
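For completeness, a small usage sketch of the code above; the class name MyFastBlur is a placeholder for whatever class holds these methods, and it assumes ImageIO returns an image whose getType() is a standard RGB type (ProcessImage reuses it). Note that the implementation above reads only the lowest byte of each pixel and writes a grey result.
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class BlurDemo {
    public static void main(String[] args) throws Exception {
        BufferedImage input = ImageIO.read(new File("input.png"));
        BufferedImage blurred = new MyFastBlur().ProcessImage(input);   // MyFastBlur: placeholder class name
        ImageIO.write(blurred, "png", new File("blurred.png"));
    }
}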
I would consider using CUDA or some other GPU programming toolkit for this, especially if you want to use a larger kernel. Failing that, there's always hand tweaking your loops in assembly.
I have seen several answers in different places and am collecting them here so that I can try to wrap my mind around them and remember them for later:
Regardless of which approach you use, filter the horizontal and vertical dimensions separately with 1D filters rather than using a single square filter.
Hierarchical pyramid of images with reduced resolution, as in SIFT
Repeated box blurs, motivated by the Central Limit Theorem. The box blur is central to Viola and Jones's face detection, where they call it an integral image if I remember correctly (a summed-area-table sketch follows this list). I think Haar-like features use something similar, too.
Stack Blur: a queue-based alternative somewhere between the convolutional and box blur approaches
Bessel filters, although there is some debate about these
After reviewing all of these, I'm reminded that simple, poor approximations often work well in practice. In a different field, Alex Krizhevsky found ReLUs to be faster than the classic sigmoid function in his ground-breaking AlexNet, even though they appear at first glance to be a terrible approximation to the sigmoid.
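Since the integral image (summed-area table) is mentioned in the list above, here is a minimal single-channel sketch of how it turns a box blur into four lookups per pixel; the method names are mine and border clamping is omitted for brevity.
// Summed-area table: sat[y][x] holds the sum of src over the rectangle from (0,0) to (x-1,y-1).
static long[][] summedAreaTable(int[][] src, int w, int h) {
    long[][] sat = new long[h + 1][w + 1];
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            sat[y + 1][x + 1] = src[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x];
        }
    }
    return sat;
}

// Box-blurred value at (x, y): the (2r+1)x(2r+1) window sum comes from just four table entries.
// Assumes the window lies inside the image; clamp x0, y0, x1, y1 for a production version.
static int boxBlurAt(long[][] sat, int x, int y, int r) {
    int x0 = x - r, y0 = y - r, x1 = x + r + 1, y1 = y + r + 1;
    long sum = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0];
    int side = 2 * r + 1;
    return (int) (sum / (side * side));
}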
I struggled with this problem for my research and tried an interesting method for a fast Gaussian blur. First, as mentioned, it is best to separate the blur into two 1D blurs, but depending on your hardware for the actual calculation of the pixel values, you can actually pre-compute all possible values and store them in a look-up table.
In other words, pre-calculate every combination of Gaussian coefficient * input pixel value. Of course you will need to discretize your coefficients, but I just wanted to add this solution. If you have an IEEE subscription, you can read more in Fast image blurring using Lookup Table for real time feature extraction. Ultimately, I ended up using CUDA though :)
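As a rough illustration of the idea (not the exact scheme from that paper), the sketch below pre-multiplies each quantized kernel coefficient by every possible 8-bit value, so the inner convolution loop becomes table lookups and additions only; the kernel construction and edge handling are assumed to exist elsewhere.
// kernel: 1D Gaussian weights quantized to integers that sum to 'scale' (e.g. 256).
// lut[k][v] = kernel[k] * v, pre-computed for every 8-bit value v.
static int[][] buildLut(int[] kernel) {
    int[][] lut = new int[kernel.length][256];
    for (int k = 0; k < kernel.length; k++) {
        for (int v = 0; v < 256; v++) {
            lut[k][v] = kernel[k] * v;
        }
    }
    return lut;
}

// One horizontal pass over a single channel; interior pixels only, for brevity.
static void convolveRow(int[] row, int[] out, int[][] lut, int radius, int scale) {
    for (int x = radius; x < row.length - radius; x++) {
        int acc = 0;
        for (int k = -radius; k <= radius; k++) {
            acc += lut[k + radius][row[x + k] & 0xFF];   // lookup replaces the multiply
        }
        out[x] = acc / scale;
    }
}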
In 1D:
Blurring repeatedly with almost any kernel will tend toward a Gaussian kernel. This is what's so cool about the Gaussian distribution, and is why statisticians like it. So choose something that's easy to blur with and apply it several times.
For example, it's easy to blur with a box shaped kernel. First calculate a cumulative sum:
y(i) = y(i-1) + x(i)
then:
blurred(i) = y(i+radius) - y(i-radius)
Repeat several times.
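A minimal 1D sketch of that cumulative-sum trick (single channel; the prefix array here has a leading zero, which is why the indexing differs slightly from the formula above, and the division normalizes the window sum; the ends of the array are simply left untouched to keep it short):
// One box-blur pass via a running sum: y[i+1] = y[i] + x[i],
// then blurred(i) = (y[i+radius+1] - y[i-radius]) / (2*radius + 1).
static void boxBlur1D(float[] x, float[] out, int radius) {
    float[] y = new float[x.length + 1];
    for (int i = 0; i < x.length; i++) {
        y[i + 1] = y[i] + x[i];                            // cumulative sum
    }
    for (int i = radius; i < x.length - radius; i++) {
        out[i] = (y[i + radius + 1] - y[i - radius]) / (2 * radius + 1);
    }
}
// Repeating this pass (e.g. three times per dimension) moves the result toward a Gaussian.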
Or you might go back and forth a few times with some variety of IIR filter; these are similarly fast.
In 2D or higher:
Blur in each dimension one after the other, as DarenW said.
You should use the fact that a Gaussian kernel is separable, i.e. you can express a 2D convolution as a combination of two 1D convolutions.
If the filter is large, it may also make sense to use the fact that convolution in the spatial domain is equivalent to multiplication in the frequency (Fourier) domain. This means that you can take the Fourier transform of the image and the filter, multiply the (complex) results, and then take the inverse Fourier transform. The complexity of the FFT (Fast Fourier Transform) is O(n log n), while the complexity of a convolution is O(n^2). Also, if you need to blur many images with the same filter, you would only need to take the FFT of the filter once.
If you decide to go with using an FFT, the FFTW library is a good choice.
Here are 3 solutions, from worst to best in terms of performance. This is inspired by this page.
A naïve implementation is to loop over each pixel and then calculate the weighted average of the adjacent pixels. This is very slow, even if a pre-calculated weight table is used.
A faster approach is to use a box blur. It's similar to a Gaussian but the weight is constant. If multiple passes are applied (e.g. 3 passes), it gives a very good approximation of a Gaussian blur.
for (y = 0 to h-1) {
    for (x = 0 to w-1) {
        sum = 0;
        for (j = -radius to radius) { //box blur
            for (i = -radius to radius) {
                sum += input[x + i, y + j]
            }
        }
        output[x, y] = sum / ((radius * 2 + 1) * (radius * 2 + 1))
    }
}
Note: this is pseudocode. There is no bounds checking, but it's fairly easy to implement.
Complexity : O(n radius^2)
As someone stated in the comments of a previous answer, the 2D Gaussian can be decomposed into two 1D ones (one horizontal and one vertical):
for (y = 0 to h-1) {
    for (x = 0 to w-1) { //horizontal
        sum = 0;
        for (r = -radius to radius) {
            sum += input[x + r, y]
        }
        tmp[x, y] = sum / (radius * 2 + 1)
    }
}
for (x = 0 to w-1) {
    for (y = 0 to h-1) { //vertical
        sum = 0;
        for (r = -radius to radius) {
            sum += tmp[x, y + r]
        }
        output[x, y] = sum / (radius * 2 + 1)
    }
}
Complexity : O(n radius)
If you take a close look at the second algorithm, you will notice that we always sum up the same values over and over:
sum = p0 + p1 + p2 + p3 + p4 + p5 + p6
sum = sum - p0 + p7
sum = sum - p1 + p8
Pseudocode:
for (y = 0 to h-1) {
    sum = 0;
    for (r = -radius to radius) { //initial window, centered on x = 0
        sum += input[r, y]
    }
    for (x = 0 to w-1) { //horizontal
        tmp[x, y] = sum / (radius * 2 + 1)
        sum -= input[x - radius, y]
        sum += input[x + radius + 1, y]
    }
}
for (x = 0 to w-1) {
    sum = 0;
    for (r = -radius to radius) { //initial window, centered on y = 0
        sum += tmp[x, r]
    }
    for (y = 0 to h-1) { //vertical
        output[x, y] = sum / (radius * 2 + 1)
        sum -= tmp[x, y - radius]
        sum += tmp[x, y + radius + 1]
    }
}
Math jocks are likely to know this, but for anyone else...
Due to a nice mathematical property of the Gaussian, you can blur a 2D image quickly by first running a 1D Gaussian blur on each row of the image, then running a 1D blur on each column.
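A small sketch of that row-then-column idea with an actual 1D Gaussian kernel (single channel, edges clamped; truncating the kernel at three sigma is my own choice):
static float[] gaussianKernel(double sigma) {
    int radius = (int) Math.ceil(3 * sigma);               // truncate at 3 sigma
    float[] k = new float[2 * radius + 1];
    float sum = 0;
    for (int i = -radius; i <= radius; i++) {
        k[i + radius] = (float) Math.exp(-(i * i) / (2 * sigma * sigma));
        sum += k[i + radius];
    }
    for (int i = 0; i < k.length; i++) {
        k[i] /= sum;                                       // normalize so the weights sum to 1
    }
    return k;
}

// Blur each row with the 1D kernel, then blur each column of the row-blurred result.
static float[][] gaussianBlur(float[][] img, double sigma) {
    float[] k = gaussianKernel(sigma);
    int r = k.length / 2, h = img.length, w = img[0].length;
    float[][] tmp = new float[h][w];
    float[][] out = new float[h][w];
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            float acc = 0;
            for (int i = -r; i <= r; i++) {
                acc += k[i + r] * img[y][Math.min(w - 1, Math.max(0, x + i))];   // clamp at row edges
            }
            tmp[y][x] = acc;
        }
    }
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            float acc = 0;
            for (int i = -r; i <= r; i++) {
                acc += k[i + r] * tmp[Math.min(h - 1, Math.max(0, y + i))][x];   // clamp at column edges
            }
            out[y][x] = acc;
        }
    }
    return out;
}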
You probably want the box blur, which is much faster. See this link for a great tutorial and some copy & paste C code.
There are several fast methods for Gaussian blur of 2D data that you should know about.
Your choice depends on the required speed, precision, and implementation complexity.
Dave Hale from CWP has a Mines JTK package, which includes a recursive Gaussian filter (Deriche's method and Van Vliet's method). The Java subroutine can be found at https://github.com/dhale/jtk/blob/0350c23f91256181d415ea7369dbd62855ac4460/core/src/main/java/edu/mines/jtk/dsp/RecursiveGaussianFilter.java
Deriche's method seems to be a very good one for Gaussian blur (and also for Gaussian derivatives).
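If that class matches my memory of it, usage is roughly the sketch below; the RecursiveGaussianFilter(double sigma) constructor and the apply00(float[][], float[][]) method (smoothing along both axes, no derivative) are assumptions that should be checked against the linked source.
import edu.mines.jtk.dsp.RecursiveGaussianFilter;

// Assumed API, to be verified against the linked class.
static float[][] recursiveBlur(float[][] image, double sigma) {
    float[][] smoothed = new float[image.length][image[0].length];
    RecursiveGaussianFilter rgf = new RecursiveGaussianFilter(sigma);
    rgf.apply00(image, smoothed);   // the cost per pixel is independent of sigma
    return smoothed;
}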
Try using Box Blur the way I did here:
Approximating Gaussian Blur Using Extended Box Blur
This is the best approximation.
Using Integral Images you can make it even faster.
If you do, please share your solution.
It is best done on small blocks, as a full-image transpose is slow, while a small-block transpose can be done extremely fast using a chain of PUNPCKs (PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD).