Fast Gaussian blur on an unsigned char image - ARM Neon Intrinsics - iOS Dev

Posted 2025-01-02 19:59:59

Can someone suggest a fast function for computing the Gaussian blur of an image using a 5x5 mask? I need it for iOS app development. I am working directly on the image's memory, which is defined as:

unsigned char *image_sqr_Baseaaddr = (unsigned char *) malloc(noOfPixels);

for (row = 2; row < H-2; row++) 
{
    for (col = 2; col < W-2; col++) 
    {
        newPixel = 0;
        for (rowOffset=-2; rowOffset<=2; rowOffset++)
        {
            for (colOffset=-2; colOffset<=2; colOffset++) 
            {
                rowTotal = row + rowOffset;
                colTotal = col + colOffset;
                iOffset = (unsigned long)(rowTotal*W + colTotal);
                newPixel += (*(imgData + iOffset)) * gaussianMask[2 + rowOffset][2 + colOffset];
            }
        }
        i = (unsigned long)(row*W + col);
        *(imgData + i) = newPixel / 159;
    }
}

This is obviously the slowest possible implementation. I have heard that ARM Neon intrinsics on iOS can perform several operations in a single cycle. Maybe that's the way to go?

The problem is that I am not very familiar with Neon and don't have enough time to learn assembly language at the moment. So it would be great if anyone could post Neon intrinsics code for the problem above, or any other fast implementation in C/C++.

Comments (2)

梦与时光遇 2025-01-09 20:00:00

Before you get into SIMD optimisation with NEON, you should first improve your scalar implementation. The biggest problem with your code as it stands is that it is implemented as if the filter were non-separable, whereas a Gaussian kernel is separable. By switching to a separable implementation you reduce the number of operations per pixel from N^2 to 2N, which for your 5x5 kernel is a reduction from 25 multiply-adds to 10, i.e. a 2.5x speed-up for very little effort.

It may be that a sufficiently optimised scalar implementation will meet your needs without the need to resort to SIMD. If not then you can at least carry these scalar optimisations over into a vectorized implementation.
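
A minimal scalar sketch of the separable approach, under some assumptions that are not in the question: the function name gaussian_blur_5x5_separable is hypothetical, and the 1D kernel is the binomial approximation [1 4 6 4 1]/16 rather than the asker's 159-normalised mask (which is not shown and may not factor exactly into two 1D passes). The sketch writes to a separate output buffer, so the second pass never reads pixels that have already been blurred, and it leaves the 2-pixel border unchanged, matching the loop bounds in the question.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): separable 5x5 blur.
 * Assumes the binomial kernel [1 4 6 4 1]/16 applied horizontally and then
 * vertically; the combined 2D weight sum is 16 * 16 = 256.
 * src and dst are W*H single-channel 8-bit images and must not overlap.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int gaussian_blur_5x5_separable(const uint8_t *src, uint8_t *dst, int W, int H)
{
    static const int k[5] = {1, 4, 6, 4, 1};

    if (W <= 0 || H <= 0)
        return 0;

    /* Start from a copy so the 2-pixel border keeps its original values. */
    memcpy(dst, src, (size_t)W * (size_t)H);
    if (W < 5 || H < 5)
        return 0; /* too small for a 5x5 window: nothing to filter */

    /* Unnormalised horizontal sums fit in 16 bits (max 255 * 16 = 4080). */
    uint16_t *tmp = malloc((size_t)W * (size_t)H * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    /* Pass 1: horizontal 1D convolution, 5 multiply-adds per pixel. */
    for (int row = 0; row < H; row++) {
        for (int col = 2; col < W - 2; col++) {
            int sum = 0;
            for (int o = -2; o <= 2; o++)
                sum += src[(size_t)row * W + (size_t)(col + o)] * k[o + 2];
            tmp[(size_t)row * W + col] = (uint16_t)sum;
        }
    }

    /* Pass 2: vertical 1D convolution over the horizontal sums, then
     * normalise by 256 with rounding. Only columns written in pass 1
     * (2 .. W-3) are read here. */
    for (int row = 2; row < H - 2; row++) {
        for (int col = 2; col < W - 2; col++) {
            int sum = 0;
            for (int o = -2; o <= 2; o++)
                sum += tmp[(size_t)(row + o) * W + col] * k[o + 2];
            dst[(size_t)row * W + col] = (uint8_t)((sum + 128) / 256);
        }
    }

    free(tmp);
    return 0;
}
```

With a power-of-two normalisation the final division can be replaced by a shift, and each inner loop is a straightforward candidate for later vectorisation if the scalar version turns out not to be fast enough.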


http://en.wikipedia.org/wiki/Gaussian_blur

http://blogs.mathworks.com/steve/2006/11/28/separable-convolution-part-2/

长不大的小祸害 2025-01-09 20:00:00
  1. Separate your kernel, as described by Paul R.
  2. Don't re-invent the wheel. Use vImage, which is part of the Accelerate framework, and implements a vectorized, multi-threaded convolution for you. Specifically, it seems like you want the function vImageConvolve_Planar8.