iPhone image processing with the Accelerate framework and vDSP



UPDATE: Please see the additional question below, with more code.

I am trying to code a category for blurring an image. My starting point is Jeff LaMarche's sample here. Whilst this (after the fixes suggested by others) works fine, it is an order of magnitude too slow for my requirements - on a 3GS it takes maybe 3 seconds to do a decent blur and I'd like to get this down to under 0.5 sec for a full screen (faster is better).

He mentions the Accelerate framework as a performance enhancement, so I've spent the last day looking at this, and in particular vDSP_f3x3, which according to the Apple documentation:

Filters an image by performing a two-dimensional convolution with a 3x3 kernel; single precision.

Perfect - I have a suitable filter matrix, and I have an image ... but this is where I get stumped.

vDSP_f3x3 assumes the image data is (float *), but my image comes from:

srcData = (unsigned char *)CGBitmapContextGetData (context);

and the context comes from CGBitmapContextCreate with kCGImageAlphaPremultipliedFirst, so my srcData is really ARGB with 8 bits per component.

I suspect what I really need is a context with float components, but according to the Quartz documentation here, kCGBitmapFloatComponents is only available on Mac OS and not iOS :-(

Is there a really fast way, using the Accelerate framework, to convert the integer components I have into the float components that vDSP_f3x3 needs? I could do it myself, but by the time I've done that, then the convolution, and then the conversion back, I suspect I'll have made it even slower than it is now, since I might as well convolve as I go.

Maybe I have the wrong approach?

Does anyone who has done image processing on the iPhone using vDSP have any tips for me? The documentation I can find is very reference-oriented and not very newbie-friendly when it comes to this sort of thing.

If anyone has a reference for really fast blurring (and high quality, not the reduce-resolution-then-rescale stuff I've seen, which looks pants), that would be fab!

EDIT:

Thanks @Jason. I've done this and it is almost working, but now my problem is that although the image does blur, on every invocation it shifts left 1 pixel. It also seems to make the image black and white, but that could be something else.

Is there anything in this code that leaps out as obviously incorrect? I haven't optimised it yet and it's a bit rough, but hopefully the convolution code is clear enough.

CGImageRef CreateCGImageByBlurringImage(CGImageRef inImage, NSUInteger pixelRadius, NSUInteger gaussFactor)
{
    unsigned char *srcData, *finalData;

    CGContextRef context = CreateARGBBitmapContext(inImage);
    if (context == NULL)
        return NULL;

    size_t width = CGBitmapContextGetWidth(context);
    size_t height = CGBitmapContextGetHeight(context);
    size_t bpr = CGBitmapContextGetBytesPerRow(context);

    int componentsPerPixel = 4; // ARGB

    CGRect rect = {{0,0},{width,height}};
    CGContextDrawImage(context, rect, inImage);

    // Now we can get a pointer to the image data associated with the bitmap
    // context.

    srcData = (unsigned char *)CGBitmapContextGetData(context);

    if (srcData != NULL)
    {
        size_t dataSize = bpr * height;
        finalData = malloc(dataSize);
        memcpy(finalData, srcData, dataSize);

        // Generate Gaussian kernel

        float *kernel;

        // Limit the pixelRadius

        pixelRadius = MIN(MAX(1, pixelRadius), 248);
        int kernelSize = pixelRadius * 2 + 1;

        kernel = malloc(kernelSize * sizeof *kernel);

        int gauss_sum = 0;

        for (int i = 0; i < pixelRadius; i++)
        {
            kernel[i] = 1 + (gaussFactor * i);
            kernel[kernelSize - (i + 1)] = 1 + (gaussFactor * i);
            gauss_sum += (kernel[i] + kernel[kernelSize - (i + 1)]);
        }

        kernel[(kernelSize - 1) / 2] = 1 + (gaussFactor * pixelRadius);

        gauss_sum += kernel[(kernelSize - 1) / 2];

        // Scale the kernel

        for (int i = 0; i < kernelSize; ++i) {
            kernel[i] = kernel[i] / gauss_sum;
        }

        float *srcAsFloat, *resultAsFloat;

        srcAsFloat = malloc(width * height * sizeof(float) * componentsPerPixel);
        resultAsFloat = malloc(width * height * sizeof(float) * componentsPerPixel);

        // Convert uint8 source ARGB to floats

        vDSP_vfltu8(srcData, 1, srcAsFloat, 1, width * height * componentsPerPixel);

        // Convolve (hence the -1) with the kernel

        vDSP_conv(srcAsFloat, 1, &kernel[kernelSize - 1], -1, resultAsFloat, 1, width * height * componentsPerPixel, kernelSize);

        // Copy the floats back to ints

        vDSP_vfixu8(resultAsFloat, 1, finalData, 1, width * height * componentsPerPixel);

        free(resultAsFloat);
        free(srcAsFloat);
    }

    size_t bitmapByteCount = bpr * height;

    CGDataProviderRef dataProvider = CGDataProviderCreateWithData(NULL, finalData, bitmapByteCount, &providerRelease);

    CGImageRef cgImage = CGImageCreate(width, height, CGBitmapContextGetBitsPerComponent(context),
                                       CGBitmapContextGetBitsPerPixel(context), CGBitmapContextGetBytesPerRow(context),
                                       CGBitmapContextGetColorSpace(context), CGBitmapContextGetBitmapInfo(context),
                                       dataProvider, NULL, true, kCGRenderingIntentDefault);

    CGDataProviderRelease(dataProvider);
    CGContextRelease(context);

    return cgImage;
}

I should add that if I comment out the vDSP_conv line and change the line following it to:

       vDSP_vfixu8(srcAsFloat, 1, finalData, 1, width*height*componentsPerPixel);

Then, as expected, my result is a clone of the original source: in colour and not shifted left. This implies to me that it IS the convolution that is going wrong, but I can't see where :-(

THOUGHT: Actually, thinking about this, it seems to me that the convolution needs to know the input pixels are in ARGB format, as otherwise it will be multiplying the values together with no knowledge of their meaning (i.e. it will multiply R * B, etc.). This would explain why I get a B&W result, I think, but not the shift. Again, I think there might need to be more to it than my naive version here ...

FINAL THOUGHT: I think the shifting left is a natural result of the filter and I need to look at the image dimensions and possibly pad it out ... so I think the code is actually working OK given what I've fed it.
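
One plausible source of the shift, for what it's worth: per the vDSP documentation, vDSP_conv producing N outputs expects its input vector to hold N + P - 1 elements, so handing it a buffer of only N samples both reads past the end and skews the output. A minimal sketch of padding the input, reusing the names from the code above:

    // Sketch: pad the input so vDSP_conv has the N + P - 1 samples it
    // expects; centring the real samples keeps the output aligned.
    vDSP_Length n = width * height * componentsPerPixel;
    float *padded = calloc(n + kernelSize - 1, sizeof(float));

    memcpy(padded + (kernelSize - 1) / 2, srcAsFloat, n * sizeof(float));

    vDSP_conv(padded, 1, &kernel[kernelSize - 1], -1, resultAsFloat, 1, n, kernelSize);

    free(padded);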


5 Answers

清欢 2024-11-12 14:32:24


While the Accelerate framework will be faster than simple serial code, you'll probably never see the greatest performance by blurring an image using it.

My suggestion would be to use an OpenGL ES 2.0 shader (for devices that support this API) to do a two-pass box blur. Based on my benchmarks, the GPU can handle these kinds of image manipulation operations at 14-28X the speed of the CPU on an iPhone 4, versus the maybe 4.5X that Apple reports for the Accelerate framework in the best cases.

Some code for this is described in this question, as well as in the "Post-Processing Effects on Mobile Devices" chapter in the GPU Pro 2 book (for which the sample code can be found here). By placing your image in a texture, then reading values in between pixels, bilinear filtering on the GPU gives you some blurring for free, which can then be combined with a few fast lookups and averaging operations.

If you need a starting project to feed images into the GPU for processing, you might be able to use my sample application from the article here. That sample application passes AVFoundation video frames as textures into a processing shader, but you can modify it to send in your particular image data and run your blur operation. You should be able to use my glReadPixels() code to then retrieve the blurred image for later use.

Since I originally wrote this answer, I've created an open source image and video processing framework for doing these kinds of operations on the GPU. The framework has several different blur types within it, all of which can be applied very quickly to images or live video. The GPUImageGaussianBlurFilter, which applies a standard 9-hit Gaussian blur, runs in 16 ms for a 640x480 frame of video on the iPhone 4. The GPUImageFastBlurFilter is a modified 9-hit Gaussian blur that uses hardware filtering, and it runs in 2.0 ms for that same video frame. Likewise, there's a GPUImageBoxBlurFilter that uses a 5-pixel box and runs in 1.9 ms for the same image on the same hardware. I also have median and bilateral blur filters, although they need a little performance tuning.
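
For a still image, a minimal sketch of driving one of these filters might look like the following. This is a sketch only: the image name is a placeholder, and the output method in particular has changed names between versions of the framework.

    // A minimal sketch, assuming the early GPUImage still-image API;
    // the asset name is illustrative and the capture method name may
    // differ between framework versions.
    UIImage *inputImage = [UIImage imageNamed:@"photo.jpg"];

    GPUImagePicture *source = [[GPUImagePicture alloc] initWithImage:inputImage];
    GPUImageGaussianBlurFilter *blur = [[GPUImageGaussianBlurFilter alloc] init];

    [source addTarget:blur];
    [source processImage];

    UIImage *blurred = [blur imageFromCurrentlyProcessedOutput];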

In my benchmarks, Accelerate doesn't come close to these kinds of speeds, especially when it comes to filtering live video.

鞋纸虽美,但不合脚ㄋ〞 2024-11-12 14:32:24


You definitely want to convert to float to perform the filtering since that is what the accelerated functions take, plus it is a lot more flexible if you want to do any additional processing. The computation time of a 2-D convolution (filter) will most likely dwarf any time spent in conversion. Take a look at the function vDSP_vfltu8() which will quickly convert the uint8 data to float. vDSP_vfixu8() will convert it back to uint8.
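
As a sketch of that round trip for a single channel (the buffer names and pixelCount, meaning width * height, are placeholders; the stride of 4 walks one component of an interleaved ARGB image):

    // Sketch: convert the red channel of ARGB8888 data to floats and back.
    // srcData, dstData and pixelCount are placeholders for the caller's data.
    float *redAsFloat = malloc(pixelCount * sizeof(float));

    vDSP_vfltu8(srcData + 1, 4, redAsFloat, 1, pixelCount);  // uint8 -> float
    // ... filter redAsFloat here ...
    vDSP_vfixu8(redAsFloat, 1, dstData + 1, 4, pixelCount);  // float -> uint8

    free(redAsFloat);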

To perform a blur, you are probably going to want a bigger convolution kernel than 3x3 so I would suggest using the function vDSP_imgfir() which will take any kernel size.

Response to edit:

A few things:

  1. You need to perform the filtering on each color channel independently. That is, you need to split the R, G, and B components into their own images (of type float), filter them, then remultiplex them into the ARGB image.

  2. vDSP_conv computes a 1-D convolution, but to blur an image, you really need a 2-D convolution. vDSP_imgfir essentially computes the 2-D convolution. For this you will need a 2-D kernel as well. You can look up the formula for a 2-D Gaussian function to produce the kernel (a sketch of this follows the list).
    Note: You actually can perform a 2-D convolution using 1-D convolutions if your kernel is separable (which a Gaussian is). I won't go into what that means, but you essentially have to perform a 1-D convolution across the columns and then a 1-D convolution across the resulting rows. I would not go this route unless you know what you are doing.
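
As an illustration of point 2, here is a minimal sketch of building a normalised 2-D Gaussian kernel (size and sigma are free parameters, not values from this answer):

    #include <math.h>

    // Sketch: fill a size x size buffer with a normalised 2-D Gaussian.
    // size should be odd; sigma controls the blur strength.
    void buildGaussianKernel(float *kernel, int size, float sigma)
    {
        int half = size / 2;
        float sum = 0.0f;

        for (int y = -half; y <= half; y++) {
            for (int x = -half; x <= half; x++) {
                float g = expf(-(x * x + y * y) / (2.0f * sigma * sigma));
                kernel[(y + half) * size + (x + half)] = g;
                sum += g;
            }
        }

        // Normalise so the coefficients sum to 1 and brightness is preserved
        for (int i = 0; i < size * size; i++)
            kernel[i] /= sum;
    }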

掩于岁月 2024-11-12 14:32:24


So answering my own question with Jason's excellent help, the final working code fragment is provided here for reference in case it helps anyone else. As you can see, the strategy is to split the source ARGB (I'm ignoring A for performance and assuming the data is XRGB) into 3 float arrays, apply the filter and then re-multiplex the result.

It works a treat - but it is achingly slow. I'm using a large kernel of 16x16 to get a heavy blur and on my 3GS it takes about 5 seconds for a full screen image so that's not going to be a viable solution.

Next step is to look at alternatives ... but thanks for getting me up and running.

    // Assumes pixels = width * height, that each of the six float buffers
    // (srcAsFloatR/G/B, resultAsFloatR/G/B) holds pixels elements, and that
    // kernel is an frows x fcols array of normalised float coefficients.

    // De-interleave each colour channel into floats (the stride of 4 walks
    // the R, G and B components of the ARGB data; A at offset 0 is skipped)
    vDSP_vfltu8(srcData+1, 4, srcAsFloatR, 1, pixels);
    vDSP_vfltu8(srcData+2, 4, srcAsFloatG, 1, pixels);
    vDSP_vfltu8(srcData+3, 4, srcAsFloatB, 1, pixels);

    // Now apply the filter to each of the components. For a gaussian blur
    // with a 16x16 kernel this turns out to be really slow!
    vDSP_imgfir(srcAsFloatR, height, width, kernel, resultAsFloatR, frows, fcols);
    vDSP_imgfir(srcAsFloatG, height, width, kernel, resultAsFloatG, frows, fcols);
    vDSP_imgfir(srcAsFloatB, height, width, kernel, resultAsFloatB, frows, fcols);

    // Now re-multiplex the final image from the processed float data
    vDSP_vfixu8(resultAsFloatR, 1, finalData+1, 4, pixels);
    vDSP_vfixu8(resultAsFloatG, 1, finalData+2, 4, pixels);
    vDSP_vfixu8(resultAsFloatB, 1, finalData+3, 4, pixels);
束缚m 2024-11-12 14:32:24


For future reference, if you're considering implementing this: DON'T. I've done it for you!

see:
https://github.com/gdawg/uiimage-dsp

for a UIImage Category which adds Gaussian/Box Blur/Sharpen using vDSP and the Accelerate framework.

游魂 2024-11-12 14:32:24


Why are you using vDSP to do image filtering? Try vImageConvolve_ARGB8888(). vImage is the image processing component of Accelerate.framework.
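
A minimal sketch of what that call might look like, reusing the buffers from the question (the kernel and divisor here are illustrative, not prescribed by this answer):

    #include <Accelerate/Accelerate.h>

    // Sketch: wrap the existing bitmap data in vImage_Buffers and convolve.
    // srcData, finalData, width, height and bpr come from the question's code.
    vImage_Buffer src = { .data = srcData,   .height = height, .width = width, .rowBytes = bpr };
    vImage_Buffer dst = { .data = finalData, .height = height, .width = width, .rowBytes = bpr };

    // Integer kernel plus divisor (16 = sum of coefficients, preserving
    // brightness); a simple 3x3 Gaussian-ish blur for illustration
    int16_t kernel[9] = { 1, 2, 1,
                          2, 4, 2,
                          1, 2, 1 };

    vImage_Error err = vImageConvolve_ARGB8888(&src, &dst, NULL, 0, 0,
                                               kernel, 3, 3, 16, NULL,
                                               kvImageEdgeExtend);
    if (err != kvImageNoError) {
        // handle the error
    }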
