Fast pixel counting on a binary image - ARM NEON intrinsics - iOS Dev
Can someone suggest a fast function to count the number of white pixels in a binary image? I need it for iOS app development. I am working directly on the image memory, which is defined as:
```c
bool *imageData = (bool *) malloc(noOfPixels * sizeof(bool));
```
I am currently counting the pixels with this loop:
```c
int whiteCount = 0;
for (int q = i; q < i + windowHeight; q++)
{
    for (int w = j; w < j + windowWidth; w++)
    {
        if (imageData[q*W + w] == 1)
            whiteCount++;
    }
}
```
This is obviously a slow way to do it. I have heard that ARM NEON intrinsics on iOS can perform several operations in one cycle. Maybe that's the way to go?
The problem is that I am not very familiar with NEON and don't have time to learn assembly language at the moment. So it would be great if anyone could post NEON intrinsics code for the problem above, or any other fast implementation in C/C++.
The only NEON intrinsics code I have been able to find online is this BGRA-to-grayscale conversion:
http://computer-vision-talks.com/2011/02/a-very-fast-bgra-to-grayscale-conversion-on-iphone/
Answers (2)
Firstly you can speed up the original code a little by factoring out the multiply and getting rid of the branch:
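A minimal sketch of that scalar rewrite, assuming every `bool` in `imageData` is stored as exactly 0 or 1. `W`, `i`, `j`, `windowWidth` and `windowHeight` are the variables from the question; the function name `countWhiteScalar` is hypothetical. The row offset is computed once per row, and the pixel value is added directly instead of being tested:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: branch-free scalar count of white pixels in a
   windowWidth x windowHeight window whose top-left corner is at
   (row i, column j) of an image of width W.
   Assumes every element of imageData is exactly 0 or 1. */
int countWhiteScalar(const bool *imageData, int W, int i, int j,
                     int windowWidth, int windowHeight)
{
    int whiteCount = 0;
    for (int q = i; q < i + windowHeight; q++)
    {
        /* One multiply per row instead of one per pixel. */
        const bool *row = imageData + (size_t)q * W + j;
        for (int w = 0; w < windowWidth; w++)
            whiteCount += row[w];   /* adds 0 or 1, no branch */
    }
    return whiteCount;
}
```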
(This assumes that `imageData[]` is truly binary, i.e. each element can only ever be 0 or 1.)

Here is a simple NEON implementation:
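A minimal sketch of such a NEON loop, assuming `sizeof(bool) == 1`, 0/1 pixel values, and a compiler that provides `arm_neon.h`. The name `countWhiteNeon` is hypothetical. Each iteration loads 16 pixels with `vld1q_u8`, widens and adds adjacent byte pairs into 16-bit lanes with `vpaddlq_u8`, then pairwise-accumulates those into four 32-bit lanes with `vpadalq_u16`. When `__ARM_NEON`/`__ARM_NEON__` is not defined, only the plain C loop runs, so the same file also compiles on non-ARM targets:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* The NEON path reinterprets bool as uint8_t, so bool must be one byte. */
_Static_assert(sizeof(bool) == 1, "sizeof(bool) must be 1");

/* Hypothetical helper: count pixels equal to 1 in a window of an image of
   width W. Assumes every element of imageData is exactly 0 or 1. */
uint32_t countWhiteNeon(const bool *imageData, int W, int i, int j,
                        int windowWidth, int windowHeight)
{
    uint32_t total = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t acc = vdupq_n_u32(0);   /* four 32-bit partial sums */
#endif
    for (int q = i; q < i + windowHeight; q++)
    {
        const uint8_t *row = (const uint8_t *)(imageData + (size_t)q * W + j);
        int w = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
        /* 16 pixels per iteration: u8 pairs -> u16 lanes -> u32 lanes. */
        for (; w + 16 <= windowWidth; w += 16)
        {
            uint8x16_t px = vld1q_u8(row + w);
            acc = vpadalq_u16(acc, vpaddlq_u8(px));
        }
#endif
        /* Scalar tail (or the whole row when NEON is unavailable). */
        for (; w < windowWidth; w++)
            total += row[w];
    }
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Horizontal sum of the four lanes (works on ARMv7 and AArch64). */
    total += vgetq_lane_u32(acc, 0) + vgetq_lane_u32(acc, 1)
           + vgetq_lane_u32(acc, 2) + vgetq_lane_u32(acc, 3);
#endif
    return total;
}
```

Using a window wider than 16 pixels in testing exercises both the vector loop and the scalar tail on an ARM device.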
(This assumes that `imageData[]` is truly binary, `imageWidth <= 2^19`, and `sizeof(bool) == 1`.)

Updated version for `unsigned char` and values of 255 for white, 0 for black:

(This assumes that `imageData[]` has values of 255 for white and 0 for black, and `imageWidth <= 2^19`.)

Note that all the above code is untested and may need some further work.
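For the `unsigned char` variant mentioned above (255 for white, 0 for black), a minimal sketch assuming that only the values 0 and 255 ever occur. The name `countWhite255` is hypothetical. Shifting each byte right by 7 with `vshrq_n_u8` maps 255 to 1 and 0 to 0; the result is then accumulated with the same widening pairwise adds (`vpaddlq_u8`, `vpadalq_u16`):

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: count white pixels in a window of an 8-bit image
   (uint8_t == unsigned char) of width W, where white == 255 and black == 0.
   Assumes no other values occur: 255 >> 7 == 1 and 0 >> 7 == 0. */
uint32_t countWhite255(const uint8_t *imageData, int W, int i, int j,
                       int windowWidth, int windowHeight)
{
    uint32_t total = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t acc = vdupq_n_u32(0);
#endif
    for (int q = i; q < i + windowHeight; q++)
    {
        const uint8_t *row = imageData + (size_t)q * W + j;
        int w = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
        for (; w + 16 <= windowWidth; w += 16)
        {
            /* 255/0 -> 1/0 in every lane, then widen and accumulate. */
            uint8x16_t ones = vshrq_n_u8(vld1q_u8(row + w), 7);
            acc = vpadalq_u16(acc, vpaddlq_u8(ones));
        }
#endif
        /* Scalar tail (or the whole row when NEON is unavailable). */
        for (; w < windowWidth; w++)
            total += row[w] >> 7;
    }
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    total += vgetq_lane_u32(acc, 0) + vgetq_lane_u32(acc, 1)
           + vgetq_lane_u32(acc, 2) + vgetq_lane_u32(acc, 3);
#endif
    return total;
}
```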
http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html
Section 6.55.3.6
The vectorized comparison intrinsics will do the comparisons and put the results in a vector for you, but you'd still need to go through each element of that result and determine whether it is zero or not.
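A sketch of the compare-then-check approach described above, assuming a buffer with one byte per pixel; the name `countEqual` is hypothetical. `vceqq_u8` compares 16 bytes against the target value at once and yields 0xFF in matching lanes and 0x00 elsewhere; here the mask is stored with `vst1q_u8` and each lane is then checked individually. Without NEON, only the scalar loop runs:

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: count bytes in buf[0..n) equal to `white`. */
size_t countEqual(const uint8_t *buf, size_t n, uint8_t white)
{
    size_t count = 0, k = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const uint8x16_t target = vdupq_n_u8(white);
    uint8_t mask[16];
    for (; k + 16 <= n; k += 16)
    {
        /* 16 comparisons in one instruction: 0xFF where equal, else 0x00. */
        vst1q_u8(mask, vceqq_u8(vld1q_u8(buf + k), target));
        /* Walk the lanes of the result and count the non-zero ones. */
        for (int lane = 0; lane < 16; lane++)
            count += (mask[lane] != 0);
    }
#endif
    /* Remaining bytes (or all bytes when NEON is unavailable). */
    for (; k < n; k++)
        count += (buf[k] == white);
    return count;
}
```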
How fast does that loop currently run and how fast do you need it to run? Also remember that NEON will work in the same registers as the floating point unit, so using NEON here may force an FPU context switch.