Improving face detection performance with OpenCV/EmguCV
I am currently using EmguCV (the OpenCV C# wrapper) successfully to detect faces in real time from a webcam. I get around 7 FPS.
Now I'm looking to improve performance (and save CPU cycles). Here are the options I am considering:
Detect the face, extract features from it, and try to find those features in the following frames (using the SURF algorithm), so that this becomes "face detection + tracking". If the features are not found, run face detection again.
Detect the face, then in the next frame try to detect it only in an ROI around where the face previously was (i.e. look for the face in a smaller part of the image). If the face is not found there, search the whole image again.
Side idea: if no face has been detected for 2-3 frames and there is no movement in the image, don't try to detect any more faces until movement is detected.
Do you have any suggestions for me?
Thanks.
Comments (3)
All the solutions you propose seem smart and reasonable. However, if you use Haar cascades for face detection, you might try creating a cascade with fewer stages. Although 20 stages are recommended for face detection, 10-15 might be enough, and that would noticeably improve performance. Information on creating your own cascades can be found in Tutorial: OpenCV haartraining (Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features).
Again, using SURF is a good idea. You can also try P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints. There are interesting videos on YouTube presenting this method; they are worth looking up.
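
To make the trade-off behind "fewer stages" concrete, here is a toy sketch of how a cascade rejects a window at the first failing stage. It is only an illustration: the `Stage` type, `run_cascade`, and the score functions are hypothetical and unrelated to OpenCV's actual training or file format.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A stage is a (score_function, threshold) pair. Hypothetical, for illustration only.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def run_cascade(window: Sequence[float],
                stages: List[Stage],
                max_stages: Optional[int] = None) -> Tuple[bool, int]:
    """Return (accepted, stages_evaluated) for one candidate window.

    A window is rejected as soon as one stage scores below its threshold,
    so most non-face windows cost only a few stage evaluations.
    """
    active = stages if max_stages is None else stages[:max_stages]
    evaluated = 0
    for score, threshold in active:
        evaluated += 1
        if score(window) < threshold:
            return False, evaluated  # early rejection
    return True, evaluated


# Three toy stages standing in for boosted classifiers.
toy_stages: List[Stage] = [(sum, 1.0), (max, 0.5), (min, 0.2)]
window = [0.6, 0.6, 0.1]

print(run_cascade(window, toy_stages))                # rejected by the last stage
print(run_cascade(window, toy_stages, max_stages=2))  # accepted with fewer stages
```

The same window that the full cascade rejects is accepted by the truncated one, which is the price of the speed-up: fewer stages means less work per window but more false positives to filter out afterwards.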
For the SURF algorithm, you could try it, but I am not sure it provides relevant features on a face: maybe around the eyes, or on skin irregularities if you are close to the camera, or perhaps in the hair if the resolution is high enough. Moreover, SURF is not particularly fast, and I would avoid adding more computation if you want to save CPU time.
The ROI is a good idea. You could choose it with the CamShift algorithm; it won't save a lot of CPU, but it is worth trying since CamShift is very lightweight. Again, I am not sure it will be really relevant, but your second idea is the right one: minimize the area to search.
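
A minimal sketch of the ROI-then-fallback logic, assuming the ROI is simply the previous face box grown by a margin (not CamShift). The `detect` argument is a hypothetical callable standing in for whatever cascade call you use on a cropped region; it is assumed to return boxes relative to that region's top-left corner.

```python
from typing import Callable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def expand_roi(face: Rect, frame_w: int, frame_h: int, margin: float = 0.5) -> Rect:
    """Grow the previous face box by `margin` on each side, clamped to the frame."""
    x, y, w, h = face
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(frame_w, x + w + dx), min(frame_h, y + h + dy)
    return (x0, y0, x1 - x0, y1 - y0)


def detect_with_roi(detect: Callable[[Rect], List[Rect]],
                    frame_size: Tuple[int, int],
                    last_face: Optional[Rect],
                    margin: float = 0.5) -> Optional[Rect]:
    """Search near the last known face first, then fall back to the full frame.

    Returns the face box in full-frame coordinates, or None if nothing is found.
    """
    frame_w, frame_h = frame_size
    if last_face is not None:
        roi = expand_roi(last_face, frame_w, frame_h, margin)
        hits = detect(roi)
        if hits:
            x, y, w, h = hits[0]
            # Convert from ROI-relative to full-frame coordinates.
            return (roi[0] + x, roi[1] + y, w, h)
    hits = detect((0, 0, frame_w, frame_h))
    return hits[0] if hits else None
```

In a capture loop you would feed the returned box back in as `last_face` for the next frame. The margin needs to be large enough to cover how far a face can move between two frames at your frame rate.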
The side idea seems quite good to me. You could try to detect motion (global motion, for instance), and if there is not much, don't try to detect again what you have already detected. You could do that with motion templates, since you know the silhouette from mean shift or face detection.
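
As a rough sketch of the motion gate from the question's side idea, the code below uses plain frame differencing (mean absolute difference) rather than motion templates. The `DetectionGate` class, its thresholds, and the frame representation (rows of 8-bit grayscale values) are all assumptions for illustration.

```python
from typing import Sequence

Frame = Sequence[Sequence[int]]  # rows of 8-bit grayscale values


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(prev, curr):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


class DetectionGate:
    """Pause detection after several misses until the scene changes."""

    def __init__(self, max_misses: int = 3, motion_threshold: float = 2.0):
        self.max_misses = max_misses
        self.motion_threshold = motion_threshold
        self.misses = 0

    def should_detect(self, prev: Frame, curr: Frame) -> bool:
        if self.misses < self.max_misses:
            return True  # still actively looking for a face
        # Idle: only wake up when the frame has changed enough.
        return motion_score(prev, curr) >= self.motion_threshold

    def report(self, face_found: bool) -> None:
        """Call after each detection attempt with its outcome."""
        self.misses = 0 if face_found else self.misses + 1
```

Computing the score on a heavily downscaled frame keeps the gate far cheaper than the detection it is skipping. The threshold has to sit above the camera's sensor noise, so it is worth measuring on a static scene first.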
A very simple, lightweight, but not very robust option is template matching between frame n-1 and frame n. It gives you a coefficient that measures the similarity between the two frames, and you can trigger face detection when it drops below a certain threshold. Why not? It should take five minutes to implement if the C# wrapper has an equivalent of matchTemplate().
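
To show what such a coefficient looks like, here is a pure-Python sketch of a normalized correlation between two same-sized grayscale frames. It is meant to mirror what a normalized correlation-coefficient template match yields when the template is as large as the image, but it is written by hand rather than calling OpenCV; the function names and the 0.95 threshold are assumptions.

```python
import math
from typing import Sequence

Frame = Sequence[Sequence[int]]  # rows of 8-bit grayscale values


def frame_similarity(prev: Frame, curr: Frame) -> float:
    """Normalized correlation coefficient in [-1, 1]; 1.0 means identical structure."""
    a = [p for row in prev for p in row]
    b = [p for row in curr for p in row]
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    den = math.sqrt(var_a * var_b)
    if den == 0:
        # At least one frame is perfectly flat; fall back to exact comparison.
        return 1.0 if a == b else 0.0
    return num / den


def needs_detection(prev: Frame, curr: Frame, threshold: float = 0.95) -> bool:
    """Re-run face detection only when the frames differ enough."""
    return frame_similarity(prev, curr) < threshold
```

Because the coefficient subtracts the mean and normalizes by the variance, a uniform brightness change between frames does not lower it, which may or may not be what you want from a "has anything changed" test.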
I'll come back here if I have better (deeper) ideas, but for now I've just come back from work and it's hard to think further.
Julien,
This is not a perfect answer, but just a suggestion.
In my digital image processing class in the last semester of my B.Tech in CS, I learned about bit-plane slicing, and how an image reduced to just its MSB plane retains almost 70% of the useful image information. So you would be working with nearly the original image, but at just one-eighth of its size.
I haven't implemented it in my own project, but I have been wondering whether it could speed up face detection, because the later steps (eye detection, pupil detection, and eye corner detection) also take a lot of computation time and slow the whole program down.
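
For reference, a small sketch of what bit-plane slicing does to an 8-bit grayscale image, and where the one-eighth figure comes from. The function names are illustrative, and the image is assumed to be a list of rows of values in 0-255.

```python
from typing import List, Sequence


def bit_plane(gray: Sequence[Sequence[int]], bit: int) -> List[List[int]]:
    """Extract one bit plane (0 = LSB, 7 = MSB) as rows of 0/1 values."""
    return [[(p >> bit) & 1 for p in row] for row in gray]


def msb_plane(gray: Sequence[Sequence[int]]) -> List[List[int]]:
    """The MSB plane is 1 where the pixel value is 128 or more, else 0."""
    return bit_plane(gray, 7)


def pack_plane(plane: Sequence[Sequence[int]]) -> bytes:
    """Pack a 0/1 plane into bytes, 8 pixels per byte (last byte zero-padded)."""
    bits = [b for row in plane for b in row]
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        byte <<= 8 - len(chunk)  # pad a short final chunk on the right
        out.append(byte)
    return bytes(out)


gray = [[0, 127, 128, 255],
        [200, 64, 130, 10]]
plane = msb_plane(gray)
packed = pack_plane(plane)
print(plane)        # 1 wherever the pixel is >= 128
print(len(packed))  # 8 pixels fit in a single byte
```

Note that the MSB plane is the same thing as thresholding the image at 128, so the saving is in storage only if the plane is kept packed. A detector that expects 8-bit input would need the plane expanded back to one byte per pixel.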