AI techniques for face detection
Can anyone list all the different techniques used in face detection? Techniques like neural networks, support vector machines, eigenfaces, etc.
What others are there?
The technique I'm going to talk about is more of a machine-learning-oriented approach; in my opinion it is quite fascinating, though not very recent: it was described in the article "Robust Real-Time Face Detection" by Viola and Jones. I used the OpenCV implementation for a university project.
It is based on Haar-like features, which consist of additions and subtractions of pixel intensities within rectangular regions of the image. These can be computed very quickly using a structure called the integral image, for which GPGPU implementations also exist (the underlying primitive is sometimes called "prefix scan"). After computing the integral image in linear time, any Haar-like feature can be evaluated in constant time. A feature is basically a function that takes a 24x24 sub-window S of the image and computes a value feature(S); a triplet (feature, threshold, polarity) is called a weak classifier, because
polarity * feature(S) < polarity * threshold
holds true on some images and false on others; a weak classifier is expected to perform just a little better than random guessing (for instance, it should have an accuracy of at least 51-52%).
Polarity is either -1 or +1.
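As a rough sketch of the pieces above, in plain NumPy (function names are my own, not the OpenCV API):

```python
import numpy as np

def integral_image(img):
    # Zero-padded cumulative sums: ii[y, x] = sum of img[:y, :x].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    # Constant-time sum of the w-by-h rectangle whose top-left pixel is (x, y).
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    # One example of a Haar-like feature: left half minus right half of a region.
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

def weak_classify(feature_value, threshold, polarity):
    # Predicts 1 ("face") iff polarity * feature(S) < polarity * threshold;
    # polarity is -1 or +1.
    return 1 if polarity * feature_value < polarity * threshold else 0
```

Note that `rect_sum` needs only four lookups into the integral image, regardless of the rectangle size, which is what makes feature evaluation constant-time.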
The feature space is big (~160,000 features), but finite.
Although the threshold could in principle be any real number, a simple observation about the training set shows that if there are N examples, only N + 1 thresholds per polarity and per feature have to be examined in order to find the one that yields the best accuracy. The best weak classifier can thus be found by exhaustively searching the space of triplets.
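The N + 1 candidate thresholds are just the midpoints between consecutive sorted feature values, plus the two extremes; any other threshold classifies the training set identically to one of these. A minimal sketch for a single feature (my own naming, not the paper's):

```python
def best_weak_classifier(values, labels):
    """Exhaustive search over N + 1 candidate thresholds and both polarities
    for one feature. values[i] = feature(S_i); labels: 1 = face, 0 = non-face.
    Returns (accuracy, threshold, polarity) of the best weak classifier."""
    sv = sorted(values)
    # The N + 1 thresholds worth testing: below all values, between each
    # consecutive pair, and above all values.
    candidates = ([sv[0] - 1.0]
                  + [(sv[i] + sv[i + 1]) / 2.0 for i in range(len(sv) - 1)]
                  + [sv[-1] + 1.0])
    best = None
    for threshold in candidates:
        for polarity in (-1, +1):
            correct = sum(
                1 for v, y in zip(values, labels)
                if (1 if polarity * v < polarity * threshold else 0) == y)
            acc = correct / len(values)
            if best is None or acc > best[0]:
                best = (acc, threshold, polarity)
    return best
```

In practice this inner loop is made linear rather than quadratic by sweeping the sorted values while maintaining running sums of weights, but the brute-force version shows the idea.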
Basically, a strong classifier can be assembled by iteratively choosing the best possible weak classifier, using an algorithm called "adaptive boosting", or AdaBoost; at each iteration, the examples that were misclassified in the previous iteration are weighted more heavily. The strong classifier is characterized by its own global threshold, computed by AdaBoost.
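One boosting round, in the standard discrete-AdaBoost form (a sketch of the generic algorithm, not the exact weighting scheme of the paper), looks like this:

```python
import math

def adaboost_round(predictions, labels, weights):
    """One AdaBoost iteration: given the chosen weak classifier's predictions,
    compute its vote alpha and re-weight the examples so that the ones it
    misclassified count more in the next iteration."""
    err = sum(w for p, y, w in zip(predictions, labels, weights) if p != y)
    err = max(err, 1e-10)  # guard against a perfect weak classifier
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified examples are multiplied by e^alpha, correct ones by e^-alpha.
    new_w = [w * math.exp(alpha if p != y else -alpha)
             for p, y, w in zip(predictions, labels, weights)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]
```

A well-known property of this update is that, after re-weighting, the examples the current weak classifier got wrong carry exactly half of the total weight, which forces the next weak classifier to focus on them.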
Several strong classifiers are combined as stages in an attentional cascade. The idea behind the attentional cascade is that 24x24 sub-windows that are obviously not faces are discarded in the first stages; a strong classifier usually contains only a few weak classifiers (say 30 or 40), so it is very fast to compute. Each stage should have very high recall, while the false positive rate is not very important. If there are 10 stages, each with 0.99 recall and a 0.3 false positive rate, the final cascade will have about 0.9 recall and an extremely low false positive rate. For this reason, strong classifiers are usually tuned to increase recall, at the cost of a higher per-stage false positive rate. Tuning basically involves reducing the global threshold computed by AdaBoost.
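The cascade arithmetic is just a product over stages; a quick check of the numbers quoted above:

```python
stages = 10
stage_recall = 0.99   # each stage keeps 99% of true faces
stage_fpr = 0.30      # each stage lets 30% of non-faces through

cascade_recall = stage_recall ** stages  # about 0.904
cascade_fpr = stage_fpr ** stages        # about 5.9e-6
```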
A sub-window that makes its way to the end of the cascade is considered a face.
Many sub-windows of the initial image, possibly overlapping and possibly after rescaling the image, must be tested.
An emerging but rather effective approach to the broad class of vision problems, including face detection, is the use of Hierarchical Temporal Memory (HTM), a concept/technology developed by Numenta.
Very loosely speaking, this is a neural-network-like approach. This type of network has a tree shape in which the number of nodes decreases significantly at each level. HTM models some of the structural and algorithmic properties of the neocortex. In [possible] departure from the neocortex, the classification algorithm implemented at the level of each node uses a Bayesian algorithm. The HTM model is based on the memory-prediction theory of brain function and relies heavily on the temporal nature of inputs; this may explain its ability to deal with vision problems, as these are typically temporal (or can be made so) and also require tolerance for noise and "fuzziness".
While Numenta has produced vision kits and demo applications for some time, Vitamin D recently produced (I think) the first commercial application of HTM technology, at least in the domain of vision applications.
If you need this not just as theory but actually want to do face detection, then I recommend finding an already-implemented solution.
There are plenty of tested libraries for different languages, and they are widely used for this purpose. Look at this SO thread for more information: Face recognition library.