Difference between feature detection and object detection
I know that the most common object detection approach involves Haar cascades, and that there are many techniques for feature detection, such as SIFT, SURF, STAR, ORB, etc. But if my end goal is to recognize objects, don't both approaches end up giving me the same result? I understand using feature techniques on simple shapes and patterns, but these feature algorithms seem to work for complex objects as well.
I don't need to know how they differ in the way they function, only whether having one of them is enough to rule out the other. If I use Haar cascades, do I need to bother with SIFT? Why bother?
Thanks
EDIT: For my purposes, I want to implement object recognition on a broad class of things, meaning that any object shaped like a cup will be picked up as a member of the class "cup". But I also want to identify specific instances, meaning that an NYC cup will be picked up as the instance "NYC cup".
Answers (3)
Object detection usually consists of two steps: feature detection and classification. In the feature detection step, the relevant features of the object to be detected are gathered. These features are the input to the second step, classification. (To my knowledge, even Haar cascades can be used for feature detection.) Classification involves algorithms such as neural networks, k-nearest neighbors, and so on. The goal of classification is to find out whether the detected features correspond to the features that the object to be detected would have. Classification generally belongs to the realm of machine learning.
Face detection is one example of object detection.
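The two-step structure can be sketched in plain Python. This is only a toy illustration under stated assumptions: the `extract_features` function, the 4x4 binary images, and the labels are all made up for the example, and the classifier is a plain 1-nearest-neighbor lookup rather than anything a real detector would ship with.

```python
import math

def extract_features(image):
    """Step 1 (feature detection): reduce an image to a short feature vector.

    Toy features, assumed for illustration: the summed absolute intensity
    change between horizontally and between vertically adjacent pixels.
    """
    rows, cols = len(image), len(image[0])
    horizontal = sum(abs(image[r][c + 1] - image[r][c])
                     for r in range(rows) for c in range(cols - 1))
    vertical = sum(abs(image[r + 1][c] - image[r][c])
                   for r in range(rows - 1) for c in range(cols))
    return (horizontal, vertical)

def classify(features, training_set):
    """Step 2 (classification): 1-nearest-neighbor over labeled feature vectors."""
    nearest = min(training_set, key=lambda item: math.dist(features, item[0]))
    return nearest[1]

# Hypothetical training images: vertical stripes vs. horizontal stripes.
vertical_stripes = [[0, 1, 0, 1]] * 4
horizontal_stripes = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]

training_set = [
    (extract_features(vertical_stripes), "vertical stripes"),
    (extract_features(horizontal_stripes), "horizontal stripes"),
]

# A query image: vertical stripes with one noisy pixel.
query = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 1, 1], [0, 1, 0, 1]]
label = classify(extract_features(query), training_set)
print(label)
```

The classifier never sees pixels, only the feature vector, which is why the choice of features matters so much for what the second step can distinguish.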
EDIT (Jul. 9, 2018):
With the advent of deep learning, neural networks with multiple hidden layers have come into wide use, making it relatively easy to see the difference between feature detection and object detection. A deep learning neural network consists of two or more hidden layers, each of which is specialized for a specific part of the task at hand. For neural networks that detect objects from an image, the earlier layers arrange low-level features into a many-dimensional space (feature detection), and the later layers classify objects according to where those features are found in that many-dimensional space (object detection). A nice introduction to neural networks of this kind is found in the Wolfram Blog article "Launching the Wolfram Neural Net Repository".
Normally, objects are collections of features. A feature tends to be a very low-level, primitive thing. An object implies moving the understanding of the scene one level up.
A feature might be something like a corner or an edge, whereas an object might be something like a book, a box, or a desk. These objects are all composed of multiple features, only some of which may be visible in any given scene.
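To make "low-level" concrete, here is a minimal sketch that computes central-difference gradients, marks a pixel as an edge when the intensity changes strongly in at least one direction, and marks it as a corner candidate when it changes strongly in both. The threshold and the 6x6 test image are assumptions chosen for illustration. The corner test is deliberately crude (a diagonal edge would also trigger it), which is why real detectors such as Harris look at the gradient distribution over a neighborhood instead.

```python
def gradients(image, r, c):
    """Central-difference gradients (horizontal, vertical) at an interior pixel."""
    gx = image[r][c + 1] - image[r][c - 1]
    gy = image[r + 1][c] - image[r - 1][c]
    return gx, gy

def detect_features(image, threshold=1):
    """Return (edges, corners) as lists of (row, col) for interior pixels."""
    edges, corners = [], []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            gx, gy = gradients(image, r, c)
            if max(abs(gx), abs(gy)) >= threshold:
                edges.append((r, c))      # strong change in at least one direction
            if min(abs(gx), abs(gy)) >= threshold:
                corners.append((r, c))    # strong change in both directions
    return edges, corners

# Hypothetical test image: a bright block in the bottom-right of a dark background.
image = [[1 if r >= 3 and c >= 3 else 0 for c in range(6)] for r in range(6)]

edges, corners = detect_features(image)
print("edge pixels:", edges)
print("corner candidates:", corners)
```

Nothing in this output says "box" or "book". Turning a handful of edges and corners into an object label is the extra level of understanding described above.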
Invariance, speed, storage: those are a few reasons I can think of off the top of my head. The alternative would be to keep the complete image and then check whether a given image is similar to the glass images you have in your database. But if you have a compressed representation of the glass, it will need less computation (and thus be faster), it will need less storage, and the features tell you what stays invariant across images.
The two methods you mentioned are essentially the same, with slight differences. In the case of Haar, you detect the Haar features and then boost them to increase the confidence. Boosting is nothing but a meta-classifier that smartly chooses which of the Haar features to include in your final meta-classification, so that it can give a better estimate. The other method does more or less the same thing, except that you have more "sophisticated" features. The main difference is that you don't use boosting directly. You tend to use some sort of classification or clustering, such as MoG (mixture of Gaussians), k-means, or some other heuristic, to cluster your data. Your clustering largely depends on your features and application.
What will work in your case? That is a tough question. If I were you, I would play around with Haar and, if it doesn't work, try the other method (obviously). Be aware that you might want to segment the image and give it some sort of boundary within which to detect glasses.
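As a rough sketch of the Haar side of this, the code below computes two-rectangle Haar-like features from an integral image and then runs a single boosting-style selection step: it picks the feature whose thresholded output has the lowest weighted error on labeled samples. Everything here is assumed for illustration (the 4x4 samples, the two candidate features, the fixed threshold of 0). A real AdaBoost cascade repeats this step many times over thousands of features, reweighting the samples after each round.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column for easy lookups."""
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            table[r + 1][c + 1] = (image[r][c] + table[r][c + 1]
                                   + table[r + 1][c] - table[r][c])
    return table

def rect_sum(table, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])

def haar_left_right(table, top, left, height, width):
    """Two-rectangle Haar-like feature: right half minus left half."""
    half = width // 2
    return (rect_sum(table, top, left + half, height, half)
            - rect_sum(table, top, left, height, half))

def haar_top_bottom(table, top, left, height, width):
    """Two-rectangle Haar-like feature: bottom half minus top half."""
    half = height // 2
    return (rect_sum(table, top + half, left, half, width)
            - rect_sum(table, top, left, half, width))

FEATURES = {"left_right": haar_left_right, "top_bottom": haar_top_bottom}

def pick_best_feature(samples, weights, threshold=0):
    """One boosting-style selection step: the lowest weighted error wins."""
    errors = {}
    for name, feature in FEATURES.items():
        error = 0.0
        for (image, label), weight in zip(samples, weights):
            table = integral_image(image)
            value = feature(table, 0, 0, len(image), len(image[0]))
            prediction = 1 if value > threshold else -1
            if prediction != label:
                error += weight
        errors[name] = error
    return min(errors, key=errors.get), errors

# Hypothetical samples: +1 means "dark on the left, bright on the right".
samples = [
    ([[0, 0, 1, 1]] * 4, 1),
    ([[0, 1, 1, 1]] * 4, 1),
    ([[1, 1, 0, 0]] * 4, -1),
    ([[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]], -1),
]
weights = [0.25] * len(samples)

best, errors = pick_best_feature(samples, weights)
print(best, errors)
```

Swapping the Haar features for SIFT-style descriptors and the selection step for k-means or a mixture model gives the other pipeline described above. The overall shape (features first, then a learned decision over them) stays the same.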