Ideas for extracting features of an object using image keypoints
I'd appreciate it if you could help me create a feature vector for a simple object using keypoints. For now, I use the ETH-80 dataset; the objects have an almost uniform blue background, and the pictures are taken from different views. Like this:
After creating a feature vector, I want to train a neural network with this vector and use that network to recognize an input image of an object. I don't want to make it complex; the input images will be as simple as the training images.
I asked a similar question before, and someone suggested using the average value of a 20x20 neighborhood around each keypoint. I tried it, but it does not seem to work with the ETH-80 images because they are taken from different views. That is why I am asking another question.
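Here is roughly what I tried, as a sketch. The file name and the choice of OpenCV's SIFT detector are illustrative, not part of the original suggestion:

```python
# Mean intensity of a 20x20 patch around each keypoint.
import cv2
import numpy as np

img = cv2.imread("eth80_sample.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
keypoints = cv2.SIFT_create().detect(img, None)

half = 10  # half of the 20x20 window
feats = []
for kp in keypoints:
    x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
    patch = img[max(y - half, 0):y + half, max(x - half, 0):x + half]
    feats.append(patch.mean())  # one scalar per keypoint

feature_vector = np.array(feats)  # view-dependent, hence my problem
```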
2 Answers
SURF or SIFT. Look for interest point detectors. A MATLAB SIFT implementation is freely available.
Update: Object Recognition from Local Scale-Invariant Features
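For example, a minimal detection-and-description sketch using OpenCV's SIFT (built into OpenCV since 4.4) instead of the MATLAB implementation mentioned above; the file name is a placeholder:

```python
import cv2

img = cv2.imread("eth80_sample.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each keypoint carries a location, scale, and orientation; each
# descriptor is a 128-dimensional vector describing its neighborhood.
print(len(keypoints), descriptors.shape)
```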
SIFT and SURF features consist of two parts, the detector and the descriptor. The detector finds points in some n-dimensional space (4D for SIFT), and the descriptor is used to robustly describe the surroundings of those points. The latter is increasingly used for image categorization and identification in what is commonly known as the "bag of words" or "visual words" approach. In its simplest form, one collects all descriptors from all images and clusters them, for example using k-means. Every original image then has descriptors that contribute to a number of clusters. The centroids of these clusters, i.e. the visual words, can be used as a new descriptor for the image. The VLfeat website contains a nice demo of this approach, classifying the Caltech 101 dataset:
http://www.vlfeat.org/applications/apps.html#apps.caltech-101
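In code, a minimal bag-of-visual-words sketch might look like this, assuming OpenCV for SIFT and scikit-learn for k-means; the training image paths and the vocabulary size K are placeholders:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

K = 50  # vocabulary size (number of visual words); tune for your data
sift = cv2.SIFT_create()

def sift_descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    return desc  # shape (n_keypoints, 128), or None if nothing was found

train_paths = ["apple1.png", "car1.png"]  # placeholder training images
per_image = [d for d in (sift_descriptors(p) for p in train_paths) if d is not None]

# 1. Pool the descriptors from all training images and cluster them;
#    the cluster centroids are the visual words.
all_desc = np.vstack(per_image)
kmeans = KMeans(n_clusters=K, n_init=10).fit(all_desc)

# 2. Represent each image as a normalized histogram over the K visual
#    words; this fixed-length vector can then feed a neural network.
def bovw_histogram(desc):
    words = kmeans.predict(desc)
    hist = np.bincount(words, minlength=K).astype(np.float64)
    return hist / hist.sum()

features = np.array([bovw_histogram(d) for d in per_image])
print(features.shape)  # (n_images, K)
```

The histogram step gives every image the same feature-vector length regardless of how many keypoints it has, which sidesteps the variable-length problem of raw per-keypoint features.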