Recognizing occluded texture patches from a large dataset
My hypothetical task is this: given a single picture, taken from above, of a geometrically undistorted beverage can, return its classification (e.g. brand and name of the beverage). No segmentation is needed. The input to the classification function is just a view, ONE view, of the can from any of its sides. The dataset should be large, around 2000 different kinds of beverages. The cans all have the same size. For training, each can is rotated a few hundred times to cover almost any angle.
Any ideas on the best way to approach this? To me it seems like a texture recognition problem, where the shape of the object itself is irrelevant. Classification should also be fast, so template matching is ruled out. If someone could just point me in the right direction, it would be a huge step forward. None of the ideas I have come up with seems to really fit the task. Local features (SIFT/SURF etc.)? Too general: a brand can have the same logo on the different beverages it produces. Neural nets? The can can look very different from different sides, which will mess up the training if they all map to the same label. Bag of words? HOGs/colour histograms etc. for training an SVM? Perhaps something completely different that I don't know I don't know about?
Comments (1)
One good approach would be to model the shape of the can, so that you can map the texture and labels on the can to a planar rectangle. From there you could do template matching on low-resolution versions, or Gaussian-pyramid-based (coarse-to-fine) template matching, to get a fast match.
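To make the "planar rectangle plus low-resolution matching" idea concrete, here is a rough NumPy sketch. It is not a definitive implementation and rests on several assumptions that are not in the original post: the view is a grayscale, orthographic side view with the can's axis vertical and the can spanning the full image width; each training label is stored as a full 360-degree unwrapped strip at the same height and pixel scale as the query; and plain block averaging stands in for a proper Gaussian pyramid level. The names `unwrap_cylinder`, `downsample` and `best_match` are made up for illustration.

```python
import numpy as np

def unwrap_cylinder(view, max_angle_deg=70.0, out_width=None):
    """Flatten a side view of a vertical cylinder into a planar strip.

    Assumption: orthographic projection, can spans the full image width.
    Output columns are evenly spaced in angle, so label texture is no
    longer compressed towards the left and right edges.
    """
    h, w = view.shape[:2]
    radius = (w - 1) / 2.0          # can radius in pixels
    cx = (w - 1) / 2.0              # column of the can's axis
    max_angle = np.deg2rad(max_angle_deg)
    if out_width is None:
        out_width = int(round(2 * max_angle * radius))  # arc length in pixels
    theta = np.linspace(-max_angle, max_angle, out_width)
    src_x = cx + radius * np.sin(theta)                 # where each angle projects
    cols = np.clip(np.rint(src_x).astype(int), 0, w - 1)  # nearest-neighbour sampling
    return view[:, cols]

def downsample(img, factor):
    """Block-average a 2-D image by an integer factor (crude pyramid level)."""
    h2, w2 = img.shape[0] // factor, img.shape[1] // factor
    img = img[:h2 * factor, :w2 * factor].astype(float)
    return img.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def best_match(strip, labels, factor=4):
    """Slide the low-res strip around each low-res 360-degree label.

    labels: dict mapping a can name to its full unwrapped label (2-D array).
    Returns (name, low-res column offset, mean squared difference).
    """
    small = downsample(strip, factor)
    sw = small.shape[1]
    best = (None, -1, np.inf)
    for name, label in labels.items():
        ref = downsample(label, factor)
        # The label is periodic around the can, so wrap it horizontally.
        ref = np.hstack([ref, ref[:, :sw - 1]])
        for off in range(ref.shape[1] - sw + 1):
            score = np.mean((ref[:, off:off + sw] - small) ** 2)
            if score < best[2]:
                best = (name, off, score)
    return best
```

In use, you would call `unwrap_cylinder` on the query view, then `best_match` on the resulting strip. This covers only the coarse level; a coarse-to-fine scheme would re-check the few best candidates and offsets at higher resolution.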
A second option would be to extract SIFT or SURF keypoints from this 'planarized' image and try to find corresponding points in the training set. Although the same logos or text might appear on several different cans, you can use the locations of the keypoints to differentiate between the cans.
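A sketch of how keypoint locations could break the tie between cans that share a logo is given below. It assumes descriptors and (x, y) positions have already been extracted from the planarized images (for example with OpenCV's `cv2.SIFT_create().detectAndCompute`), that query and reference are at the same scale so that rotating the can is a horizontal shift wrapping at `label_width`, and that each reference has at least two keypoints. The function names and the simple shift-voting scheme are illustrative choices, not something the answer prescribes.

```python
import numpy as np

def match_descriptors(q_desc, r_desc, ratio=0.75):
    """Return (query_idx, ref_idx) arrays for matches passing Lowe's ratio test."""
    # Pairwise Euclidean distances, shape (n_query, n_ref).
    d = np.linalg.norm(q_desc[:, None, :] - r_desc[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    rows = np.arange(len(q_desc))
    nearest, second = order[:, 0], order[:, 1]
    keep = d[rows, nearest] < ratio * d[rows, second]
    return rows[keep], nearest[keep]

def location_score(q_xy, r_xy, label_width, tol=3.0):
    """Size of the largest group of matches sharing one horizontal shift.

    Matches must also sit at (nearly) the same height in both images.
    """
    if len(q_xy) == 0:
        return 0
    shift = (r_xy[:, 0] - q_xy[:, 0]) % label_width
    same_height = np.abs(r_xy[:, 1] - q_xy[:, 1]) <= tol
    # Circular difference between every pair of shifts.
    diff = np.abs(shift[:, None] - shift[None, :])
    diff = np.minimum(diff, label_width - diff)
    agree = (diff <= tol) & same_height[:, None] & same_height[None, :]
    return int(agree.sum(axis=1).max())

def classify(q_desc, q_xy, references, label_width):
    """references: dict name -> (descriptors, xy positions) of a full label."""
    scores = {}
    for name, (r_desc, r_xy) in references.items():
        qi, ri = match_descriptors(q_desc, r_desc)
        scores[name] = location_score(q_xy[qi], r_xy[ri], label_width)
    return max(scores, key=scores.get), scores
```

Two cans carrying the same logo will both produce plenty of descriptor matches, but only the can whose keypoints sit in a consistent layout relative to the query keeps a high `location_score`. With around 2000 cans, the brute-force distance matrix would probably need to be replaced by an approximate nearest-neighbour index.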