What makes object representation and recognition difficult?

Posted 2024-10-18 09:48:54

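Intuitively, it seems that given a dozen or so 2D images of almost any object, taken from different angles, it should be easy to build a 3D representation of that object. A library of 3D representations obtained this way could then be used to recognize objects in new 2D images.

What literature exists along these lines, and why has it not yet produced strong object recognition?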

因为看清所以看轻 2024-10-25 09:48:54

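It is precisely your word "intuitively" that is getting you into trouble. Your brain was not designed to be good at certain tasks, such as multiplying thousands of numbers in an instant. Yet in terms of raw computational power, your brain makes the fastest computers look slow: a neuron takes on the order of 10 milliseconds to respond, but with roughly 10^11 neurons (and on the order of 10^14 synapses) all working in parallel, it completely outclasses any modern machine. It is just that your brain was designed to solve problems that are computationally far more complex, such as recognizing objects in a picture, parsing audio data and picking out an individual speaker against background noise, and learning to classify and handle tens of thousands of objects.

Your brain was designed to be really good at things that are computationally very intensive, and to a person those things seem "intuitive". It was not designed to be good at the things that seem "unintuitive" or hard. But strong object recognition requires raw computation far beyond what is needed for the things people think only computers can do, because there are so many kinds of objects, many of them have sub-objects, and there are many categories and non-rigid forms (e.g. "pants", "water", "dog"). Things like using "common sense" to solve everyday problems are likewise trivial for a person but computationally extremely complex.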


伪心 2024-10-25 09:48:54

What you want to do is indeed possible, but (there are quite a few buts)

for the 3D reconstruction:

For anything but the simplest shapes you need more than just a few dozen images.
The shape you are reconstructing needs to have a lot of recognizable features that look similar enough from different angles so that you can match them.
Lighting needs to be fairly constant over your entire set of images, otherwise shadows will throw you off (or you need even more images)
Even with very feature-rich objects (i.e. a lot of variation in colour and shape), 3D reconstruction accuracy from any matched pair of features is going to be terrible if you do not have full knowledge of the parameters (position, view direction and opening angle, i.e. field of view) of the camera used to take each picture.
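To make the last point concrete, here is a minimal sketch of how one matched feature pair becomes a 3D point, and how much a small error in a camera's view direction hurts. It assumes each matched feature has already been turned into a viewing ray (camera position plus direction) and uses the simple midpoint method, which is only one of several triangulation schemes. The scene, the camera positions and the helper names (`triangulate_midpoint`, `rotate_about_y`) are made up for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # zero when the two rays are parallel
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is undefined")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]

def rotate_about_y(v, degrees):
    """Rotate a vector about the y axis (models a wrong camera view direction)."""
    th = math.radians(degrees)
    x, y, z = v
    return [x * math.cos(th) + z * math.sin(th), y,
            -x * math.sin(th) + z * math.cos(th)]

# Hypothetical scene: one feature 5 units in front of two cameras 2 units apart.
point = [0.0, 0.0, 5.0]
cam1, cam2 = [-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]
ray1 = [p - c for p, c in zip(point, cam1)]
ray2 = [p - c for p, c in zip(point, cam2)]

exact = triangulate_midpoint(cam1, ray1, cam2, ray2)
# Same feature, but camera 2's view direction is off by one degree.
skewed = triangulate_midpoint(cam1, ray1, cam2, rotate_about_y(ray2, 1.0))
error = math.dist(exact, skewed)
print("exact:", exact)
print("with 1 degree error:", skewed, "-> off by", round(error, 3))
```

With perfect camera knowledge the feature is recovered exactly. With a one-degree error the reconstructed point moves by a noticeable fraction of a unit, and the effect grows as the cameras get closer together relative to the object's distance.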

These are all problems that can be solved, so suppose you did, and now you have a new picture of the object that you want to match to your 3D shape.

You could of course try to find a 2D projection of your shape that fits the new picture, but the search space there is enormous. It would probably be a lot easier and faster to use the feature finding and matching system you built for the initial 3D reconstruction to directly match the new picture to the existing set, and find where it fits on the object that way.
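A rough sketch of that "match against the existing set" idea, assuming every stored view has already been reduced to a list of feature descriptors. The 2D toy descriptors, the view names and the `max_dist` cutoff are all invented for illustration; real descriptors have far more dimensions, and a real system would use a proper index rather than brute force.

```python
import math

def nearest(desc, candidates):
    """Index of the candidate descriptor closest to desc (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(desc, candidates[i]))

def mutual_matches(query, view, max_dist=0.5):
    """Pairs (i, j) where query[i] and view[j] are each other's nearest
    neighbour and lie within max_dist of each other."""
    pairs = []
    for i, q in enumerate(query):
        j = nearest(q, view)
        if math.dist(q, view[j]) <= max_dist and nearest(view[j], query) == i:
            pairs.append((i, j))
    return pairs

def best_view(query, views):
    """Name of the stored view that shares the most matches with the query."""
    return max(views, key=lambda name: len(mutual_matches(query, views[name])))

# Hypothetical descriptors for two stored views of one object, and a new picture.
views = {
    "front": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    "side": [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]],
}
query = [[0.1, 0.0], [0.9, 0.1], [5.1, 5.0]]

front_pairs = mutual_matches(query, views["front"])
side_pairs = mutual_matches(query, views["side"])
best = best_view(query, views)
print(front_pairs, side_pairs, best)
```

The matched pairs tell you which already-reconstructed features the new picture sees, which is what lets you place it relative to the object without searching over all possible 2D projections.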

So once you've solved the problem of creating the initial 3D reconstruction your second step is basically done as well.

Photosynth is a brilliant example of these two steps. Browse the site, try to find some of the references they have there.

As for your final step, strong object recognition, just imagine the search space! What you need for strong object recognition, apart from a good representation of the objects you want to recognize, is a good way to search the space of objects you know, and a good way to represent your new object (the image of an object in this case) in that space. This is something I know nearly nothing about.
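For what it is worth, one common way to "represent a new image in the space of known objects" is a bag of visual words: quantize each local descriptor to its nearest entry in a fixed vocabulary, describe the image by the histogram of entries, and compare histograms. The sketch below is only a toy under stated assumptions: the vocabulary is hand-picked rather than learned by clustering, the descriptors are 2D, and the object names are invented. It is not a claim about how any particular recognition system works.

```python
import math
from collections import Counter

def quantize(desc, vocabulary):
    """Index of the visual word (vocabulary entry) closest to a descriptor."""
    return min(range(len(vocabulary)), key=lambda k: math.dist(desc, vocabulary[k]))

def histogram(descriptors, vocabulary):
    """How many descriptors fall on each visual word."""
    counts = Counter(quantize(d, vocabulary) for d in descriptors)
    return [counts[k] for k in range(len(vocabulary))]

def cosine(u, v):
    """Cosine similarity between two histograms (0.0 if either is empty)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def rank(query_descriptors, database, vocabulary):
    """Known objects sorted from most to least similar to the query image."""
    q = histogram(query_descriptors, vocabulary)
    scored = [(cosine(q, histogram(descs, vocabulary)), name)
              for name, descs in database.items()]
    return sorted(scored, reverse=True)

# Hypothetical vocabulary and database of known objects.
vocabulary = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
database = {
    "mug": [[0.0, 1.0], [1.0, 0.0], [9.0, 0.0], [10.0, 1.0]],
    "shoe": [[0.0, 9.0], [1.0, 10.0], [0.0, 11.0], [9.0, 1.0]],
}
query = [[1.0, 1.0], [9.0, 1.0], [0.0, 0.5]]

ranking = rank(query, database, vocabulary)
print(ranking)
```

This turns the search over objects into a nearest-neighbour search over fixed-length vectors, at the cost of throwing away where the features sit in the image.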

For just matching the same object in different 2D images there are SIFT features. But I don't think this translates well to 3D.
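Matching descriptors of this kind between two images is often done with a nearest-neighbour search plus a ratio test: keep a match only if the best candidate is clearly closer than the second best. The sketch below assumes the descriptors have already been extracted. The 3D toy vectors stand in for real ones (SIFT descriptors are 128-dimensional), and the 0.8 threshold is a tunable assumption.

```python
import math

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping only matches whose best distance is below ratio * second-best."""
    matches = []
    for i, d in enumerate(desc_a):
        ranked = sorted(range(len(desc_b)), key=lambda j: math.dist(d, desc_b[j]))
        if len(ranked) < 2:
            continue  # the test needs a second candidate to compare against
        best, second = ranked[0], ranked[1]
        if math.dist(d, desc_b[best]) < ratio * math.dist(d, desc_b[second]):
            matches.append((i, best))
    return matches

# Hypothetical descriptors from two pictures of the same object.
image_a = [[0.0, 0.0, 1.0], [5.0, 5.0, 5.0], [9.0, 0.0, 0.0]]
image_b = [[0.0, 0.1, 1.0], [9.0, 0.2, 0.0], [5.0, 5.0, 4.0], [5.0, 5.0, 6.0]]

matches = ratio_test_matches(image_a, image_b)
print(matches)
```

The middle descriptor of `image_a` has two equally good candidates in `image_b`, so it is dropped as ambiguous. That is the same failure mode as repetitive texture on a real object.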
