Uniquely identifying objects in a scene using multiple calibrated cameras
I have a setup with multiple cameras that all point towards the same scene. All cameras are calibrated to the same world coordinate system (i.e.: I know the location of all the cameras with respect to the origin of the world coordinate system).
In each image from the cameras, I will detect objects in the scene (segmentation). My goal is to count all objects in the scene, and I do not want to count an object twice just because it appears in multiple images. This means that if I detect an object in image A and an object in image B, I should be able to tell whether they are the same object. It should be possible to do this using the 3D information I have from my calibrated cameras. I was thinking of the following:
Voxel carving. I create silhouettes from the detected objects in every image, apply voxel carving, and then count the clusters of connected voxels that remain. Would that be the number of unique objects in the scene?
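The carving-and-counting idea can be sketched as follows. This is only a minimal illustration, assuming three idealized orthographic cameras that look along the grid axes; real calibrated cameras would instead project each voxel through their own projection matrix. All function names and the toy scene are hypothetical.

```python
from collections import deque
from itertools import product

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def silhouettes(occupied):
    """Project occupied voxels along z, y and x (idealized orthographic views)."""
    sil_xy = {(x, y) for x, y, z in occupied}
    sil_xz = {(x, z) for x, y, z in occupied}
    sil_yz = {(y, z) for x, y, z in occupied}
    return sil_xy, sil_xz, sil_yz

def carve(size, sil_xy, sil_xz, sil_yz):
    """Keep only the voxels whose projection lies inside every silhouette."""
    return {
        (x, y, z)
        for x, y, z in product(range(size), repeat=3)
        if (x, y) in sil_xy and (x, z) in sil_xz and (y, z) in sil_yz
    }

def count_clusters(voxels):
    """Count 6-connected components among the surviving voxels."""
    remaining = set(voxels)
    clusters = 0
    while remaining:
        clusters += 1
        queue = deque([remaining.pop()])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    queue.append(neighbour)
    return clusters

def cube(x0, y0, z0, side=2):
    """Hypothetical test object: an axis-aligned cube of voxels."""
    return {(x0 + i, y0 + j, z0 + k) for i, j, k in product(range(side), repeat=3)}

# Toy scene: two well-separated cubes in an 8x8x8 grid.
scene = cube(1, 1, 1) | cube(5, 5, 5)
carved = carve(8, *silhouettes(scene))
n_objects = count_clusters(carved)
```

In this toy scene the carved volume equals the real objects and two clusters are counted. With fewer views, or with objects that line up from the cameras' point of view, the carved volume can contain spurious or merged clusters, so the count is not guaranteed to be correct.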
I also thought about, for example, taking the center of each detected object and casting a ray through it into the 3D world, doing this for every camera, and then checking whether rays from different cameras cross each other. But this would be very error-prone: the objects might have a slightly different size or shape in each image, so the center might be off. Also, the camera locations are not 100% exact, which will throw the rays off.
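Because noisy rays almost never intersect exactly, one common workaround is to test how close two rays pass to each other rather than whether they cross. The sketch below assumes each ray is given by a camera center and a direction in world coordinates, and it treats the rays as infinite lines for simplicity. The function names and the tolerance value are hypothetical.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def line_distance(o1, d1, o2, d2):
    """Shortest distance between the lines o1 + t*d1 and o2 + s*d2."""
    normal = cross(d1, d2)
    normal_len = math.sqrt(dot(normal, normal))
    offset = sub(o2, o1)
    if normal_len < 1e-12:
        # Parallel lines: distance from o2 to the first line.
        c = cross(offset, d1)
        return math.sqrt(dot(c, c)) / math.sqrt(dot(d1, d1))
    return abs(dot(offset, normal)) / normal_len

def rays_match(o1, d1, o2, d2, tol):
    """Treat two detections as candidates for the same object if their rays pass within tol."""
    return line_distance(o1, d1, o2, d2) <= tol

# Camera 1 at the origin, camera 2 ten units along x (hypothetical poses).
cam1, cam2 = (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)
exact = line_distance(cam1, (1.0, 1.0, 0.0), cam2, (-1.0, 1.0, 0.0))
noisy = line_distance(cam1, (1.0, 1.0, 0.0), cam2, (-1.0, 1.0, 0.02))
other = line_distance(cam1, (1.0, 1.0, 0.0), cam2, (-1.0, 1.0, 1.0))
```

The tolerance has to be chosen from the expected calibration and detection error. Note that a small distance only makes a match plausible: rays to two different objects can also pass close to each other.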
What would be a good approach to tackle this issue?
Comments (1)
Do you only know "object", with no categories or identities, and no image information other than a bounding box or mask? Then it's impossible.
Consider a stark simplification, because I don't feel like drawing viewing frustums right now.
Black boxes are real objects. The left and bottom axes show their projections. Given these projections, the dark gray boxes would also be valid hypotheses.
You can't tell where the boxes really are.
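The ambiguity can be reproduced on a tiny 2D grid. This is a minimal sketch, assuming two "cameras" that each see only a 1D projection of the scene onto one axis; the cell coordinates are made up for illustration.

```python
from itertools import product

# (x, y) grid cells occupied by the real objects (the "black boxes").
real_boxes = {(1, 3), (3, 1)}

# Each camera only observes the projection onto one axis.
seen_on_bottom_axis = {x for x, _ in real_boxes}
seen_on_left_axis = {y for _, y in real_boxes}

# Every cell consistent with both projections is a valid hypothesis.
hypotheses = set(product(seen_on_bottom_axis, seen_on_left_axis))

# Hypotheses that are not real objects (the "dark gray boxes").
ghosts = hypotheses - real_boxes

# The ghosts alone would produce exactly the same two projections.
ghost_bottom = {x for x, _ in ghosts}
ghost_left = {y for _, y in ghosts}
```

Here `hypotheses` contains four cells although only two objects exist, and the two ghost cells explain the observations just as well as the real ones. Silhouettes alone cannot separate the two cases.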
If you had something to disambiguate different object detections, then yes, it would be possible.
One very fine-detail variant of that would be block matching to obtain disparity maps (stereo vision). That's a special case of "Structure from Motion".
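To illustrate what block matching does, here is a minimal sketch on a single pair of scanlines, assuming the two images are already rectified so that a point shifts only horizontally between them. The function name, the pixel values, and the focal length and baseline used for the depth conversion are all hypothetical.

```python
def best_disparity(left, right, x, half=1, max_disp=4):
    """Find the shift d that minimizes the sum of absolute differences
    between the left block centered at x and the right block centered at x - d."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - half < 0:
            break  # the right block would leave the image
        cost = sum(
            abs(left[x + k] - right[x - d + k]) for k in range(-half, half + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# One textured patch (9, 5, 7) seen in both scanlines, shifted by 2 pixels.
left_row = [0, 0, 0, 0, 9, 5, 7, 0, 0, 0]
right_row = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0]

disparity = best_disparity(left_row, right_row, x=5)

# For a rectified pair, depth = focal_length * baseline / disparity.
focal_px = 800.0    # assumed focal length in pixels
baseline_m = 0.1    # assumed distance between the cameras in meters
depth_m = focal_px * baseline_m / disparity
```

The matching only works because the patch has texture: on a uniform region every shift would give the same cost, which is why textured objects are a precondition for this approach.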
If your "objects" have texture, and you are willing to calculate point clouds, then you can do it.