Get 3D coordinates from 2D image pixels if the extrinsic and intrinsic parameters are known
I am doing camera calibration with the Tsai algorithm. I have the intrinsic and extrinsic matrices, but how can I reconstruct the 3D coordinates from that information?
I currently have two ways to find X, Y, Z:
I can treat it as a homogeneous system and use Gaussian elimination to find X, Y, Z, W; the point is then X/W, Y/W, Z/W.
I can use the approach from the OpenCV documentation (sketched after this question):
Since I know u, v, R, and t, I can compute X, Y, Z.
However, the two methods give different results, and neither is correct.
What am I doing wrong?
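Roughly, the computation the second method refers to can be sketched like this (a minimal sketch of the pinhole relation with illustrative values; s is the unknown scale factor along the viewing ray, which has to be supplied):

```python
import numpy as np

# Illustrative values only; K, R, t come from the Tsai calibration in practice.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

u, v = 100.0, 150.0   # observed pixel
s = 1.0               # unknown scale along the viewing ray; must be chosen

# Pinhole model: s * [u, v, 1]^T = K (R X + t)
# Solving for X:   X = R^T (s K^{-1} [u, v, 1]^T - t)
X = R.T @ (s * np.linalg.inv(K) @ np.array([u, v, 1.0]) - t)
print(X)
```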
2 Answers
If you have the extrinsic parameters then you have everything. That means you can get a homography from the extrinsics (also called the camera pose). The pose is a 3x4 matrix, the homography is a 3x3 matrix, with H defined as

H = K * [r1, r2, t]

with K being the camera intrinsic matrix, r1 and r2 being the first two columns of the rotation matrix R, and t being the translation vector.
Then normalize by dividing everything by t3 (the third component of t).
What happens to column r3; don't we use it? No, it is redundant: it is the cross product of the first two columns of the rotation, so it can always be recovered from them.
Now that you have the homography, project the points. Your 2D points are (x, y); append z = 1 so they become homogeneous 3-vectors, and project them as follows:
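A minimal sketch of this procedure (K, R, t are illustrative placeholders; note that H built this way maps points on the world plane Z = 0 into the image, so recovering plane coordinates from a pixel uses its inverse):

```python
import numpy as np

# Illustrative calibration; in practice K, R, t come from the Tsai calibration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

# H = K [r1 r2 t]; r3 is dropped since it is the cross product of r1 and r2.
H = K @ np.column_stack((R[:, 0], R[:, 1], t))
H = H / H[2, 2]          # with K's last row [0 0 1], H[2, 2] equals t3

# A 2D image point (x, y), made homogeneous with z = 1.
p = np.array([100.0, 150.0, 1.0])

# Map the pixel back onto the world plane Z = 0 via the inverse homography.
q = np.linalg.inv(H) @ p
X, Y = q[0] / q[2], q[1] / q[2]   # dehomogenize
print((X, Y, 0.0))                # the recovered point lies on the plane Z = 0
```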
As nicely stated in the comments above, projecting 2D image coordinates into 3D "camera space" inherently requires making up the z coordinate, as that information is completely lost in the image. One solution is to assign a dummy value (z = 1) to each of the 2D image-space points before projection, as in Jav_Rock's answer.
One interesting alternative to this dummy solution is to train a model to predict the depth of each point prior to reprojection into 3D camera space. I tried this method and had a high degree of success using a PyTorch CNN trained on 3D bounding boxes from the KITTI dataset. I'd be happy to provide code, but it would be a bit lengthy to post here.
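As a rough sketch of the reprojection step once a depth estimate is available (the intrinsics and the depth value here are placeholders; the depth would come from the trained model rather than a constant):

```python
import numpy as np

# Illustrative intrinsics; in practice these come from calibration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def backproject(u, v, depth, K):
    """Lift a pixel (u, v) with an estimated depth into 3D camera space."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray through the pixel
    return depth * ray                               # scale by the predicted depth

# Placeholder depth standing in for the CNN's prediction at this pixel.
point_3d = backproject(100.0, 150.0, depth=12.5, K=K)
print(point_3d)
```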