How to augment a cube onto a position using a 3x3 homography
I am able to track 4 coordinates over different images of the same scene by calculating a 3x3 homography between them. Doing this I can overlay other 2D images onto these coordinates. I am wondering if I could use this homography to augment a cube onto this position instead, using OpenGL? I think the 3x3 matrix doesn't give enough information, but if I know the camera calibration matrix, can I get enough to create a modelview matrix to do this?
Thank you for any help you can give.
If you have the camera calibration matrix (intrinsic parameters), then yes: the homography (between two views of the same planar object) is defined as:

H = K[R|T]

where K is the 3x3 calibration matrix, and R (a 3x3 rotation matrix) and T (a 3x1 translation vector) form the view transform (from object coordinates to camera coordinates). There is a lot to say about how to compute R and T from H. One way is to compute a direct solution; the other is to use some non-linear minimization technique. Obviously, the latter method is better, since it gives a better approximate solution. The former is just a way to start doing augmented reality ;):
Let's see how to derive R and T when using the direct method. If h1, h2 and h3 are the column vectors of H, defined in terms of K, R and T as:

H = K [r1 r2 t]

(remember that we are speaking of points with z=0), where r1 is the first column vector of R, r2 the second, and t is the translation vector. Then:
r1 = l1 * (K^-1) h1
r2 = l2 * (K^-1) h2
r3 = r1 x r2 (cross product between r1 and r2)
t = l3 * (K^-1) h3
where l1,l2,l3 are scaling factors (real values):
l1 = 1 / norm((K^-1)*h1)
l2 = 1 / norm((K^-1)*h2)
l3 = (l1+l2)/2
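The direct decomposition above can be sketched in a few lines of NumPy. This is only a sketch of the formulas, not code from the answer; `decompose_homography` is a hypothetical helper name, and H is assumed to map points on the object's z=0 plane into the image, up to scale:

```python
import numpy as np

def decompose_homography(H, K):
    """Direct decomposition of a plane-induced homography H into R, t.

    Assumes H ~ K [r1 r2 t] up to an unknown scale, i.e. H maps
    points on the z=0 plane of the object frame into the image.
    """
    K_inv = np.linalg.inv(K)
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]

    # Scaling factors l1, l2, l3 from the answer above.
    l1 = 1.0 / np.linalg.norm(K_inv @ h1)
    l2 = 1.0 / np.linalg.norm(K_inv @ h2)
    l3 = (l1 + l2) / 2.0

    r1 = l1 * (K_inv @ h1)
    r2 = l2 * (K_inv @ h2)
    r3 = np.cross(r1, r2)       # third column from the cross product
    t = l3 * (K_inv @ h3)

    R = np.column_stack((r1, r2, r3))
    return R, t
```

Because each column is renormalized by l1, l2, l3, the result is invariant to the overall scale of H, which is important since a homography is only defined up to scale.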
Keep in mind that this solution should be refined using a non-linear minimization method (for example, you can use it as a starting point). You can also use a distortion model to correct for lens distortion, but this step is not strictly necessary (you will get good results even without it).
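Short of a full non-linear minimization, one cheap refinement is worth noting: with a noisy H, the recovered [r1 r2 r3] is generally not exactly orthonormal, and a standard fix is to project it onto the nearest true rotation matrix via the SVD. A minimal sketch (`nearest_rotation` is a hypothetical helper name, not from the answer):

```python
import numpy as np

def nearest_rotation(R_approx):
    """Project an approximate 3x3 matrix onto the closest rotation
    matrix (in the Frobenius norm) using the SVD: R = U V^T, with a
    sign flip on the last column of U if needed so that det(R) = +1."""
    U, _, Vt = np.linalg.svd(R_approx)
    R = U @ Vt
    if np.linalg.det(R) < 0:
        U[:, -1] *= -1
        R = U @ Vt
    return R
```

This only repairs orthonormality; a proper non-linear method, as discussed next, will jointly re-estimate R and T against the image measurements.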
If you want to use a minimization method to compute a better approximation of R and T, there are many different approaches. I suggest you read the paper "Fast and globally convergent pose estimation from video images" by Lu and Hager, which presents one of the best algorithms out there for your purpose.