基于4个共面点的单应矩阵计算相机位姿

发布于 2024-12-27 20:53:13 字数 1353 浏览 3 评论 0原文

我在视频(或图像)中有 4 个共面点,代表一个四边形(不一定是正方形或矩形),我希望能够在它们的顶部显示一个虚拟立方体,其中立方体的角正好位于角上视频四边形的。

由于这些点是共面的,我可以计算单位正方形的角(即 [0,0] [0,1] [1,0] [1,1])和四边形的视频坐标之间的单应性。

根据这个单应性,我应该能够计算出正确的相机姿态,即 [R|t],其中 R 是 3x3 旋转矩阵,t 是 3x1 平移向量,以便虚拟立方体位于视频四边形上。

我已经阅读了许多解决方案(其中一些是关于 SO 的)并尝试实现它们,但它们似乎只在某些“简单”情况下有效(例如当视频四边形是正方形时),但在大多数情况下不起作用。

以下是我尝试过的方法(大多数都是基于相同的原理,只是翻译的计算略有不同)。令 K 为相机的内函数矩阵,H 为单应矩阵。我们计算:

A = K-1 * H

设 a1,a2,a3 为 A 的列向量,r1,r2,r3 为旋转矩阵 R 的列向量。

r1 = a1 / ||a1||
r2 = a2 / ||a2||
r3 = r1 x r2
t = a3 / sqrt(||a1||*||a2||)

问题是这在大多数情况下不起作用。为了检查我的结果,我将R和t与OpenCV的solvePnP方法获得的结果进行了比较(使用以下3D点[0,0,0] [0,1,0] [1,0,0] [1,1 ,0])。

由于我以相同的方式显示立方体,我注意到在每种情况下,solvePnP 都会提供正确的结果,而从单应性获得的姿势大多是错误的。

理论上,因为我的点是共面的,所以可以从单应性计算姿势,但我找不到从 H 计算姿势的正确方法。

对我做错了什么有什么见解吗?

尝试@Jav_Rock的方法后进行编辑

Hi Jav_Rock,非常感谢您的回答,我尝试了您的方法(以及许多其他方法),这似乎或多或少都可以。 尽管如此,在基于 4 个共面点计算姿势时,我仍然遇到一些问题。为了检查结果,我将结果与solvePnP的结果进行比较(由于迭代重投影误差最小化方法,结果会好得多)。

下面是一个示例:

cube

  • 黄色立方体:解决 PNP
  • 黑色立方体:Jav_Rock 的技术
  • 青色(和紫色)立方体:一些其他技术给出了完全相同的结果

正如您所看到的,黑色立方体或多或少都可以,但看起来比例不太好,尽管向量看起来是正交的。

EDIT2:我在计算 v3 后对其进行了归一化(为了强制正交性),它似乎也解决了一些问题。

I have 4 coplanar points in a video (or image) representing a quad (not necessarily a square or rectangle) and I would like to be able to display a virtual cube on top of them where the corners of the cube stand exactly on the corners of the video quad.

Since the points are coplanar I can compute the homography between the corners of a unit square (i.e. [0,0] [0,1] [1,0] [1,1]) and the video coordinates of the quad.

From this homography I should be able to compute a correct camera pose, i.e. [R|t] where R is a 3x3 rotation matrix and t is a 3x1 translation vector so that the virtual cube lies on the video quad.

I have read many solutions (some of them on SO) and tried implementing them but they seem to work only in some "simple" cases (like when the video quad is a square) but do not work in most cases.

Here are the methods I tried (most of them are based on the same principles, only the computation of the translation are slightly different). Let K be the intrinsics matrix from the camera and H be the homography. We compute:

A = K-1 * H

Let a1,a2,a3 be the column vectors of A and r1,r2,r3 the column vectors of the rotation matrix R.

r1 = a1 / ||a1||
r2 = a2 / ||a2||
r3 = r1 x r2
t = a3 / sqrt(||a1||*||a2||)

The issue is that this does not work in most cases. In order to check my results, I compared R and t with those obtained by OpenCV's solvePnP method (using the following 3D points [0,0,0] [0,1,0] [1,0,0] [1,1,0]).

Since I display the cube in the same way, I noticed that in every case solvePnP provides correct results, while the pose obtained from the homography is mostly wrong.

In theory since my points are coplanar, it is possible to compute the pose from a homography but I couldn't find the correct way to compute the pose from H.

Any insights on what I am doing wrong?

Edit after trying @Jav_Rock's method

Hi Jav_Rock, thanks very much for your answer, I tried your approach (and many others as well) which seems to be more or less OK.
Nevertheless I still happen to have some issues when computing the pose based on 4 coplanar point. In order to check the results I compare with results from solvePnP (which will be much better due to the iterative reprojection error minimization approach).

Here is an example:

cube

  • Yellow cube: Solve PNP
  • Black Cube: Jav_Rock's technique
  • Cyan (and Purple) cube(s): some other techniques given the exact same results

As you can see, the black cube is more or less OK but doesn't seem well proportioned, although the vectors seem orthonormal.

EDIT2: I normalized v3 after computing it (in order to enforce orthonormality) and it seems to solve some problems as well.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

忆悲凉 2025-01-03 20:53:13

如果你有单应性,你可以用这样的方法计算相机姿势:

void cameraPoseFromHomography(const Mat& H, Mat& pose)
{
    pose = Mat::eye(3, 4, CV_32FC1);      // 3x4 matrix, the camera pose
    float norm1 = (float)norm(H.col(0));  
    float norm2 = (float)norm(H.col(1));  
    float tnorm = (norm1 + norm2) / 2.0f; // Normalization value

    Mat p1 = H.col(0);       // Pointer to first column of H
    Mat p2 = pose.col(0);    // Pointer to first column of pose (empty)

    cv::normalize(p1, p2);   // Normalize the rotation, and copies the column to pose

    p1 = H.col(1);           // Pointer to second column of H
    p2 = pose.col(1);        // Pointer to second column of pose (empty)

    cv::normalize(p1, p2);   // Normalize the rotation and copies the column to pose

    p1 = pose.col(0);
    p2 = pose.col(1);

    Mat p3 = p1.cross(p2);   // Computes the cross-product of p1 and p2
    Mat c2 = pose.col(2);    // Pointer to third column of pose
    p3.copyTo(c2);       // Third column is the crossproduct of columns one and two

    pose.col(3) = H.col(2) / tnorm;  //vector t [R|t] is the last column of pose
}

这个方法对我有效。祝你好运。

If you have your Homography, you can calculate the camera pose with something like this:

void cameraPoseFromHomography(const Mat& H, Mat& pose)
{
    pose = Mat::eye(3, 4, CV_32FC1);      // 3x4 matrix, the camera pose
    float norm1 = (float)norm(H.col(0));  
    float norm2 = (float)norm(H.col(1));  
    float tnorm = (norm1 + norm2) / 2.0f; // Normalization value

    Mat p1 = H.col(0);       // Pointer to first column of H
    Mat p2 = pose.col(0);    // Pointer to first column of pose (empty)

    cv::normalize(p1, p2);   // Normalize the rotation, and copies the column to pose

    p1 = H.col(1);           // Pointer to second column of H
    p2 = pose.col(1);        // Pointer to second column of pose (empty)

    cv::normalize(p1, p2);   // Normalize the rotation and copies the column to pose

    p1 = pose.col(0);
    p2 = pose.col(1);

    Mat p3 = p1.cross(p2);   // Computes the cross-product of p1 and p2
    Mat c2 = pose.col(2);    // Pointer to third column of pose
    p3.copyTo(c2);       // Third column is the crossproduct of columns one and two

    pose.col(3) = H.col(2) / tnorm;  //vector t [R|t] is the last column of pose
}

This method works form me. Good luck.

尘世孤行 2025-01-03 20:53:13

Jav_Rock 提出的答案并没有为三维空间中的相机姿势提供有效的解决方案。

为了估计单应性引起的树维变换和旋转,存在多种方法。 其中一个提供了分解单应性的封闭公式,但它们非常复杂。而且,解决方案从来都不是唯一的。

幸运的是,OpenCV 3 已经实现了这种分解(decomposeHomographyMat)。给定单应性和正确缩放的内在矩阵,该函数提供一组四种可能的旋转和平移。

The answer proposed by Jav_Rock does not provide a valid solution for camera poses in three-dimensional space.

For estimating a tree-dimensional transform and rotation induced by a homography, there exist multiple approaches. One of them provides closed formulas for decomposing the homography, but they are very complex. Also, the solutions are never unique.

Luckily, OpenCV 3 already implements this decomposition (decomposeHomographyMat). Given an homography and a correctly scaled intrinsics matrix, the function provides a set of four possible rotations and translations.

绮筵 2025-01-03 20:53:13

以防万一有人需要 @Jav_Rock 编写的函数的 python 移植:

def cameraPoseFromHomography(H):
    H1 = H[:, 0]
    H2 = H[:, 1]
    H3 = np.cross(H1, H2)

    norm1 = np.linalg.norm(H1)
    norm2 = np.linalg.norm(H2)
    tnorm = (norm1 + norm2) / 2.0;

    T = H[:, 2] / tnorm
    return np.mat([H1, H2, H3, T])

在我的任务中工作得很好。

Just in case anybody needs python porting of the function written by @Jav_Rock:

def cameraPoseFromHomography(H):
    H1 = H[:, 0]
    H2 = H[:, 1]
    H3 = np.cross(H1, H2)

    norm1 = np.linalg.norm(H1)
    norm2 = np.linalg.norm(H2)
    tnorm = (norm1 + norm2) / 2.0;

    T = H[:, 2] / tnorm
    return np.mat([H1, H2, H3, T])

Works fine in my tasks.

小兔几 2025-01-03 20:53:13

从单应矩阵计算 [R|T] 比 Jav_Rock 的答案稍微复杂一些。

在 OpenCV 3.0 中,有一个名为 cv::decomposeHomographyMat 的方法,它返回四个可能的解决方案,其中一个是正确的。然而,OpenCV 没有提供一种方法来选择正确的。

我现在正在研究这个问题,也许本月晚些时候会在这里发布我的代码。

Computing [R|T] from the homography matrix is a little more complicated than Jav_Rock's answer.

In OpenCV 3.0, there is a method called cv::decomposeHomographyMat that returns four potential solutions, one of them is correct. However, OpenCV didn't provide a method to pick out the correct one.

I'm now working on this and maybe will post my codes here later this month.

帅气尐潴 2025-01-03 20:53:13

图像上包含您的方块的平面具有消失的车道代理您的相机。
这条线的方程是Ax+By+C=0。

你的飞机的法线是(A,B,C)!

设 p00,p01,p10,p11 是应用相机内参后的齐次形式的点坐标,例如,p00=(x00,y00,1)

消失线可以计算为:

  • down = p00 cross p01;
  • 左 = p00 交叉 p10;
  • 右 = p01 交叉 p11;
  • 向上 = p10 交叉 p11;
  • v1=左交叉右;
  • v2=上叉下;
  • 消失线 = v1 交叉 v2;

标准向量叉积中的 cross

Plane that contain your Square on image has vanishing lane agents your camera.
Equation of this line is Ax+By+C=0.

Normal of your plane is (A,B,C)!

Let p00,p01,p10,p11 are coordinates of point after applying camera's intrinsic parameters and in homogenous form e.g, p00=(x00,y00,1)

Vanishing line can be calculated as:

  • down = p00 cross p01;
  • left = p00 cross p10;
  • right = p01 cross p11;
  • up = p10 cross p11;
  • v1=left cross right;
  • v2=up cross down;
  • vanish_line = v1 cross v2;

Where cross in standard vector cross product

踏月而来 2025-01-03 20:53:13

你可以使用这个功能。对我有用。

def find_pose_from_homography(H, K):
    '''
    function for pose prediction of the camera from the homography matrix, given the intrinsics 
    
    :param H(np.array): size(3x3) homography matrix
    :param K(np.array): size(3x3) intrinsics of camera
    :Return t: size (3 x 1) vector of the translation of the transformation
    :Return R: size (3 x 3) matrix of the rotation of the transformation (orthogonal matrix)
    '''
    
    
    #to disambiguate two rotation marices corresponding to the translation matrices (t and -t), 
    #multiply H by the sign of the z-comp on the t-matrix to enforce the contraint that z-compoment of point
    #in-front must be positive and thus obtain a unique rotational matrix
    H=H*np.sign(H[2,2])

    h1,h2,h3 = H[:,0].reshape(-1,1), H[:,1].reshape(-1,1) , H[:,2].reshape(-1,1)
    
    R_ = np.hstack((h1,h2,np.cross(h1,h2,axis=0))).reshape(3,3)
    
    U, S, V = np.linalg.svd(R_)
    
    R = [email protected]([[1,0,0],
                   [0,1,0],
                    [0,0,np.linalg.det([email protected])]])@V.T
    
    t = (h3/np.linalg.norm(h1)).reshape(-1,1)
    
    return R,t

You could use this function. Works for me.

def find_pose_from_homography(H, K):
    '''
    function for pose prediction of the camera from the homography matrix, given the intrinsics 
    
    :param H(np.array): size(3x3) homography matrix
    :param K(np.array): size(3x3) intrinsics of camera
    :Return t: size (3 x 1) vector of the translation of the transformation
    :Return R: size (3 x 3) matrix of the rotation of the transformation (orthogonal matrix)
    '''
    
    
    #to disambiguate two rotation marices corresponding to the translation matrices (t and -t), 
    #multiply H by the sign of the z-comp on the t-matrix to enforce the contraint that z-compoment of point
    #in-front must be positive and thus obtain a unique rotational matrix
    H=H*np.sign(H[2,2])

    h1,h2,h3 = H[:,0].reshape(-1,1), H[:,1].reshape(-1,1) , H[:,2].reshape(-1,1)
    
    R_ = np.hstack((h1,h2,np.cross(h1,h2,axis=0))).reshape(3,3)
    
    U, S, V = np.linalg.svd(R_)
    
    R = [email protected]([[1,0,0],
                   [0,1,0],
                    [0,0,np.linalg.det([email protected])]])@V.T
    
    t = (h3/np.linalg.norm(h1)).reshape(-1,1)
    
    return R,t
暖阳 2025-01-03 20:53:13

这是一个 Python 版本,基于 Dmitriy Voloshyn 提交的版本,该版本对旋转矩阵进行归一化并将结果转置为 3x4。

def cameraPoseFromHomography(H):  
    norm1 = np.linalg.norm(H[:, 0])
    norm2 = np.linalg.norm(H[:, 1])
    tnorm = (norm1 + norm2) / 2.0;

    H1 = H[:, 0] / norm1
    H2 = H[:, 1] / norm2
    H3 = np.cross(H1, H2)
    T = H[:, 2] / tnorm

    return np.array([H1, H2, H3, T]).transpose()

Here's a python version, based on the one submitted by Dmitriy Voloshyn that normalizes the rotation matrix and transposes the result to be 3x4.

def cameraPoseFromHomography(H):  
    norm1 = np.linalg.norm(H[:, 0])
    norm2 = np.linalg.norm(H[:, 1])
    tnorm = (norm1 + norm2) / 2.0;

    H1 = H[:, 0] / norm1
    H2 = H[:, 1] / norm2
    H3 = np.cross(H1, H2)
    T = H[:, 2] / tnorm

    return np.array([H1, H2, H3, T]).transpose()
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文