Why are there 3 conflicting OpenCV camera calibration formulas?

Posted on 2024-09-01 22:07:52

I'm having a problem with OpenCV's various parameterizations of the coordinates used for camera calibration. Three different sources of information on the image distortion formulae apparently give three non-equivalent descriptions of the parameters and equations involved:

(1) In their book "Learning OpenCV…" Bradski and Kaehler write regarding lens distortion (page 376):

xcorrected = x * ( 1 + k1 * r^2 + k2 * r^4  + k3 * r^6 ) + [ 2 * p1 * x * y + p2 * ( r^2 + 2 * x^2 ) ],

ycorrected = y * ( 1 + k1 * r^2 + k2 * r^4  + k3 * r^6 ) + [ p1 * ( r^2 + 2 * y^2 ) + 2 * p2 * x * y ],

where r = sqrt( x^2 + y^2 ).

Presumably, (x, y) are the coordinates of pixels in the uncorrected captured image corresponding to world-point objects with coordinates (X, Y, Z), camera-frame referenced, for which

xcorrected = fx * ( X / Z ) + cx    and     ycorrected = fy * ( Y / Z ) + cy,

where fx, fy, cx, and cy are the camera's intrinsic parameters. So, having (x, y) from a captured image, we can obtain the desired coordinates ( xcorrected, ycorrected ) to produce an undistorted image of the captured world scene by applying the first two correction expressions above.

However...

(2) The complication arises when we look at the OpenCV 2.0 C Reference entry under the Camera Calibration and 3D Reconstruction section. For ease of comparison we start with all world-point (X, Y, Z) coordinates expressed with respect to the camera's reference frame, just as in #1. Consequently, the transformation matrix [ R | t ] is of no concern.

In the C reference, it is stated that:

x' = X / Z,

y' = Y / Z,

x'' = x' * ( 1 + k1 * r'^2 + k2 * r'^4  + k3 * r'^6 ) + [ 2 * p1 * x' * y' + p2 * ( r'^2 + 2 * x'^2 ) ],

y'' = y' *  ( 1 + k1 * r'^2 + k2 * r'^4  + k3 * r'^6 ) + [ p1 * ( r'^2 + 2 * y'^2 )  + 2 * p2 * x' * y' ],

where r' = sqrt( x'^2 + y'^2 ), and finally that

u = fx * x'' + cx,

v = fy * y'' + cy.
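To make the chain of steps in the reference concrete, here is a minimal Python sketch that evaluates exactly the expressions quoted above for a single camera-frame point. The function name `project_point` and the sample intrinsics used to exercise it are illustrative assumptions, not part of OpenCV.

```python
def project_point(X, Y, Z, fx, fy, cx, cy,
                  k1=0.0, k2=0.0, k3=0.0, p1=0.0, p2=0.0):
    """Map a camera-frame point (X, Y, Z) to distorted pixel coordinates (u, v).

    Hypothetical helper that follows the quoted reference formulas step by step.
    """
    # x', y': normalized image coordinates (independent of the focal length).
    xp = X / Z
    yp = Y / Z
    r2 = xp * xp + yp * yp  # r'^2

    # Radial factor 1 + k1*r'^2 + k2*r'^4 + k3*r'^6.
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3

    # x'', y'': normalized coordinates after radial and tangential distortion.
    xpp = xp * radial + 2.0 * p1 * xp * yp + p2 * (r2 + 2.0 * xp * xp)
    ypp = yp * radial + p1 * (r2 + 2.0 * yp * yp) + 2.0 * p2 * xp * yp

    # u, v: pixel coordinates obtained from the intrinsic parameters.
    return fx * xpp + cx, fy * ypp + cy
```

With all distortion coefficients left at zero, the function reduces to the plain pinhole projection u = fx * ( X / Z ) + cx, v = fy * ( Y / Z ) + cy.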

As one can see, these expressions are not equivalent to those presented in #1, with the result that the two sets of corrected coordinates ( xcorrected, ycorrected ) and ( u, v ) are not the same. Why the contradiction? It seems to me the first set makes more sense, as I can attach physical meaning to each and every x and y in there, while I find no physical meaning in x' = X / Z and y' = Y / Z when the camera focal length is not exactly 1. Furthermore, one cannot compute x' and y' because we don't know (X, Y, Z).
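For what it is worth, the normalized coordinates can be estimated from an observed pixel alone, without knowing (X, Y, Z): dividing out the intrinsics gives ( x'', y'' ), and the distortion polynomial can then be inverted numerically. The sketch below uses a simple fixed-point iteration under the assumption that the distortion is mild. The function name `undistort_pixel` and the iteration count are assumptions made for illustration, and this is a hand-rolled sketch rather than OpenCV's own implementation (OpenCV provides `undistortPoints` for this task).

```python
def undistort_pixel(u, v, fx, fy, cx, cy,
                    k1=0.0, k2=0.0, k3=0.0, p1=0.0, p2=0.0, iterations=20):
    """Estimate the normalized coordinates (x', y') from an observed pixel (u, v).

    Hypothetical helper: inverts the distortion model by fixed-point iteration.
    """
    # Distorted normalized coordinates x'', y'' recovered from the pixel.
    xd = (u - cx) / fx
    yd = (v - cy) / fy

    # Start from the distorted point and refine the undistorted estimate.
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        # Solve x'' = x * radial + dx for x, using the previous estimate
        # to evaluate radial and dx (same for y).
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y
```

The result is the ray direction ( X / Z, Y / Z ) seen by that pixel, which is why these coordinates are meaningful for any focal length.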

(3) Unfortunately, things get even murkier when we refer to the writings in Intel's Open Source Computer Vision Library Reference Manual's section Lens Distortion (page 6-4), which states in part:

"Let ( u, v ) be true pixel image coordinates, that is, coordinates with ideal projection, and ( u ̃, v ̃ ) be corresponding real observed (distorted) image coordinates. Similarly, ( x, y ) are ideal (distortion-free) and ( x ̃, y ̃ ) are real (distorted) image physical coordinates. Taking into account two expansion terms gives the following:

x ̃  =  x * ( 1 +  k1 * r^2 + k2 * r^4 ) + [ 2 p1 * x * y + p2 * ( r^2 + 2 * x^2 ) ]

y ̃  =  y * ( 1 +  k1 * r^2 + k2 * r^4 ) + [ 2 p2 * x * y + p1 * ( r^2 + 2 * y^2 ) ],

where r = sqrt( x^2 + y^2 ). ...

"Because u ̃ = cx + fx * u and v ̃ = cy + fy * v , … the resultant system can be rewritten as follows:

u ̃  = u + ( u – cx ) * [ k1 * r^2 + k2 * r^4 + 2 * p1 * y + p2 * ( r^2 / x + 2 * x ) ]

v ̃  = v + ( v – cy ) * [ k1 * r^2 + k2 * r^4 + 2 * p2 * x + p1 * ( r^2 / y + 2 * y ) ]

The latter relations are used to undistort images from the camera."
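The link between the "physical" relations and the pixel relations in this quote can be checked numerically. The sketch below assumes that ideal pixel and physical coordinates are related by u = cx + fx * x and v = cy + fy * y, and likewise for the distorted pair; it also takes the coefficient of ( r^2 + 2 * y^2 ) in the y relation to be p1, as the pixel relation for v implies. Function names and sample values are illustrative assumptions.

```python
def distort_physical(x, y, k1, k2, p1, p2):
    """Distorted physical coordinates from ideal ones (two radial terms)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + 2.0 * p2 * x * y + p1 * (r2 + 2.0 * y * y)
    return xd, yd


def distort_pixel(u, v, fx, fy, cx, cy, k1, k2, p1, p2):
    """Distorted pixel coordinates from ideal ones, using the rewritten system.

    Assumes u = cx + fx * x and v = cy + fy * y; needs x != 0 and y != 0
    because the rewritten form divides by x and y.
    """
    x = (u - cx) / fx
    y = (v - cy) / fy
    r2 = x * x + y * y
    common = k1 * r2 + k2 * r2 ** 2
    ud = u + (u - cx) * (common + 2.0 * p1 * y + p2 * (r2 / x + 2.0 * x))
    vd = v + (v - cy) * (common + 2.0 * p2 * x + p1 * (r2 / y + 2.0 * y))
    return ud, vd


# Compare the two routes for one sample point (values chosen arbitrarily).
fx, fy, cx, cy = 800.0, 820.0, 320.0, 240.0
k1, k2, p1, p2 = 0.1, -0.05, 0.001, -0.002
x, y = 0.25, 0.5
xd, yd = distort_physical(x, y, k1, k2, p1, p2)
ud, vd = distort_pixel(cx + fx * x, cy + fy * y, fx, fy, cx, cy, k1, k2, p1, p2)
assert abs(ud - (cx + fx * xd)) < 1e-9
assert abs(vd - (cy + fy * yd)) < 1e-9
```

Under these assumptions the two forms describe the same mapping, once in focal-length-normalized units and once in pixels.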

Well, it would appear that the expressions involving x ̃ and y ̃ coincide with the two expressions given at the top of this post involving xcorrected and ycorrected. However, x ̃ and y ̃ do not refer to corrected coordinates, according to the given description. I don't understand the distinction between the meaning of the coordinates ( x ̃, y ̃ ) and ( u ̃, v ̃ ), or for that matter, between the pairs ( x, y ) and ( u, v ). From their descriptions it appears their only distinction is that ( x ̃, y ̃ ) and ( x, y ) refer to 'physical' coordinates while ( u ̃, v ̃ ) and ( u, v ) do not. What is this distinction all about? Aren't they all physical coordinates? I'm lost!

Thanks for any input!

帅气尐潴 2024-09-08 22:07:52

There is no one and only formula for camera calibration; they are all valid. Notice that the first two contain constants k1, k2, and k3 for r^2, r^4, and r^6, while the third only has constants for r^2 and r^4? That is because they are all approximate models. The ones with the r^6 term can be more accurate since they have more parameters.

Anytime you see: