Reconstructing the 3D locations of corresponding points
I'm working on a project where I would like to reconstruct the 3D locations of
feature points I've extracted from my camera images. The idea is to:
- Make a camera recording (Greyscale information, VGA size: 640 x 480)
- Extract feature points in the camera frames (I'm using SIFT for this)
- Correspond features from frame[k-1] with features from frame[k] (I intend to use RANSAC for this, more on that later...)
- Calculate/estimate some relative distance information between these feature points (this would be in some (x,y,z) coordinate system)
I've read in many papers that RANSAC is an algorithm used in reconstruction, with the end result being some kind of point cloud. I want to be able to do just that. However, I've run into a few snags, and I hope you guys can help me out with these.
The first snag is that I do not really understand how I would use RANSAC to perform this point correspondence. I understand the concept of RANSAC as a model-fitting tool; I just don't see how it could be used for solving correspondences.
The second snag is, assuming I have my correspondence information, how to get some kind of distance information between all these points. I've read that perspective projection could be used to solve this, and that to do so one should estimate the Fundamental Matrix, then do some math magic to get the point cloud.
Point is, I don't understand what the actual values in a Fundamental Matrix mean. I know it gives a mathematical relation between the positions of 2 cameras (or in my case, 2 frames in a video where the camera is moving), and that it exploits epipolar geometry. But besides this, I just don't have a clue what the Fundamental Matrix actually entails. How does this 3x3 matrix capture the 6DOF of one camera with respect to another?
Also, I think the 'math magic' I referred to is some kind of matrix multiplication, but I haven't found any source that explains to me what it does and what the formulation is.
Therefore, my question is:
Could any of you point me in the right direction? I've been digging through the references of the papers I've read so far, but these also give me the "we solve this using the RANSAC algorithm" line, and I'm increasingly getting the feeling I'm looking in the wrong direction.
Is there some nice explanation of these things, perhaps in layman's terms and/or with some illustrations?
In short: where should I be looking, or where can I find this elusive piece of information?
Thanks in advance,
Xilconic
PS: I checked Wikipedia, but it's not helping me much. I also listened to the 'Fundamental Matrix Song', and it's the same story.
Answers (3)
I wrote my thesis on this, and I also used the RANSAC algorithm in it.
There is more to this topic than can be captured in a few paragraphs here. Consider getting the excellent book Multiple View Geometry.
Snag 1
RANSAC will find a model, in this case the fundamental matrix F, even in the presence of a huge number of outliers. Here, an outlier is a point-correspondence candidate that is way off. Basically, you just keep fitting the F matrix to randomly drawn sets of correspondences. Eventually you find a sample whose model is consistent with many of the other correspondences; those agreeing correspondences are the inliers. They can then be used to estimate the model (F) more accurately.
My paper has a simple line-fitting example to get you started, as well as an easy-to-grasp explanation of RANSAC applied to the correspondence problem.
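The thesis example itself is not reproduced here, but a minimal sketch of the same idea on the line-fitting toy problem looks like this. The function names, iteration count, threshold and data are illustrative assumptions. For the fundamental matrix the loop is identical in shape: the 2-point sample becomes a sample of 7 or 8 tentative correspondences, the fitted model is F, and the residual is something like the distance of x' from the epipolar line F x (OpenCV's `cv2.findFundamentalMat` with the `cv2.FM_RANSAC` method packages that version).

```python
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through p and q, scaled so that (a, b) is a unit normal."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # degenerate sample: both points are identical
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, iterations=200, threshold=0.1, seed=0):
    """Fit lines to random minimal samples (2 points) and keep the one with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        a, b, c = model
        # Inliers: points whose perpendicular distance to the line is below the threshold.
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Ten points on y = 2x + 1 plus three gross outliers ("way off" candidates).
points = [(x, 2 * x + 1) for x in range(10)] + [(2, 20), (5, -7), (8, 0)]
model, inliers = ransac_line(points)
print(len(inliers))   # 10: the outliers are rejected
```

After the loop, the inliers would normally be used to refit the model (for a line, by least squares), which is the "estimate the model more accurately" step described above.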
Snag 2
The most important thing about the F matrix is that it maps a point in one image to a line in the other:
Fx = l', where x is a point in one image (in homogeneous coordinates) and l' is the corresponding epipolar line in the other image.
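To make the point-to-line mapping concrete, here is a sketch for an assumed, deliberately simple setup: both views have K = I, there is no rotation, and the camera moves sideways, so that X_cam2 = X_cam1 + t with t = (1, 0, 0). For that motion F equals [t]_x, the cross-product matrix of t. The helper names are illustrative.

```python
def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

# Assumed setup: K = I, no rotation, X_cam2 = X_cam1 + t with t = (1, 0, 0), so F = [t]_x.
F = [[0, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]

x = [0.3, 0.2, 1.0]            # point in image 1, homogeneous (u, v, 1)
l_prime = mat_vec(F, x)        # line (a, b, c) meaning a*u' + b*v' + c = 0 in image 2
print(l_prime)                 # [0.0, -1.0, 0.2], i.e. the horizontal line v' = 0.2

# The true match lies on that line: X_cam1 = 5 * x = (1.5, 1.0, 5.0), so
# X_cam2 = (2.5, 1.0, 5.0) and x' = (0.5, 0.2, 1.0).
x_prime = [0.5, 0.2, 1.0]
print(dot(x_prime, l_prime))   # 0.0, i.e. x'^T F x = 0
```

A purely sideways motion gives horizontal epipolar lines, which is why matching in rectified stereo only has to search along image rows.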
The F matrix has 9 elements, but it must have rank 2 and its scale does not matter, so it has only 7 degrees of freedom. There is no easy explanation for the individual elements of the F matrix.
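The two constraints can be checked numerically. In this sketch the motion (a small rotation about the y-axis plus a translation) is an assumption, and K = I, in which case F equals the essential matrix E = [t]_x R. Because [t]_x is singular, det(F) = 0; and multiplying F by any nonzero factor only rescales l', which describes the same line. That leaves 9 - 1 (scale) - 1 (det F = 0) = 7 degrees of freedom.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def skew(v):
    """Cross-product matrix [v]_x, so that [v]_x w = v x w."""
    x, y, z = v
    return [[0, -z, y], [z, 0, -x], [-y, x, 0]]

# Assumed example motion with K = I: rotate 0.1 rad about the y-axis, then translate by t.
theta = 0.1
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
t = [1.0, 0.2, 0.1]
F = matmul(skew(t), R)

print(abs(det3(F)) < 1e-12)   # True: det(F) = 0, so rank(F) <= 2 (exactly 2 here, as t != 0)

# Scale does not matter: 5*F maps x to 5*l', which is the same line.
x = [0.3, -0.1, 1.0]
l1 = [sum(F[i][j] * x[j] for j in range(3)) for i in range(3)]
l2 = [5.0 * c for c in l1]
# Two homogeneous line vectors describe the same line iff their cross product is zero.
cr = [l1[1] * l2[2] - l1[2] * l2[1], l1[2] * l2[0] - l1[0] * l2[2], l1[0] * l2[1] - l1[1] * l2[0]]
print(max(abs(c) for c in cr) < 1e-12)   # True
```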
Using a point correspondence x <-> x' and F, the world 3D coordinate X of the depicted point can be extracted (up to an overall scale factor) if you know the cameras' internal parameters, such as the focal length.
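Once the internal parameters and the relative pose are known (in practice the pose would come from decomposing the essential matrix E = K'^T F K), each pixel defines a viewing ray, and X is where the two rays (nearly) meet. The sketch below uses the simple midpoint method, which is one of several triangulation methods and not necessarily the one the thesis uses. The focal length, principal point, pose and pixel pair are all assumptions, with camera 2 unrotated and its center at (0.5, 0, 0) in camera-1 coordinates.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

def backproject(u, v, f, cx, cy):
    """Viewing-ray direction K^-1 (u, v, 1) for a pinhole camera with focal length f (pixels)."""
    return [(u - cx) / f, (v - cy) / f, 1.0]

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between the rays c1 + s1*d1 and c2 + s2*d2."""
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; the baseline is too small")
    s1 = (b * e - c * d) / denom
    s2 = (a * e - b * d) / denom
    p1 = [ci + s1 * di for ci, di in zip(c1, d1)]
    p2 = [ci + s2 * di for ci, di in zip(c2, d2)]
    return [(p + q) / 2 for p, q in zip(p1, p2)]

# Assumed intrinsics for a 640x480 image and an assumed, already known pose.
# With a rotation R, the second ray direction would be R^T K^-1 x' instead.
f, cx, cy = 500.0, 320.0, 240.0
c1, c2 = [0.0, 0.0, 0.0], [0.5, 0.0, 0.0]
x1, x2 = (382.5, 215.0), (320.0, 215.0)   # a matching pixel pair x <-> x'
X = triangulate_midpoint(c1, backproject(*x1, f, cx, cy), c2, backproject(*x2, f, cx, cy))
print(X)   # approximately [0.5, -0.2, 4.0]
```

The parallel-ray check is where the small-baseline problem from consecutive frames shows up: when the camera barely moves, the two rays are almost parallel and the depth becomes very sensitive to pixel noise.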
Note that when using consecutive movie frames, the camera usually moves very little, and it might be hard to compute the fundamental matrix. It can be worked around, though. I suggest looking into the work of Marc Pollefeys.
Look at the first formula in the Wikipedia entry on the fundamental matrix, the epipolar constraint x'^T F x = 0, which must hold for every pair of corresponding points.
This is the "model" you are trying to solve using RANSAC. You have two 3xn (n >= 7) matrices x and x' that represent all your corresponding (x, y) <-> (x', y') points in both images (the third coordinate is just the number 1 all the time), and an unknown 3x3 matrix F whose values you want to find. The pseudocode for RANSAC in its Wikipedia entry is a pretty good explanation.
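Expanding x'^T F x = 0 shows why this is a fitting problem: each correspondence gives one linear equation in the 9 entries of F. Stacking n of them gives A f = 0, where f is F flattened row by row. With 8 or more correspondences f can be found linearly, up to scale, as the null vector of A (the 8-point algorithm, usually via an SVD, e.g. the last row of `Vh` from `numpy.linalg.svd(A)`); with 7, the det F = 0 constraint is needed as well. This pure-Python sketch only builds A and checks it against a known F; the synthetic scene (K = I, pure sideways translation) and the helper name are assumptions.

```python
def epipolar_row(x, xp):
    """Coefficients of x'^T F x = 0 as a linear equation in F's 9 entries (row-major order)."""
    return [xp[i] * x[j] for i in range(3) for j in range(3)]

# Assumed synthetic scene: K = I and X_cam2 = X_cam1 + t with t = (1, 0, 0), so F = [t]_x.
t = (1.0, 0.0, 0.0)
F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
f_vec = [F[i][j] for i in range(3) for j in range(3)]

points_3d = [(0.5, -0.2, 4.0), (-1.0, 0.3, 6.0), (0.2, 0.8, 3.0), (1.5, 1.0, 5.0),
             (-0.7, -0.9, 7.0), (0.0, 0.1, 2.0), (2.0, -1.0, 8.0), (-1.5, 0.5, 5.5)]

A = []
for X, Y, Z in points_3d:
    x = [X / Z, Y / Z, 1.0]                                         # projection in image 1
    xp = [(X + t[0]) / (Z + t[2]), (Y + t[1]) / (Z + t[2]), 1.0]    # projection in image 2
    A.append(epipolar_row(x, xp))

# The true F satisfies every equation ...
residuals = [sum(a * b for a, b in zip(row, f_vec)) for row in A]
print(max(abs(r) for r in residuals) < 1e-12)   # True

# ... while F for a different motion (vertical translation t = (0, 1, 0)) violates all of them.
F_wrong = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
w_vec = [F_wrong[i][j] for i in range(3) for j in range(3)]
wrong_residuals = [sum(a * b for a, b in zip(row, w_vec)) for row in A]
print(min(abs(r) for r in wrong_residuals) > 0.1)   # True
```

In a RANSAC loop, these residuals (or a geometric version of them) are what decide whether a correspondence counts as an inlier for a candidate F.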
As for what the fundamental matrix is: one way to think of a point in an image is as a 3D line connecting the camera center and that point in 3D space. This line extends to infinity in both directions. If you look at that 3D line with a different camera, it appears in that camera's image as a line (the epipolar line), and the image of the 3D point lies somewhere on it. The transformation (really a back-projection) of an image point to a 3D line is just a matrix operation. The projection of a 3D line onto a 2D image is also a matrix operation.
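A small numeric illustration of this picture, under assumed intrinsics (f = 500 px, principal point at the center of a 640x480 frame, the same for both frames) and an assumed camera motion without rotation: three points at different depths along one viewing ray from camera 1 project to three different pixels in camera 2, and those pixels are collinear. That line is the epipolar line; this sketch finds it geometrically rather than through F.

```python
def backproject(u, v, f, cx, cy):
    """Direction K^-1 (u, v, 1) of the viewing ray through pixel (u, v)."""
    return [(u - cx) / f, (v - cy) / f, 1.0]

def project(X, f, cx, cy):
    """Pinhole projection of a point given in the camera's own coordinates."""
    return f * X[0] / X[2] + cx, f * X[1] / X[2] + cy

f, cx, cy = 500.0, 320.0, 240.0     # assumed intrinsics for a 640x480 image
t = (-0.5, 0.1, 0.2)                # assumed motion: X_cam2 = X_cam1 + t, no rotation

# A single pixel in image 1 corresponds to every point s * d (s > 0) on its viewing ray.
d = backproject(400.0, 260.0, f, cx, cy)
images = []
for s in (2.0, 4.0, 8.0):           # three candidate depths along the same ray
    X1 = [s * di for di in d]
    X2 = [a + b for a, b in zip(X1, t)]
    images.append(project(X2, f, cx, cy))

# Seen from camera 2, the candidates are different pixels, but they all lie on one line.
p, q, r = images
cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
print(abs(cross) < 1e-6)            # True: the three image points are collinear
```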
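F captures both of these matrix operations (back-projecting a point to a 3D line, then projecting that line into the other image) in one matrix. F can also be used to determine the camera matrices of both cameras (up to a projective ambiguity, unless the internal parameters are known), which can then be used for the 3D reconstruction.
Maybe this helps a bit? Otherwise, I've learned most of what I know about this from Hartley and Zisserman.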
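As a sketch of how camera matrices can be obtained from F: one standard choice, described in Hartley and Zisserman, is P = [I | 0] and P' = [[e']_x F | e'], where e' is the epipole in the second image (e'^T F = 0). Any 3D point projected through this pair satisfies x'^T F x = 0. The reconstruction these cameras give is only projective (it differs from the real scene by an unknown 4x4 transformation); with known internal parameters you would work with the essential matrix instead. The example F (pure translation, K = I) and the helper names are assumptions, and the cross-product trick for e' only suits an exactly rank-2 F.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def skew(v):
    """Cross-product matrix [v]_x."""
    x, y, z = v
    return [[0, -z, y], [z, 0, -x], [-y, x, 0]]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def cameras_from_F(F):
    """Canonical camera pair P = [I | 0], P' = [[e']_x F | e'] for an exactly rank-2 F."""
    cols = [[F[i][j] for i in range(3)] for j in range(3)]
    # e'^T F = 0 means e' is orthogonal to every column of F, so the cross product of two
    # independent columns gives it (for a noisy F, an SVD would be used instead).
    e = max((cross(cols[i], cols[j]) for i, j in ((0, 1), (0, 2), (1, 2))),
            key=lambda v: sum(c * c for c in v))
    M = matmul(skew(e), F)
    P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    P_prime = [M[i] + [e[i]] for i in range(3)]
    return P, P_prime, e

# Assumed example: pure translation t with K = I, so F = [t]_x and e' points along t.
t = [1.0, 0.2, 0.1]
F = skew(t)
P, P_prime, e = cameras_from_F(F)

# Project some homogeneous 3D points through both cameras and check x'^T F x = 0.
residuals = []
for X in ([0.5, -0.2, 4.0, 1.0], [1.0, 2.0, 3.0, 1.0], [-2.0, 0.5, 7.0, 0.5]):
    x = [row[0] for row in matmul(P, [[c] for c in X])]
    xp = [row[0] for row in matmul(P_prime, [[c] for c in X])]
    residuals.append(sum(xp[i] * F[i][j] * x[j] for i in range(3) for j in range(3)))
print(max(abs(r) for r in residuals) < 1e-9)   # True
```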
A robust solution for the fundamental matrix using something like a 5-point or 8-point algorithm is certainly a good start (the 5-point algorithm estimates the essential matrix, so it needs calibrated cameras). That said, the fundamental matrix solution can be susceptible to outliers, and you'll likely want some additional overarching system to do the actual 3D solving. You can use a Kalman-filter-type approach (fast, can be done in real time on embedded systems) or bundle adjustment (very accurate, but can be slower).
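Bundle adjustment refines all camera poses and 3D points jointly by minimizing the total reprojection error: the squared pixel distance between where each point was observed and where the current estimate projects it. The sketch below is only that cost function, under assumed intrinsics, poses and observations; the optimizer itself (typically a nonlinear least-squares method such as Levenberg-Marquardt, with sparse structure in real BA packages) is left out.

```python
def project(X, R, t, f, cx, cy):
    """Pinhole projection of world point X into a camera with pose X_cam = R X + t."""
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return f * Xc[0] / Xc[2] + cx, f * Xc[1] / Xc[2] + cy

def reprojection_error(points, cameras, observations, f, cx, cy):
    """Sum of squared pixel distances between observed and predicted image points.

    observations: list of (camera_index, point_index, (u, v)) tuples.
    """
    total = 0.0
    for ci, pi, (u, v) in observations:
        R, t = cameras[ci]
        pu, pv = project(points[pi], R, t, f, cx, cy)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cameras = [(identity, [0.0, 0.0, 0.0]), (identity, [-0.5, 0.0, 0.0])]   # assumed poses
observations = [(0, 0, (382.5, 215.0)), (1, 0, (320.0, 215.0))]        # assumed pixels
f, cx, cy = 500.0, 320.0, 240.0                                         # assumed intrinsics

good = reprojection_error([[0.5, -0.2, 4.0]], cameras, observations, f, cx, cy)
bad = reprojection_error([[0.6, -0.2, 4.0]], cameras, observations, f, cx, cy)
print(good)   # ~0: the point explains both observations
print(bad)    # about 312.5: 12.5 px off in each view
```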
Some good SFM software out there that you can use or draw inspiration from:
VSLAM (developed by Konolige, who is a professor at Stanford and also works at Willow Garage, the OpenCV folks). Probably the fastest bundle adjustment solution I've seen.
RSLAM (developed by the Oxford Mobile Robotics Group, showing some excellent results)