How do I determine the distance to an object in a video?

Posted 2024-08-19 15:19:25

I have a video file recorded from the front of a moving vehicle. I am going to use OpenCV for object detection and recognition, but I'm stuck on one aspect: how can I determine the distance to a recognized object?

I can know my current speed and real-world GPS position but that is all. I can't make any assumptions about the object I'm tracking. I am planning to use this to track and follow objects without colliding with them. Ideally I would like to use this data to derive the object's real-world position, which I could do if I could determine the distance from the camera to the object.

乖乖公主 2024-08-26 15:19:25

Your problem's quite standard in the field.

Firstly,

you need to calibrate your camera. This can be done offline (makes life much simpler) or online through self-calibration.

Calibrate it offline - please.

Secondly,

Once you have the calibration matrix of the camera K, determine the projection matrix of the camera in a successive scene (you need to use parallax as mentioned by others). This is described well in this OpenCV tutorial.

You'll have to use the GPS information to find the relative orientation between the cameras in the successive scenes (that might be problematic due to noise inherent in most GPS units), i.e. the R and t mentioned in the tutorial or the rotation and translation between the two cameras.

Once you've resolved all that, you'll have two projection matrices: representations of the cameras at those successive scenes. Using one of these so-called camera matrices, you can "project" a 3D point M in the scene onto the camera's 2D image at pixel coordinate m (as in the tutorial).
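
As a rough illustration of the projection just described, here is a minimal pure-Python sketch of m = K [R | t] M. The intrinsics in K, the identity rotation, the zero translation and the sample point are made-up values, not taken from the tutorial; real values would come from your own calibration.

```python
def project_point(K, R, t, M):
    """Project a 3D point M (world frame) to pixel coordinates using K [R | t]."""
    # Transform into the camera frame: Xc = R * M + t
    Xc = [sum(R[i][j] * M[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Apply the intrinsics, then divide by the homogeneous coordinate
    u, v, w = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return (u / w, v / w)


# Hypothetical calibration: 800 px focal length, principal point (320, 240)
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
# Assumed pose: camera at the world origin, looking down +Z
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

m = project_point(K, R, t, [1.0, 0.5, 10.0])
print(m)  # (400.0, 280.0)
```

Triangulation runs this mapping in reverse: each tracked pixel defines a ray, and two rays from two camera positions pin down the 3D point.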

We will use this to triangulate the real 3D point from 2D points found in your video.

Thirdly,

use an interest point detector to track the same point, lying on the object of interest, through your video. Several detectors are available; I recommend SURF, since you have OpenCV, which also offers others such as Shi-Tomasi corners and Harris.

Fourthly,

Once you've tracked points of your object across the sequence and obtained the corresponding 2D pixel coordinates, you must triangulate for the best-fitting 3D point given your projection matrices and 2D points.
[Image: triangulation diagram]

The above image nicely captures the uncertainty and how a best fitting 3D point is computed. Of course in your case, the cameras are probably in front of each other!

Finally,

Once you've obtained the 3D points on the object, you can easily compute the Euclidean distance between the camera center (which is the origin in most cases) and the point.
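
The triangulation and distance steps can be sketched in a few lines. This is not the best-fit method from the book or the tutorial; it is the simpler midpoint method, which takes the point halfway between the two viewing rays where they pass closest to each other. It assumes you have already turned each tracked pixel into a ray (camera centre plus direction) in one common world frame. The camera positions and target below are invented for the example.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between two viewing rays.

    o1, o2: camera centres; d1, d2: ray directions (same world frame).
    """
    w0 = [p - q for p, q in zip(o1, o2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12 * a * c:
        raise ValueError("rays are (nearly) parallel; no usable parallax")
    s = (b * e - c * d) / denom  # parameter along ray 1
    u = (a * e - b * d) / denom  # parameter along ray 2
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + u * k for o, k in zip(o2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Hypothetical setup: the camera moves 1 m forward between frames,
# and both rays point at a target located at (2, 0, 10).
cam1, cam2 = [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]
ray1, ray2 = [2.0, 0.0, 10.0], [2.0, 0.0, 9.0]

X = triangulate_midpoint(cam1, ray1, cam2, ray2)
print(X, distance(cam2, X))
```

With forward motion the two rays are close to parallel for anything near the direction of travel, so the denominator becomes small and pixel noise is amplified. That is the practical cost of the cameras being "in front of each other".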

Note

This is obviously not easy stuff but it's not that hard either. I recommend Hartley and Zisserman's excellent book Multiple View Geometry which has described everything above in explicit detail with MATLAB code to boot.

Have fun and keep asking questions!

再可℃爱ぅ一点好了 2024-08-26 15:19:25

When you have moving video, you can use temporal parallax to determine the relative distance of objects. Parallax: (definition).

The effect is the same one we get with our eyes, which gain depth perception by looking at the same object from slightly different angles. Since you are moving, you can use two successive video frames to get your slightly different angle.

Using parallax calculations, you can determine the relative size and distance of objects (relative to one another). But, if you want the absolute size and distance, you will need a known point of reference.

You will also need to know the speed and direction being traveled (as well as the video frame rate) in order to do the calculations. You might be able to derive the speed of the vehicle using the visual data but that adds another dimension of complexity.

The technology already exists. Satellites determine topographic prominence (height) by comparing multiple images taken over a short period of time. We use parallax to determine the distance of stars by taking photos of the night sky at different points in Earth's orbit around the Sun. I was able to create 3-D images out of an airplane window by taking two photographs in quick succession.

The exact technology and calculations (even if I knew them off the top of my head) are way outside the scope of discussing here. If I can find a decent reference, I will post it here.

只为守护你 2024-08-26 15:19:25

You need to identify the same points on the same object in two different frames taken a known distance apart. Since you know the location of the camera in each frame, you have a baseline (the vector between the two camera positions). Construct a triangle from the known baseline and the angles to the identified points. Trigonometry gives you the lengths of the unknown sides of the triangle from the known length of the baseline and the known angles between the baseline and the unknown sides.

You can use two cameras, or one camera taking successive shots. So, if your vehicle is moving at 1 m/s and you take frames every second, then successive frames will give you a 1 m baseline, which should be good for measuring the distance of objects up to, say, 5 m away. If you need to range objects further away, then the frames used need to be further apart; however, more distant objects will be in view for longer.
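
To make the baseline arithmetic concrete, here is a small sketch that assumes constant speed and straight-line motion between the frames. The function names and the 15 m/s, 30 fps figures are illustrative only.

```python
import math


def baseline_m(speed_mps, fps, frame_gap=1):
    """Distance travelled between two frames that are frame_gap frames apart."""
    return speed_mps * frame_gap / fps


def frames_for_baseline(target_baseline_m, speed_mps, fps):
    """Smallest frame gap whose baseline is at least target_baseline_m."""
    if speed_mps <= 0:
        raise ValueError("vehicle must be moving to create a baseline")
    return max(1, math.ceil(target_baseline_m * fps / speed_mps))


# 1 m/s at one frame per second gives the 1 m baseline from the text
print(baseline_m(1.0, 1.0))

# Hypothetical: at 15 m/s and 30 fps, how many frames apart for a 2 m baseline?
print(frames_for_baseline(2.0, 15.0, 30.0))
```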

Observer at F1 sees target at T with angle a1 to velocity vector. Observer moves distance b to F2. Sees target at T with angle a2.

Required: r1, the range to the target from F1 (r2 is the range from F2).

The definition of cosine, applied to the right triangles that share the side x, gives

cos(90° - a1) = x / r1 = c1

cos(90° - a2) = x / r2 = c2

cos(a1) = (b + z) / r1 = c3

cos(a2) = z / r2 = c4

x is the distance to the target orthogonal to the observer's velocity vector

z is the distance from F2 to the intersection with x

Solving for r1

r1 = b / (c3 - c1 * c4 / c2)
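
The formula above translates directly into code. This is a sketch under the same assumptions as the derivation: straight-line motion over the baseline b, and angles measured from the velocity vector. The function name and the sample geometry are made up for the example.

```python
import math


def range_at_f1(b, a1_deg, a2_deg):
    """Range r1 from F1 to the target, given baseline b and bearings a1, a2.

    a1_deg and a2_deg are the angles (in degrees) between the velocity
    vector and the target, seen from F1 and F2 respectively.
    """
    a1, a2 = math.radians(a1_deg), math.radians(a2_deg)
    c1 = math.sin(a1)  # cos(90 - a1) = x / r1
    c2 = math.sin(a2)  # cos(90 - a2) = x / r2
    c3 = math.cos(a1)  # (b + z) / r1
    c4 = math.cos(a2)  # z / r2
    if abs(c2) < 1e-9:
        raise ValueError("target is on the line of travel; no parallax")
    denom = c3 - c1 * c4 / c2
    if abs(denom) < 1e-9:
        raise ValueError("bearing did not change between frames")
    return b / denom


# Hypothetical geometry: target 3 m to the side and 5 m ahead of F1,
# observer moves b = 1 m, so the target is 4 m ahead of F2.
a1 = math.degrees(math.atan2(3.0, 5.0))
a2 = math.degrees(math.atan2(3.0, 4.0))
r1 = range_at_f1(1.0, a1, a2)
print(r1)  # close to sqrt(3**2 + 5**2)
```

The two guards cover the cases where the formula breaks down: a target straight ahead, and a bearing that does not change between the frames.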

花开雨落又逢春i 2024-08-26 15:19:25

Two cameras so you can detect parallax. It's what humans do.

edit

Please see ravenspoint's answer for more detail. Also, keep in mind that a single camera with a splitter would probably suffice.

哆啦不做梦 2024-08-26 15:19:25

Use stereo disparity maps. Lots of implementations are available; here are some links:
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT11/node4.html

http://www.ece.ucsb.edu/~manj/ece181bS04/L14(morestereo).pdf
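
For reference, the relation behind a disparity map is depth = focal length * baseline / disparity, which holds for a rectified stereo pair. Below is a minimal sketch of it. The focal length, baseline and disparity values are invented, and it is not taken from either link.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (metres) of a point seen in a rectified stereo pair.

    focal_px: focal length in pixels; baseline_m: distance between the
    two camera centres; disparity_px: horizontal pixel shift of the point.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means infinitely far)")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: 800 px focal length, cameras 0.5 m apart
print(depth_from_disparity(800.0, 0.5, 20.0))  # 20.0
print(depth_from_disparity(800.0, 0.5, 10.0))  # 40.0
```

Halving the disparity doubles the depth, so depth resolution degrades quickly for distant objects.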

In your case you don't have a stereo camera, but depth can be evaluated using video:
http://www.springerlink.com/content/g0n11713444148l2/

I think the above is what will help you the most.

Research has progressed so far that depth can be evaluated (though not to a satisfactory extent) from a single monocular image:
http://www.cs.cornell.edu/~asaxena/learningdepth/

谈下烟灰 2024-08-26 15:19:25

Someone please correct me if I'm wrong, but it seems to me that if you're going to use a single camera and rely purely on a software solution, any processing you do will be prone to false positives. I highly doubt that any processing could tell the difference between objects that really are at the perceived distance and those that only appear to be at that distance (like "forced perspective" in movies).

Any chance you could add an ultrasonic sensor?

梦醒灬来后我 2024-08-26 15:19:25

First, you should calibrate your camera so you can get the relation between the objects' positions in the camera plane and their positions in the real-world plane. If you are using a single camera, you can use the optical flow technique.
If you are using two cameras, you can use the triangulation method to find the real position (it will be easy to find the distance of the objects), but the problem with the second method is the matching: how do you find the position of an object 'x' in camera 2 if you already know its position in camera 1? Here you can use the SIFT algorithm.
I just gave you some keywords; I hope they help.

寂寞花火° 2024-08-26 15:19:25

Put an object of known size in the camera's field of view. That way you have a more objective metric for measuring angular distances. Without a second viewpoint or camera you'll be limited to estimating size and distance, but at least it won't be a complete guess.
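
One way to turn a known-size reference into a distance is the pinhole relation distance = focal length * real width / width in pixels. The sketch below assumes a calibrated focal length in pixels and an object roughly facing the camera. The 0.5 m width and the pixel measurements are hypothetical.

```python
def distance_from_known_width(focal_px, real_width_m, width_px):
    """Approximate distance (metres) to an object of known physical width.

    Pinhole model; assumes the object is roughly parallel to the image plane.
    """
    if width_px <= 0:
        raise ValueError("object width in pixels must be positive")
    return focal_px * real_width_m / width_px


# Hypothetical: a 0.5 m wide marker spans 40 px, with an 800 px focal length
print(distance_from_known_width(800.0, 0.5, 40.0))  # 10.0
```

The same relation, run in reverse, estimates the size of an unknown object once its distance is known.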
