I am building a recognition program in C++, and to make it more robust I need to be able to find the distance to an object in an image.
Say I have an image of an 8.5 x 11 picture taken from 22.3 inches away. The system correctly identifies that picture in a box measuring 319 by 409 pixels.
What is an effective way to relate the actual height and width (AH and AW) and the pixel height and width (PH and PW) to the distance (D)?
I am assuming that, when I actually use the equation, PH and PW will be inversely proportional to D, and AH and AW will be constants (the recognized object will always be one whose width and height the user can specify).
I don't know if you changed your question at some point, but my first answer is quite complicated for what you want. You can probably do something simpler.
1) Long and complicated solution (more general problems)
First, you need to know the size of the object.
You can look at computer vision algorithms. If you know the object (its dimensions and shape), your main problem is pose estimation (that is, finding the position of the object relative to the camera); from the pose you can find the distance. You can look at [1] [2] (as examples; you can find other articles on the topic if you are interested) or search for POSIT and SoftPOSIT. You can formulate the problem as an optimization problem: find the pose that minimizes the "difference" between the real image and the expected image (the projection of the object given the estimated pose). This difference is usually the sum of the (squared) distances between each image point Ni and the projection P(Mi) of the corresponding (3D) object point Mi for the current parameters.
From this you can extract the distance.
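As a rough illustration of the cost function described above, here is a minimal C++ sketch. The types `Point2`, `Point3` and `Pose`, and the principal point `(u0, v0)`, are hypothetical names introduced for this example, and the projection assumes an ideal pinhole camera with a single focal parameter `a` in pixels. It only evaluates the sum of squared distances between each Ni and P(Mi); it is not a pose solver.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2 { double u, v; };      // image point Ni, in pixels
struct Point3 { double x, y, z; };   // object point Mi, in object coordinates

// Hypothetical pose type: row-major 3x3 rotation R and translation t
// that map object coordinates into the camera frame.
struct Pose {
    std::array<double, 9> R;
    std::array<double, 3> t;
};

// Pinhole projection P(M) for the given pose (assumes au = av = a, as = 0).
Point2 project(const Pose& pose, const Point3& M,
               double a, double u0, double v0) {
    const auto& R = pose.R;
    const double X = R[0] * M.x + R[1] * M.y + R[2] * M.z + pose.t[0];
    const double Y = R[3] * M.x + R[4] * M.y + R[5] * M.z + pose.t[1];
    const double Z = R[6] * M.x + R[7] * M.y + R[8] * M.z + pose.t[2];
    assert(Z > 0.0);  // the point must lie in front of the camera
    return {a * X / Z + u0, a * Y / Z + v0};
}

// Sum over i of |Ni - P(Mi)|^2: the quantity a pose estimator would minimize.
double reprojectionError(const Pose& pose,
                         const std::vector<Point3>& objectPoints,
                         const std::vector<Point2>& imagePoints,
                         double a, double u0, double v0) {
    assert(objectPoints.size() == imagePoints.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < objectPoints.size(); ++i) {
        const Point2 p = project(pose, objectPoints[i], a, u0, v0);
        const double du = imagePoints[i].u - p.u;
        const double dv = imagePoints[i].v - p.v;
        sum += du * du + dv * dv;
    }
    return sum;
}
```

A pose estimator would search over `Pose` for the value that makes `reprojectionError` smallest; the depth of the object is then the third component of `t`.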
For this you need to calibrate your camera (roughly, find the relation between pixel position and viewing angle).
You may not want to code all of this yourself; you can use computer vision libraries such as OpenCV or Gandalf [3].
You may also want to do something simpler (and approximate). If you can find the image distance between two points at the same "depth" (Z) from the camera, you can relate the image distance d to the real distance D with: d = a D/Z (where a is a camera parameter, related to the focal length and the number of pixels, that you can find using camera calibration).
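If no full calibration is available, one hedged shortcut is to solve d = a D/Z for a using a single reference shot where the depth is known, such as the one in the question. The function name `calibrateScale` is made up for this sketch, and it assumes the reference object faces the camera squarely.

```cpp
#include <cassert>

// Solve d = a * D / Z for a, given one reference measurement:
//   pixelSize = d (pixels), realSize = D, depth = Z (same unit as D).
// The result a is in pixels and plays the role of a focal length.
double calibrateScale(double pixelSize, double realSize, double depth) {
    assert(realSize > 0.0);
    return pixelSize * depth / realSize;
}
```

With the question's numbers, `calibrateScale(319, 8.5, 22.3)` gives roughly 837 and `calibrateScale(409, 11, 22.3)` gives roughly 829. The two values differ by about 1%, which gives a feel for the accuracy of the detected box.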
2) Short solution (for your simple problem)
But here is the (simple, short) answer: if your picture is on a plane parallel to the "camera plane" (i.e. it is perfectly facing the camera), you can use: PH = a AH / Z and PW = a AW / Z,
where Z is the depth of the plane of the picture and a is an intrinsic parameter of the camera.
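Applying d = a D/Z to the width and to the height and solving each for Z gives two depth estimates that can be averaged. A minimal sketch follows, assuming `a` is already known from calibration; the function name `estimateDepth` is hypothetical.

```cpp
#include <cassert>

// Depth of a fronto-parallel object of known size from its bounding box.
// a: camera scale in pixels; actual* in real-world units; pixel* in pixels.
// Returns Z in the same unit as actualWidth and actualHeight.
double estimateDepth(double a,
                     double actualWidth, double actualHeight,
                     double pixelWidth, double pixelHeight) {
    assert(pixelWidth > 0.0 && pixelHeight > 0.0);
    const double zFromWidth = a * actualWidth / pixelWidth;     // from PW = a AW / Z
    const double zFromHeight = a * actualHeight / pixelHeight;  // from PH = a AH / Z
    return 0.5 * (zFromWidth + zFromHeight);                    // average both estimates
}
```

Averaging is the simplest way to combine the two estimates. If the object may be tilted about one axis, the dimension along that axis is foreshortened, so the other estimate alone may be more trustworthy.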
For reference, the pinhole camera model relates image coordinates m=(u,v) to world coordinates M=(X,Y,Z) with: m ~ K M (with m taken in homogeneous coordinates, (u, v, 1)),
where "~" means "proportional to" and K is the matrix of intrinsic parameters of the camera. You need to do camera calibration to find the K parameters. Here I assumed au=av=a and as=0.
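Written out under the stated assumptions au = av = a and as = 0, the model takes the following form. Here (X, Y, Z) is assumed to be expressed in the camera frame, and (u_0, v_0) denotes the principal point, a symbol not named in the answer itself:

$$
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \sim
\begin{pmatrix} a & 0 & u_0 \\ 0 & a & v_0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\quad\Longrightarrow\quad
u = a\,\frac{X}{Z} + u_0, \qquad v = a\,\frac{Y}{Z} + v_0 .
$$

Two points at the same depth Z whose X coordinates differ by D therefore project to image points a D/Z pixels apart, which is the relation d = a D/Z.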
You can recover the Z parameter from either of those equations (or take the average of both). Note that Z is not the distance to the object (which varies across the different points of the object) but the depth of the object (the distance between the camera plane and the object plane). But I guess that is what you want anyway.
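If the straight-line distance to a particular point is wanted rather than the depth, it can be recovered from Z by back-projecting the pixel. This is a sketch under the same pinhole assumptions; the principal point `(u0, v0)` and the function name are assumptions of the example.

```cpp
#include <cassert>
#include <cmath>

// Euclidean distance from the camera centre to the point seen at pixel
// (u, v), given its depth Z, the camera scale a (pixels) and the
// principal point (u0, v0).
double distanceFromDepth(double u, double v, double Z,
                         double a, double u0, double v0) {
    assert(a > 0.0);
    const double X = (u - u0) * Z / a;  // invert u = a X / Z + u0
    const double Y = (v - v0) * Z / a;  // invert v = a Y / Z + v0
    return std::sqrt(X * X + Y * Y + Z * Z);
}
```

At the principal point the distance equals Z, and it grows toward the image edges, which is why depth and distance only coincide near the optical axis.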
[1] Linear N-Point Camera Pose Determination, Long Quan and Zhongdan Lan
[2] A Complete Linear 4-Point Algorithm for Camera Pose Determination, Lihong Zhi and Jianliang Tang
[3] http://gandalf-library.sourceforge.net/
If you know the size of the real-world object and the angle of view of the camera, then, given the horizontal angle of view alpha(*) and the horizontal resolution of the image xres, the distance dw to an object in the middle of the image that is xp pixels wide in the image and xw meters wide in the real world can be derived as follows (how is your trigonometry?):
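One way to carry out that derivation, assuming an ideal pinhole camera, an object centred on the optical axis and facing the camera, and alpha defined as the half-angle given in the footnote. The symbol f_px (the focal length expressed in pixels) is introduced here for convenience:

$$
\tan\alpha = \frac{x_{res}/2}{f_{px}}
\;\Longrightarrow\;
f_{px} = \frac{x_{res}}{2\tan\alpha},
\qquad
\frac{x_p}{f_{px}} = \frac{x_w}{d_w}
\;\Longrightarrow\;
d_w = \frac{x_w \, x_{res}}{2 \, x_p \tan\alpha} .
$$

The first relation says that half the image width subtends the angle alpha; the second is the similar-triangles relation between the object and its image.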
(*) alpha = The angle between the camera axis and a line going through the leftmost point on the middle row of the image that is just visible.
Link to your variables:
dw = D, xw = AW, xp = PW
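With that mapping, the angle-of-view approach can be sketched in C++. The sketch assumes a pinhole camera, alpha given in radians as the half-angle defined in (*), and an object near the image centre, so that dw = xw * xres / (2 * xp * tan(alpha)). The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Distance dw (same unit as xw) to a centred object that is xw wide in the
// real world and xp pixels wide in an image xres pixels wide, for a camera
// whose horizontal half-angle of view is alphaRad (radians).
double distanceFromAngleOfView(double alphaRad, double xres,
                               double xp, double xw) {
    assert(alphaRad > 0.0 && xp > 0.0);
    const double focalPx = (xres / 2.0) / std::tan(alphaRad);  // focal length in pixels
    return focalPx * xw / xp;                                  // similar triangles
}
```

For the question, this would be called with `xp = PW` and `xw = AW`. The half-angle `alphaRad` has to come from the camera's specification or from a calibration shot, since the question does not state it.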
This may not be a complete answer, but it may push you in the right direction. Ever seen how NASA does it on those pictures from space, with those tiny crosses all over the images? As far as I know, that's how they get a fair idea of the depth and size of an object. The solution might be to have an object in the picture whose size and depth you know, and then calculate the size and depth of the others relative to it. Time for you to do some research. If that's the way NASA does it, then it should be worth checking out.
I have got to say this is one of the most interesting questions I have seen on Stack Overflow in a long time :D. I just noticed you have only two tags attached to this question. Adding more image-related tags might get you better help.