
How can I determine the (real-world) distance of an object in a picture?

Posted 2024-11-14 04:46:32

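I am building a recognition program in C++, and to make it more robust I need to be able to find the distance of an object in an image.

Say I have an image that was taken 22.3 inches away from an 8.5 x 11 picture. The system correctly identifies that picture in a box with dimensions 319 pixels by 409 pixels.
What is an effective way of relating the actual height and width (AH and AW) and the pixel height and width (PH and PW) to the distance (D)?

I am assuming that when I actually use the equation, PH and PW will be inversely proportional to D, and AH and AW will be constants (as the recognized object will always be one whose width and height the user can indicate).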


痴情换悲伤 2024-11-21 04:46:32

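I don't know whether you changed your question at some point, but my first answer is quite complicated for what you want. You can probably do something simpler.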

1) Long and complicated solution (for more general problems)

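First, you need to know the size of the object.

You can look at computer vision algorithms. If you know the object (its dimensions and shape), your main problem is pose estimation, that is, finding the position of the object relative to the camera; from the pose you can find the distance. You can look at [1] and [2] (there are other articles on the topic if you are interested) or search for POSIT and SoftPOSIT. You can formulate the problem as an optimization problem: find the pose that minimizes the "difference" between the real image and the expected image (the projection of the object given the estimated pose). This difference is usually the sum of the (squared) distances between each image point Ni and the projection P(Mi) of the corresponding (3D) object point Mi for the current parameters.

From this you can extract the distance.

For this you need to calibrate your camera (roughly, find the relation between pixel position and viewing angle).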
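The "difference" being minimized can be sketched in plain C++ as a sum of squared reprojection errors. This is only an illustration of the cost function, not a pose solver: the Pose struct, the function names, and the simplified intrinsics (a single scale a plus a principal point u0, v0) are assumptions made for the example, not part of POSIT or any library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2 { double u, v; };     // image point Ni, in pixels
struct Point3 { double x, y, z; };  // object point Mi, in object coordinates

// Hypothetical pose representation: row-major rotation R and translation t,
// mapping object coordinates into camera coordinates (Mc = R * M + t).
struct Pose {
    std::array<double, 9> R;
    std::array<double, 3> t;
};

// P(Mi): project one object point with a simplified pinhole model
// (assumes au = av = a and zero skew).
Point2 projectPoint(const Pose& pose, const Point3& M,
                    double a, double u0, double v0) {
    const auto& R = pose.R;
    const double X = R[0] * M.x + R[1] * M.y + R[2] * M.z + pose.t[0];
    const double Y = R[3] * M.x + R[4] * M.y + R[5] * M.z + pose.t[1];
    const double Z = R[6] * M.x + R[7] * M.y + R[8] * M.z + pose.t[2];
    assert(Z > 0.0);  // the point must be in front of the camera
    return {a * X / Z + u0, a * Y / Z + v0};
}

// Sum over i of the squared distance between Ni and P(Mi).
double reprojectionError(const Pose& pose,
                         const std::vector<Point3>& object,
                         const std::vector<Point2>& image,
                         double a, double u0, double v0) {
    assert(object.size() == image.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < object.size(); ++i) {
        const Point2 p = projectPoint(pose, object[i], a, u0, v0);
        const double du = image[i].u - p.u;
        const double dv = image[i].v - p.v;
        sum += du * du + dv * dv;
    }
    return sum;
}
```

A pose estimator would search for the R and t that make this value as small as possible; the depth of the object then follows from the translation t.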

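You may not want to code all of this yourself; you can use computer vision libraries such as OpenCV or Gandalf [3].

You may also want to do something simpler (and approximate). If you can find the image distance between two points at the same "depth" (Z) from the camera, you can relate the image distance d to the real distance D with: d = a D/Z (where a is a camera parameter related to the focal length and the number of pixels, which you can find using camera calibration).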
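If a full calibration is not available, one rough way to get a is to rearrange d = a D/Z for a single reference shot taken at a known depth. The sketch below assumes exactly that; the function name is made up for the example, and a proper camera calibration would be more reliable.

```cpp
#include <cassert>

// Estimate the camera scale factor a (in pixels) from one reference shot:
// an object of real size D, at known depth Z, spans d pixels in the image.
// Rearranged from d = a * D / Z. D and Z must use the same length unit.
double estimateScaleFactor(double pixelSize, double realSize, double depth) {
    assert(realSize > 0.0 && depth > 0.0);
    return pixelSize * depth / realSize;
}
```

With the numbers from the question, and assuming the 11-unit side is the one that spans 409 pixels, estimateScaleFactor(409.0, 11.0, 22.3) gives roughly 829, while the width (319 pixels for 8.5 units) gives roughly 837. The small gap suggests the detected box is not pixel-exact or the picture is not perfectly facing the camera.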

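2) Short solution (for your simple problem)

Here is the (simple, short) answer: if your picture lies on a plane parallel to the "camera plane" (i.e. it is perfectly facing the camera), you can use: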

PH = a AH / Z
PW = a AW / Z

where Z is the depth of the plane of the picture and a is an intrinsic parameter of the camera.
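Solving PH = a AH / Z for Z gives the depth directly. A minimal C++ sketch, assuming a has already been obtained (for instance from a calibration) and that the object faces the camera; the function name is illustrative only:

```cpp
#include <cassert>

// Depth Z of a fronto-parallel object, from PH = a * AH / Z,
// i.e. Z = a * AH / PH. The result is in the same unit as actualSize.
double depthFromSize(double a, double actualSize, double pixelSize) {
    assert(pixelSize > 0.0);
    return a * actualSize / pixelSize;
}
```

As the question anticipated, the pixel size is inversely proportional to the depth: if the same object appears half as tall in the image, the returned depth doubles.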

For reference, the pinhole camera model relates image coordinates m=(u,v) to world coordinates M=(X,Y,Z) with:

m   ~       K       M

[u]   [ au as u0 ] [X]
[v] ~ [    av v0 ] [Y]
[1]   [        1 ] [Z]

u = (au X + as Y)/Z + u0
v = av Y/Z + v0

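where "~" means "proportional to" and K is the matrix of intrinsic parameters of the camera. You need to do camera calibration to find the parameters of K. Here I assumed au=av=a and as=0.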
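The projection m ~ K M can be written out in a few lines of C++. This is a sketch of the model only; the struct and function names are invented for the example, and the values of K would have to come from calibrating your own camera.

```cpp
#include <cassert>

// Entries of the intrinsic matrix K as used above:
//   [ au skew u0 ]
//   [    av   v0 ]
//   [         1  ]
struct Intrinsics { double au, av, skew, u0, v0; };
struct Pixel { double u, v; };

// Project a point (X, Y, Z), given in camera coordinates, to pixel coordinates.
Pixel project(const Intrinsics& K, double X, double Y, double Z) {
    assert(Z > 0.0);  // the point must be in front of the camera
    return {(K.au * X + K.skew * Y) / Z + K.u0,
            K.av * Y / Z + K.v0};
}
```

With au = av = a and skew = 0, two points at the same depth Z that are AW apart horizontally land a AW / Z pixels apart, which is exactly the PW relation of the short solution.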

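You can recover Z from either of those equations (or take the average of both). Note that Z is not the distance to the object (which varies between different points of the object) but the depth of the object (the distance between the camera plane and the object plane). I guess that is what you want anyway.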
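Both remarks can be illustrated with a short C++ sketch: one helper averages the two depth estimates, the other converts a depth into the straight-line distance to the point seen at a given pixel. The helper names are made up for the example, and the second function assumes au = av = a, as = 0 and a known principal point (u0, v0).

```cpp
#include <cassert>
#include <cmath>

// Average of the two depth estimates Z = a*AH/PH and Z = a*AW/PW.
double averageDepth(double a, double AH, double AW, double PH, double PW) {
    assert(PH > 0.0 && PW > 0.0);
    return 0.5 * (a * AH / PH + a * AW / PW);
}

// Straight-line distance from the camera centre to the object point seen at
// pixel (u, v), for an object plane at depth Z.
double rangeAtPixel(double Z, double a, double u, double v,
                    double u0, double v0) {
    assert(a > 0.0);
    const double x = (u - u0) / a;  // equals X / Z
    const double y = (v - v0) / a;  // equals Y / Z
    return Z * std::sqrt(1.0 + x * x + y * y);
}
```

At the principal point the two quantities coincide; towards the image border the straight-line distance grows while the depth Z stays the same.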

[1] Linear N-Point Camera Pose Determination, Long Quan and Zhongdan Lan

[2] A Complete Linear 4-Point Algorithm for Camera Pose Determination, Lihong Zhi and Jianliang Tang

[3] http://gandalf-library.sourceforge.net/

没有伤那来痛 2024-11-21 04:46:32

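If you know the size of the real-world object and the camera's angle of view, the distance can be derived with a little trigonometry. Assume you know the horizontal angle of view alpha (*) and the horizontal resolution of the image, xres. The distance dw to an object in the middle of the image that is xp pixels wide in the image and xw meters wide in the real world can then be derived as follows: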

# Distance in "pixel space" relates to distance in the real world
# (we take half of xres, xw and xp because we use the half angle of view):
(xp/2)/dp = (xw/2)/dw 
dw = ((xw/2)/(xp/2))*dp = (xw/xp)*dp (1)

# we know xp and xw, we're looking for dw, so we need to calculate dp:
# we can do this because we know xres and alpha 
# (remember, tangent = opposite/adjacent):
tan(alpha) = (xres/2)/dp
dp = (xres/2)/tan(alpha) (2)

# combine (1) and (2):
dw = ((xw/xp)*(xres/2))/tan(alpha)
# pretty print:
dw = (xw*xres)/(xp*2*tan(alpha))

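(*) alpha = the angle between the camera axis and a line going through the leftmost visible point on the middle row of the image (that is, half of the full horizontal angle of view).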

Link to your variables:
dw = D, xw = AW, xp = PW
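The final formula translates directly into C++. A minimal sketch, assuming alpha is the half angle defined in (*) and is given in radians; the function name is invented for the example:

```cpp
#include <cassert>
#include <cmath>

// dw = (xw * xres) / (xp * 2 * tan(alpha))
//   xw       real-world width of the object
//   xp       width of the object in the image, in pixels
//   xres     horizontal resolution of the image, in pixels
//   alphaRad half of the horizontal angle of view, in radians
// The result is in the same unit as xw.
double distanceFromAngleOfView(double xw, double xp, double xres,
                               double alphaRad) {
    assert(xp > 0.0 && alphaRad > 0.0);
    return (xw * xres) / (xp * 2.0 * std::tan(alphaRad));
}
```

For example, with a 640-pixel-wide image, a half angle of 45 degrees, and a 0.2 m wide object that spans 100 pixels, the function returns 0.64 m.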

猛虎独行 2024-11-21 04:46:32

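This may not be a complete answer, but it may push you in the right direction. Have you ever seen how NASA does it on those pictures from space? They have those tiny crosses all over the images. As far as I know, that is how they get a fair idea of the depth and size of objects. The solution might be to have an object in the picture whose correct size and depth you know, and then calculate the size and depth of the other objects relative to it. Time for you to do some research. If that is how NASA does it, it should be worth checking out.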

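I have to say this is one of the most interesting questions I have seen on stackoverflow in a long time :D. I just noticed that only two tags are attached to this question. Adding a few more image-related tags might get you better help.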