Proportions of a perspective-distorted rectangle

Posted 2024-07-29

Given a 2d picture of a rectangle distorted by perspective:

[image: photo of a perspective-distorted rectangle]

I know that the shape was originally a rectangle, but I do not know its original size.

If I know the pixel coordinates of the corners in this picture, how can I calculate the original proportions, i.e. the quotient ( width / height ) of the rectangle?

(Background: the goal is to automatically undistort photos of rectangular documents; edge detection will probably be done with a Hough transform.)
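Once the aspect ratio is known, the undistortion step itself is just a four-point homography. Here is a minimal NumPy sketch (the function names are mine, not from any of the papers below): solve the eight-equation linear system for the projective transform that maps the four detected corners onto the corners of an axis-aligned rectangle, then resample the image through it.

```python
import numpy as np

def homography_from_4pts(src, dst):
    """3x3 projective transform H with H*src[i] ~ dst[i], from 4 correspondences.

    Fixes H[2,2] = 1, which is fine for generic (non-degenerate) quadrangles.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), cleared of the denominator
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    h = np.linalg.solve(np.array(rows, float), np.array(rhs, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, p):
    """Apply H to a 2D point (homogeneous multiply, then dehomogenize)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

In practice you would map the four detected corners to the corners of a `w x h` rectangle, with `w/h` the ratio recovered below, and warp the pixels accordingly.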

UPDATE:

There has been some discussion on whether it is possible at all to determine the width:height ratio from the information given. My naive thought was that it must be possible, since I can think of no way to project, for example, a 1:4 rectangle onto the quadrangle depicted above. The ratio is clearly close to 1:1, so there should be a way to determine it mathematically. I have, however, no proof of this beyond my intuitive guess.

I have not yet fully understood the arguments presented below, but I think there must be some implicit assumption that we are missing here and that is being interpreted differently.

However, after hours of searching, I have finally found some papers relevant to the problem.
I am struggling to understand the math used in there, so far without success. Particularly the first paper seems to discuss exactly what I wanted to do, unfortunately without code examples and very dense math.

  • Zhengyou Zhang, Li-Wei He, "Whiteboard scanning and image enhancement"
    http://research.microsoft.com/en-us/um/people/zhang/papers/tr03-39.pdf p.11

    "Because of the perspective distortion, the image of a rectangle appears to be a quadrangle. However, since we know that it is a rectangle in space, we are able to estimate both the camera’s focal length and the rectangle’s aspect ratio."

  • ROBERT M. HARALICK "Determining camera parameters from the perspective projection of a rectangle"
    http://portal.acm.org/citation.cfm?id=87146

    "We show how to use the 2D perspective projection of a rectangle of unknown size and position in 3D space to determine the camera look angle parameters relative to the plane of the rectangle."


﹎☆浅夏丿初晴 2024-08-05 13:40:07

Here is my attempt at answering my own question, after reading the paper.

I manipulated the equations for some time in SAGE, and came up with this C-style pseudo-code:


// in case it matters: licensed under GPLv2 or later
// legend:
// sqr(x)  = x*x
// sqrt(x) = square root of x

// let m1x,m1y ... m4x,m4y be the (x,y) pixel coordinates
// of the 4 corners of the detected quadrangle
// i.e. (m1x, m1y) are the coordinates of the first corner, 
// (m2x, m2y) of the second corner and so on.
// let u0, v0 be the pixel coordinates of the principal point of the image
// for a normal camera this will be the center of the image, 
// i.e. u0=IMAGEWIDTH/2; v0 =IMAGEHEIGHT/2
// This assumption does not hold if the image has been cropped asymmetrically

// first, transform the image so the principal point is at (0,0)
// this makes the following equations much easier
m1x = m1x - u0;
m1y = m1y - v0;
m2x = m2x - u0;
m2y = m2y - v0;
m3x = m3x - u0;
m3y = m3y - v0;
m4x = m4x - u0;
m4y = m4y - v0;


// temporary variables k2, k3
double k2 = ((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x) /
            ((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) ;

double k3 = ((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x) / 
            ((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) ;

// f_squared is the focal length of the camera, squared
// if k2==1 OR k3==1 then this equation is not solvable
// if the focal length is known, then this equation is not needed
// in that case assign f_squared= sqr(focal_length)
double f_squared = 
    -((k3*m3y - m1y)*(k2*m2y - m1y) + (k3*m3x - m1x)*(k2*m2x - m1x)) / 
                      ((k3 - 1)*(k2 - 1)) ;

//The width/height ratio of the original rectangle
double whRatio = sqrt( 
    (sqr(k2 - 1) + sqr(k2*m2y - m1y)/f_squared + sqr(k2*m2x - m1x)/f_squared) /
    (sqr(k3 - 1) + sqr(k3*m3y - m1y)/f_squared + sqr(k3*m3x - m1x)/f_squared) 
) ;

// if k2==1 AND k3==1, then the focal length equation is not solvable 
// but the focal length is not needed to calculate the ratio.
// I am still trying to figure out under which circumstances k2 and k3 become 1
// but it seems to be when the rectangle is not distorted by perspective, 
// i.e. viewed straight on. Then the equation is obvious:
if (k2==1 && k3==1) whRatio = sqrt( 
    (sqr(m2y-m1y) + sqr(m2x-m1x)) / 
    (sqr(m3y-m1y) + sqr(m3x-m1x))
) ;


// After testing, I found that the above equations 
// actually give the height/width ratio of the rectangle, 
// not the width/height ratio. 
// If someone can find the error that caused this, 
// I would be most grateful.
// until then:
whRatio = 1/whRatio;
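For reference, the pseudo-code above can be transcribed into a runnable Python function. This is a direct transcription, not an independent derivation; the function name and the corner-ordering convention (taken from the Sage test code further down: m1 and m4 are diagonally opposite corners, m2 is adjacent to m1 along a "height" edge, m3 along a "width" edge) are my own packaging.

```python
import math

def rectangle_aspect_ratio(m1, m2, m3, m4, u0=0.0, v0=0.0):
    """Recover (width/height, focal_length) of a rectangle seen in perspective.

    m1..m4 are pixel corner coordinates; (u0, v0) is the principal point.
    Assumes square pixels (s = 1).  Returns focal_length = None in the
    degenerate fronto-parallel case, where it cannot be estimated.
    """
    # move the principal point to the origin
    m1x, m1y = m1[0] - u0, m1[1] - v0
    m2x, m2y = m2[0] - u0, m2[1] - v0
    m3x, m3y = m3[0] - u0, m3[1] - v0
    m4x, m4y = m4[0] - u0, m4[1] - v0

    k2 = (((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x) /
          ((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x))
    k3 = (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x) /
          ((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x))

    if abs(k2 - 1) < 1e-8 and abs(k3 - 1) < 1e-8:
        # no perspective distortion: ratio comes straight from the edge lengths
        return (math.hypot(m3x - m1x, m3y - m1y) /
                math.hypot(m2x - m1x, m2y - m1y)), None

    f2 = -(((k3*m3y - m1y)*(k2*m2y - m1y) +
            (k3*m3x - m1x)*(k2*m2x - m1x)) / ((k3 - 1)*(k2 - 1)))

    ratio = math.sqrt(
        ((k2 - 1)**2 + (k2*m2y - m1y)**2/f2 + (k2*m2x - m1x)**2/f2) /
        ((k3 - 1)**2 + (k3*m3y - m1y)**2/f2 + (k3*m3x - m1x)**2/f2))
    # as noted in the comments above, this comes out as height/width, so invert
    return 1.0 / ratio, math.sqrt(f2)
```

A quick sanity check, mirroring the Sage test below: projecting a 2:1 rectangle through an ideal pinhole camera tilted about two axes, the function recovers both the ratio and the focal length.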

Update: here is how these equations were determined:

The following is the code in SAGE.
It can be accessed online at http://www.sagenb.org/home/pub/704/.
(Sage is really useful for solving equations, and usable in any browser; check it out.)

# CALCULATING THE ASPECT RATIO OF A RECTANGLE DISTORTED BY PERSPECTIVE

#
# BIBLIOGRAPHY:
# [zhang-single]: "Single-View Geometry of A Rectangle 
#  With Application to Whiteboard Image Rectification"
#  by Zhengyou Zhang
#  http://research.microsoft.com/users/zhang/Papers/WhiteboardRectification.pdf

# pixel coordinates of the 4 corners of the quadrangle (m1, m2, m3, m4)
# see [zhang-single] figure 1
m1x = var('m1x')
m1y = var('m1y')
m2x = var('m2x')
m2y = var('m2y')
m3x = var('m3x')
m3y = var('m3y')
m4x = var('m4x')
m4y = var('m4y')

# pixel coordinates of the principal point of the image
# for a normal camera this will be the center of the image, 
# i.e. u0=IMAGEWIDTH/2; v0 =IMAGEHEIGHT/2
# This assumption does not hold if the image has been cropped asymmetrically
u0 = var('u0')
v0 = var('v0')

# pixel aspect ratio; for a normal camera pixels are square, so s=1
s = var('s')

# homogenous coordinates of the quadrangle
m1 = vector ([m1x,m1y,1])
m2 = vector ([m2x,m2y,1])
m3 = vector ([m3x,m3y,1])
m4 = vector ([m4x,m4y,1])


# the following equations are later used in calculating the focal length 
# and the rectangle's aspect ratio.
# temporary variables: k2, k3, n2, n3

# see [zhang-single] Equation 11, 12
k2_ = m1.cross_product(m4).dot_product(m3) / m2.cross_product(m4).dot_product(m3)
k3_ = m1.cross_product(m4).dot_product(m2) / m3.cross_product(m4).dot_product(m2)
k2 = var('k2')
k3 = var('k3')

# see [zhang-single] Equation 14,16
n2 = k2 * m2 - m1
n3 = k3 * m3 - m1


# the focal length of the camera.
f = var('f')
# see [zhang-single] Equation 21
f_ = sqrt(
         -1 / (
          n2[2]*n3[2]*s^2
         ) * (
          (
           n2[0]*n3[0] - (n2[0]*n3[2]+n2[2]*n3[0])*u0 + n2[2]*n3[2]*u0^2
          )*s^2 + (
           n2[1]*n3[1] - (n2[1]*n3[2]+n2[2]*n3[1])*v0 + n2[2]*n3[2]*v0^2
          ) 
         ) 
        )


# standard pinhole camera matrix
# see [zhang-single] Equation 1
A = matrix([[f,0,u0],[0,s*f,v0],[0,0,1]])


#the width/height ratio of the original rectangle
# see [zhang-single] Equation 20
whRatio = sqrt (
               (n2*A.transpose()^(-1) * A^(-1)*n2.transpose()) / 
               (n3*A.transpose()^(-1) * A^(-1)*n3.transpose())
              ) 

The simplified equations in the C code were determined by:

print "simplified equations, assuming u0=0, v0=0, s=1"
print "k2 := ", k2_
print "k3 := ", k3_
print "f  := ", f_(u0=0,v0=0,s=1)
print "whRatio := ", whRatio(u0=0,v0=0,s=1)

    simplified equations, assuming u0=0, v0=0, s=1
    k2 :=  ((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y
    - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    k3 :=  ((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)/((m3y
    - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x)
    f  :=  sqrt(-((k3*m3y - m1y)*(k2*m2y - m1y) + (k3*m3x - m1x)*(k2*m2x
    - m1x))/((k3 - 1)*(k2 - 1)))
    whRatio :=  sqrt(((k2 - 1)^2 + (k2*m2y - m1y)^2/f^2 + (k2*m2x -
    m1x)^2/f^2)/((k3 - 1)^2 + (k3*m3y - m1y)^2/f^2 + (k3*m3x -
    m1x)^2/f^2))

print "Everything in one equation:"
print "whRatio := ", whRatio(f=f_)(k2=k2_,k3=k3_)(u0=0,v0=0,s=1)

    Everything in one equation:
    whRatio :=  sqrt(((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x
    - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1y)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2x/((m2y - m4y)*m3x
    - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1x)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) - (((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) -
    1)^2)/((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x
    - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x
    - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1x)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) - (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)^2))


# some testing:
# - choose a random rectangle, 
# - project it onto a random plane,
# - insert the corners in the above equations,
# - check if the aspect ratio is correct.

from sage.plot.plot3d.transform import rotate_arbitrary

#redundantly random rotation matrix
rand_rotMatrix = \
           rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) *\
           rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) *\
           rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5))

#random translation vector
rand_transVector = vector((uniform(-10,10),uniform(-10,10),uniform(-10,10))).transpose()

#random rectangle parameters
rand_width =uniform(0.1,10)
rand_height=uniform(0.1,10)
rand_left  =uniform(-10,10)
rand_top   =uniform(-10,10)

#random focal length and principal point
rand_f  = uniform(0.1,100)
rand_u0 = uniform(-100,100)
rand_v0 = uniform(-100,100)

# homogenous standard pinhole projection, see [zhang-single] Equation 1
hom_projection = A * rand_rotMatrix.augment(rand_transVector)

# construct a random rectangle in the plane z=0, then project it randomly 
rand_m1hom = hom_projection*vector((rand_left           ,rand_top            ,0,1)).transpose()
rand_m2hom = hom_projection*vector((rand_left           ,rand_top+rand_height,0,1)).transpose()
rand_m3hom = hom_projection*vector((rand_left+rand_width,rand_top            ,0,1)).transpose()
rand_m4hom = hom_projection*vector((rand_left+rand_width,rand_top+rand_height,0,1)).transpose()

#change type from 1x3 matrix to vector
rand_m1hom = rand_m1hom.column(0)
rand_m2hom = rand_m2hom.column(0)
rand_m3hom = rand_m3hom.column(0)
rand_m4hom = rand_m4hom.column(0)

#normalize
rand_m1hom = rand_m1hom/rand_m1hom[2]
rand_m2hom = rand_m2hom/rand_m2hom[2]
rand_m3hom = rand_m3hom/rand_m3hom[2]
rand_m4hom = rand_m4hom/rand_m4hom[2]

#substitute random values for f, u0, v0
rand_m1hom = rand_m1hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
rand_m2hom = rand_m2hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
rand_m3hom = rand_m3hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
rand_m4hom = rand_m4hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)

# printing the randomly chosen values
print "ground truth: f=", rand_f, "; ratio=", rand_width/rand_height

# substitute all the variables in the equations:
print "calculated: f= ",\
f_(k2=k2_,k3=k3_)(s=1,u0=rand_u0,v0=rand_v0)(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
),"; 1/ratio=", \
1/whRatio(f=f_)(k2=k2_,k3=k3_)(s=1,u0=rand_u0,v0=rand_v0)(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
)

print "k2 = ", k2_(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
), "; k3 = ", k3_(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
)

# ATTENTION: testing revealed that the whRatio 
# is actually the height/width ratio, 
# not the width/height ratio.
# This contradicts [zhang-single].
# If anyone can find the error that caused this, I'd be grateful.

    ground truth: f= 72.1045134124554 ; ratio= 3.46538779959142
    calculated: f=  72.1045134125 ; 1/ratio= 3.46538779959
    k2 =  0.99114614987 ; k3 =  1.57376280159
平安喜乐 2024-08-05 13:40:07

Update

After reading your update, and looking at the first reference (Whiteboard scanning and image enhancement), I see where the missing point is.

The input data of the problem is the quadruple (A,B,C,D), AND the center O of the projected image. In the article, this corresponds to the assumption u0=v0=0. With this added constraint, the problem becomes constrained enough to recover the aspect ratio of the rectangle.

The problem is then restated as follows: Given a quadruple (A,B,C,D) in the Z=0 plane, find the eye position E(0,0,h), h>0 and a 3D plane P such that the projection of (A,B,C,D) on P is a rectangle.

Note that P is determined by E: to get a parallelogram, P must contain parallels to (EU) and (EV), where U=(AB)x(CD) and V=(AD)x(BC).

Experimentally, it seems that this problem has, in general, a unique solution, corresponding to a unique value of the w/h ratio of the rectangle.

[images: two Cabri 3D screenshots illustrating the unique solution]

Previous Post

No, you can't determine the rectangle's aspect ratio from the projection alone.

In the general case, a quadruple (A,B,C,D) of four non-collinear points of the Z=0 plane is the projection of infinitely many rectangles, with infinitely many width/height ratios.

Consider the two vanishing points U, the intersection of (AB) and (CD), and V, the intersection of (AD) and (BC), as well as the point I, the intersection of the two diagonals (AC) and (BD). To project to ABCD, a parallelogram with center I must lie on a plane containing the line through I parallel to (UV). On any such plane you can find many rectangles projecting to ABCD, all with different w/h ratios.
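The vanishing-point construction above is easy to compute with homogeneous coordinates: the line through two points and the intersection of two lines are both cross products. A small sketch (the helper names are mine):

```python
import numpy as np

def join(p, q):
    """Homogeneous line through two 2D points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def meet(l1, l2):
    """Homogeneous intersection of two lines; third coordinate 0 means 'at infinity'."""
    return np.cross(l1, l2)

# example: a trapezoid A,B,C,D with AB parallel to CD
A, B, C, D = (0, 0), (4, 0), (3, 2), (1, 2)
U = meet(join(A, B), join(C, D))   # vanishing point of the AB/CD edge pair
V = meet(join(A, D), join(B, C))   # vanishing point of the AD/BC edge pair
I = meet(join(A, C), join(B, D))   # intersection of the diagonals
```

For this trapezoid, U comes out with third coordinate 0 (the parallel edges meet at infinity), while V and I are finite points.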

See these two images done with Cabri 3D. In the two cases ABCD is unchanged (on the gray Z=0 plane), and the blue plane containing the rectangle is not changed either. The green line partially hidden is the (UV) line and the visible green line is parallel to it and contains I.

[two Cabri 3D images]

谈场末日恋爱 2024-08-05 13:40:07


Size isn't really needed, and neither are proportions. And knowing which side is up is kind of irrelevant considering he's using photos/scans of documents. I doubt he's going to scan the back sides of them.

"Corner intersection" is the method to correct perspective. This might be of help:

How to draw a Perspective-Correct Grid in 2D

贱贱哒 2024-08-05 13:40:07


On the question of why the results give h/w rather then w/h:
I'm wondering if the expression of Equation 20 above is correct.
Posted is:

       whRatio = sqrt (
            (n2*A.transpose()^(-1) * A^(-1)*n2.transpose()) / 
            (n3*A.transpose()^(-1) * A^(-1)*n3.transpose())
           ) 

When I try to execute that with OpenCV, I get an exception. But everything works correctly when I use the following expression, which to me looks more like Equation 20:

        whRatio = sqrt (
            (n2.transpose()*A.transpose()^(-1) * A^(-1)*n2) /
            (n3.transpose()*A.transpose()^(-1) * A^(-1)*n3)
           )
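
The point about ordering is a shape issue: with column vectors, the quadratic form n^T (A^-T A^-1) n only collapses to a scalar when the transpose sits on the left. A small NumPy check, with made-up values for A, n2 and n3 just to exercise the shapes:

```python
import numpy as np

# Made-up intrinsic matrix and direction vectors, only for the shape check.
A = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
n2 = np.array([[0.3], [0.1], [1.0]])   # 3x1 column vectors, as in the paper
n3 = np.array([[-0.2], [0.4], [1.0]])

Ainv = np.linalg.inv(A)
M = Ainv.T @ Ainv                       # A^-T * A^-1, a symmetric 3x3 matrix

# n.T @ M @ n is 1x1 (a scalar) only with the transpose on the left:
wh_ratio = np.sqrt((n2.T @ M @ n2) / (n3.T @ M @ n3)).item()
print(wh_ratio)
```
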
柠檬 2024-08-05 13:40:07


You can determine width / height from this answer: Calculating rectangle 3D coordinate with coordinate its shadow?. Assume the rectangle rotates about the intersection point of its diagonals, and calculate its width and height from that. When you change the distance between the assumed shadow plane and the real shadow plane, the rectangle's proportions stay the same as the calculated width / height!

战皆罪 2024-08-05 13:40:07


It is impossible to know the width of this rectangle without knowing the distance of the 'camera'.

A small rectangle viewed from 5 centimeters away looks the same as a huge rectangle seen from several meters away.

撩动你心 2024-08-05 13:40:07


Draw a right isosceles triangle with those two vanishing points and a third point below the horizon (that is, on the same side of the horizon as the rectangle is). That third point will be our origin and the two lines to the vanishing points will be our axes. Call the distance from the origin to a vanishing point pi/2. Now extend the sides of the rectangle from the vanishing points to the axes, and mark where they intersect the axes. Pick an axis, measure the distances from the two marks to the origin, transform those distances: x->tan(x), and the difference will be the "true" length of that side. Do the same for the other axis. Take the ratio of those two lengths and you're done.

泪是无色的血 2024-08-05 13:40:07


Dropbox has an extensive article on their tech blog where they describe how they solved the problem for their scanner app.

https://blogs.dropbox.com/tech/2016/08/fast-document-rectification-and-enhancement/

Rectifying a Document

We assume that the input document is rectangular in the physical world, but if it is not exactly facing the camera, the resulting corners in the image will be a general convex quadrilateral. So to satisfy our first goal, we must undo the geometric transform applied by the capture process. This transformation depends on the viewpoint of the camera relative to the document (these are the so-called extrinsic parameters), in addition to things like the focal length of the camera (the intrinsic parameters). Here’s a diagram of the capture scenario:

In order to undo the geometric transform, we must first determine the said parameters. If we assume a nicely symmetric camera (no astigmatism, no skew, et cetera), the unknowns in this model are:

  • the 3D location of the camera relative to the document (3 degrees of freedom),
  • the 3D orientation of the camera relative to the document (3 degrees of freedom),
  • the dimensions of the document (2 degrees of freedom), and
  • the focal length of the camera (1 degree of freedom).

On the flip side, the x- and y-coordinates of the four detected document corners gives us effectively eight constraints. While there are seemingly more unknowns (9) than constraints (8), the unknowns are not entirely free variables—one could imagine scaling the document physically and placing it further from the camera, to obtain an identical photo. This relation places an additional constraint, so we have a fully constrained system to be solved. (The actual system of equations we solve involves a few other considerations; the relevant Wikipedia article gives a good summary: https://en.wikipedia.org/wiki/Camera_resectioning)

Once the parameters have been recovered, we can undo the geometric transform applied by the capture process to obtain a nice rectangular image. However, this is potentially a time-consuming process: one would look up, for each output pixel, the value of the corresponding input pixel in the source image. Of course, GPUs are specifically designed for tasks like this: rendering a texture in a virtual space. There exists a view transform—which happens to be the inverse of the camera transform we just solved for!—with which one can render the full input image and obtain the rectified document. (An easy way to see this is to note that once you have the full input image on the screen of your phone, you can tilt and translate the phone such that the projection of the document region on the screen appears rectilinear to you.)

Lastly, recall that there was an ambiguity with respect to scale: we can’t tell whether the document was a letter-sized paper (8.5” x 11”) or a poster board (17” x 22”), for instance. What should the dimensions of the output image be? To resolve this ambiguity, we count the number of pixels within the quadrilateral in the input image, and set the output resolution as to match this pixel count. The idea is that we don’t want to upsample or downsample the image too much.
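
That last resolution-matching step can be sketched in a few lines. `quad_area` and `output_size` are hypothetical helper names, the shoelace formula supplies the pixel count inside the quadrilateral, and the width/height ratio is assumed to have been recovered separately:

```python
import numpy as np

def quad_area(corners):
    """Shoelace area of a quadrilateral given as four (x, y) corners in order."""
    x, y = np.asarray(corners, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def output_size(corners, wh_ratio):
    """Choose (width, height) so that width * height matches the pixel count
    inside the input quadrilateral, for a given recovered width/height ratio."""
    area = quad_area(corners)
    height = int(round(np.sqrt(area / wh_ratio)))
    width = int(round(wh_ratio * height))
    return width, height

# e.g. an axis-aligned 100x50 region with a recovered ratio of 2:1
print(output_size([(0, 0), (100, 0), (100, 50), (0, 50)], 2.0))  # (100, 50)
```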

青春如此纠结 2024-08-05 13:40:07


There seems to still be some confusion on this interesting problem. I want to give an easy-to-follow explanation for when the problem can and cannot be solved.

Constraints and Degrees of Freedom

Typically when we are faced with a problem like this, the first thing to do is to assess the number of unknown Degrees of Freedom (DoFs) N, and the number of independent equations M that we have for constraining the unknown DoFs. It is impossible to solve the problem if N exceeds M (meaning there are fewer constraints than unknowns). We can rule out all problems where this is the case as unsolvable. If N does not exceed M then it may be possible to solve the problem with a unique solution, but this is not guaranteed (see the second-to-last paragraph for an example).

Let's use p1, p2, p3 and p4 to denote the positions of the 4 corners of the planar surface in world coordinates. Let's use R and t to be the 3D rotation and translation that transforms these to camera coordinates. Let's use K to denote the 3x3  camera intrinsic matrix. We will ignore lens distortion for now. The 2D position of the ith corner in the camera's image is given by qi=f(K(Rpi+t)) where f is the projection function f(x,y,z)=(x/z,y/z). Using this equation we know that each corner in the image gives us two equations (i.e. two constraints) on our unknowns: one from the x component of qi and one from the y component. So we have a total of 8 constraints to work with. The official name of these constraints are the reprojection constraints.

So what are our unknown DoFs? Certainly R and t are unknown, because we do not know the camera's pose in world coordinates. Therefore we have already 6 unknown DoFs: 3 for R (e.g. yaw, pitch and roll) and 3 for t.  Therefore there can be a maximal of two unknowns in the remaining terms (K, p1, p2, p3, p4). 

Different problems

We can construct different problems depending on which two terms in (K, p1, p2, p3, p4) we shall consider as unknown. At this point let's write out K in the usual form: K=(fx, 0, cx; 0, fy, cy; 0,0,1) where fx and fy are the focal length terms (fx/fy is normally called the image aspect ratio) and (cx,cy) is the principal point (the centre of projection in the image).

We could obtain one problem by having fx and fy as our two unknowns, and assume (cx, cy, p1, p2, p3, p4) are all known. Indeed this very problem is used and solved within OpenCV's camera calibration method, using images of a checkerboard planar target. This is used to get an initial estimate for fx and fy, by assuming that the principal point is at the image centre (which is a very reasonable assumption for most cameras).

Alternatively we can create a different problem by assuming fx=fy, which again is quite reasonable for many cameras, and assume this focal length (denoted as f) is the only unknown in K. Therefore we still have one unknown left to play with (recall we can have a maximum of two unknowns). So let's use this by supposing we know the shape of the plane: a rectangle (which was the original assumption in the question). Therefore we can define the corners as follows: p1=(0,0,0), p2=(0,w,0), p3=(h,0,0) and p4=(h,w,0), where h and w denote the height and width of the rectangle. Now, because we only have 1 unknown left, let us set this as the plane's aspect ratio: x=w/h. Now the question is: can we simultaneously recover x, f, R and t from the 8 reprojection constraints? The answer, it turns out, is yes! And the solution is given in Zhang's paper cited in the question.

The scale ambiguity

One might wonder if another problem can be solved: if we assume K is known and the 2 unknowns are h and w. Can they be solved from the reprojection equations? The answer is no, and is because there is an ambiguity between the size of the plane and the plane's depth to the camera. Specifically if we scale the corners pi by s and scale t by s, then s cancels in the reprojection equations. Therefore the absolute scale of the plane is not recoverable.
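
This cancellation is easy to verify numerically. In the sketch below the intrinsics, pose, and rectangle are all made-up values; scaling the corners and the translation by the same factor s leaves every projected pixel unchanged:

```python
import numpy as np

# Made-up intrinsics, pose, and a 1x2 rectangle in the Z=0 plane.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(3, 3)))  # random rotation
t = np.array([0.1, -0.2, 5.0])
corners = np.array([[0, 0, 0], [0, 2, 0], [1, 0, 0], [1, 2, 0]], dtype=float)

def project(K, R, t, pts):
    cam = (R @ pts.T).T + t          # world -> camera coordinates
    img = (K @ cam.T).T              # apply the intrinsics
    return img[:, :2] / img[:, 2:]   # perspective divide f(x,y,z) = (x/z, y/z)

s = 3.0
q1 = project(K, R, t, corners)
q2 = project(K, R, s * t, s * corners)   # scale the plane AND its distance by s
print(np.allclose(q1, q2))  # True: s cancels in the reprojection equations
```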

There may be other problems with different combinations for the unknown DoFs, for example having R, t, one of the principal point components and the plane's width as unknowns. However one needs to think about which cases are of practical use. Nevertheless I haven't yet seen a systematic set of solutions for all useful combinations!

More points

We might think that if we were to add extra point correspondences between the plane and the image, or exploit the edges of the plane, we could recover more than 8 unknown DoFs. Sadly the answer is no. This is because they don't add any extra independent constraints. The reason is that the four corners completely describe the transform from the plane to the image. This can be seen by fitting a homography matrix using the four corners, which then determines the positions of all other points on the plane in the image.
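
That last claim can be checked directly: fitting a homography to the four corners (here via the standard DLT construction, with the corner coordinates taken from the question as example data) pins down where every other point of the plane lands in the image:

```python
import numpy as np

def fit_homography(src, dst):
    """3x3 homography H with dst ~ H @ src, from four point pairs (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)   # null vector of the 8x9 system = flattened H

def apply_h(H, p):
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Unit square -> an arbitrary convex quadrilateral (example coordinates).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(269, 25), (800, 19), (805, 748), (273, 750)]
H = fit_homography(src, dst)

# Any fifth point is now determined, e.g. the centre of the square
# maps to the intersection of the quadrilateral's diagonals:
print(apply_h(H, (0.5, 0.5)))
```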

风吹过旳痕迹 2024-08-05 13:40:07


Here is a Python implementation of https://www.microsoft.com/en-us/research/publication/2016/11/Digital-Signal-Processing.pdf

import numpy as np


def find_rectangle_proportion(K, m1, m2, m3, m4):
    # Detailed computations
    # https://www.microsoft.com/en-us/research/publication/2016/11/Digital-Signal-Processing.pdf

    # m1 = top_left, m2 = top_right, m3 = bottom_left, m4 = bottom_right

    # Make homogeneous coordinates
    m1 = np.array([m1[0], m1[1], 1])
    m2 = np.array([m2[0], m2[1], 1])
    m3 = np.array([m3[0], m3[1], 1])
    m4 = np.array([m4[0], m4[1], 1])

    # (11)
    k2 = np.dot(np.cross(m1, m4), m3) / np.dot(np.cross(m2, m4), m3)
    # (12)
    k3 = np.dot(np.cross(m1, m4), m2) / np.dot(np.cross(m3, m4), m2)

    # (14)
    n2 = k2 * m2 - m1
    # (16)
    n3 = k3 * m3 - m1

    inv_K = np.linalg.inv(K)
    KK = np.dot(inv_K.T, inv_K)

    # ratio width/height (20)
    ratio = np.sqrt(np.dot(n2, np.dot(KK, n2)) / np.dot(n3, np.dot(KK, n3)))

    return ratio


if __name__ == "__main__":
    top_left = [269, 25]
    top_right = [800, 19]
    bottom_right = [805, 748]
    bottom_left = [273, 750]

    # Load intrinsic matrices from calib.py
    K = np.load("calibration_pixel_6a/intrinsic.npy")

    ratio = find_rectangle_proportion(K, top_left, top_right, bottom_left, bottom_right)
    print(f"ratio width/height = {ratio}")
梦里泪两行 2024-08-05 13:40:07


You need more information, that transformed figure could come from any parallelogram given an arbitrary perspective.

So I guess you need to do some kind of calibration first.

Edit: for those who said that I was wrong, here goes the mathematical proof that there are infinitely many combinations of rectangles/cameras that yield the same projection:

In order to simplify the problem (as we only need the ratio of the sides) let's assume that our rectangle is defined by the following points: R=[(0,0),(1,0),(1,r),(0,r)] (this simplification is the same as transforming any problem to an equivalent one in an affine space).

The transformed polygon is defined as: T=[(tx0,ty0),(tx1,ty1),(tx2,ty2),(tx3,ty3)]

There exists a transformation matrix M = [[m00,m01,m02],[m10,m11,m12],[m20,m21,m22]] that satisfies (Rxi,Ryi,1)*M=wi(txi,tyi,1)'

if we expand the equation above for the points,

for R_0 we get: m02-tx0*w0 = m12-ty0*w0 = m22-w0 = 0

for R_1 we get: m00-tx1*w1 = m10-ty1*w1 = m20+m22-w1 = 0

for R_2 we get: m00+r*m01-tx2*w2 = m10+r*m11-ty2*w2 = m20+r*m21+m22-w2 = 0

and for R_3 we get: m00+r*m01-tx3*w3 = m10+r*m11-ty3*w3 = m20 + r*m21 + m22 -w3 = 0

So far we have 12 equations, 14 unknown variables (9 from the matrix, 4 from the wi, and 1 for the ratio r) and the rest are known values (txi and tyi are given).

Even if the system weren't underspecified, some of the unknowns are multiplied among themselves (the r and mi0 products), making the system non-linear (you could transform it to a linear system by assigning a new name to each product, but you would still end up with 13 unknowns, 3 of which expand to infinitely many solutions).

If you can find any flaw in the reasoning or the maths, please let me know.
