Warp an image using a height map?

Posted 2024-10-19 17:29:58

I have a height map for an image, which tells me the offset of each pixel in the Z direction. My goal is to flatten a distorted image using only its height map.

How would I go about doing this? I know the position of the camera, if that helps.


To do this, I was thinking of treating each pixel as a point on a plane, translating each point vertically by the Z-value I get from the height map, and then working out how that translation shifts the point's projected position (imagine you are looking at the points from above; raising a point makes it appear to move from your perspective).

From that projected shift, I could extract X and Y-shift of each pixel, which I could feed into cv.Remap().

But I have no idea how I could get the projected 3D offset of a point with OpenCV, let alone construct an offset map out of it.


Here are my reference images for what I'm doing:

Calibration Image
Warped Image

I know the angle of the lasers (45 degrees), and from the calibration images, I can calculate the height of the book really easily:

h(x) = sin(theta) * abs(calibration(x) - actual(x))
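
For illustration, the formula above could be evaluated per column as in the following Python 3 sketch. The function name `laser_heights` and the two input lists (the laser line's pixel position in the calibration image and in the actual image) are hypothetical names, not part of the original code.

```python
import math

def laser_heights(calibration, actual, theta_deg=45.0):
    """Apply h(x) = sin(theta) * abs(calibration(x) - actual(x)) to every column."""
    s = math.sin(math.radians(theta_deg))
    return [s * abs(c - a) for c, a in zip(calibration, actual)]

# Hypothetical laser-line positions (in pixels) for three image columns.
heights = laser_heights([10.0, 10.0, 10.0], [10.0, 14.0, 12.0])
```

The result is in the same units as the line displacement (pixels here), so a further scale factor would be needed to get physical heights.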

I do this for both lines and linearly interpolate between them to generate a surface using this approach (Python code, run inside a loop):

height_grid[x][y] = heights_top[x] * (cv.GetSize(image)[1] - y) + heights_bottom[x] * y
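
As written, the two weights `(height - y)` and `y` sum to the image height rather than to 1, so the interpolated value is scaled up by that factor. Below is a minimal Python 3 / NumPy sketch of a normalized blend, assuming that a plain linear interpolation between the top and bottom profiles was intended and that the two profiles sit at the first and last image rows. The function name is hypothetical, and the grid is indexed `[y, x]` (row, column) rather than `[x][y]`.

```python
import numpy as np

def interpolate_height_grid(heights_top, heights_bottom, image_height):
    """Linearly blend two per-column height profiles into a (rows, cols) surface."""
    top = np.asarray(heights_top, dtype=np.float64)
    bottom = np.asarray(heights_bottom, dtype=np.float64)
    # Blend weight runs from 0 at the top row to 1 at the bottom row.
    t = np.linspace(0.0, 1.0, image_height)[:, np.newaxis]
    return (1.0 - t) * top[np.newaxis, :] + t * bottom[np.newaxis, :]

# Tiny example: 3 columns, 5 rows.
grid = interpolate_height_grid([0.0, 2.0, 4.0], [4.0, 2.0, 0.0], image_height=5)
```

Building the whole grid in one array expression also avoids the per-pixel Python loop.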

I hope this helps ;)


Right now, this is what I have to dewarp the image. All that strange stuff in the middle projects a 3D coordinate onto the camera plane, given its position (and the camera's location, rotation, etc.):

from math import cos, sin  # cos() and sin() are called unqualified below

class Point:
  def __init__(self, x = 0, y = 0, z = 0):
    self.x = x
    self.y = y
    self.z = z

mapX = cv.CreateMat(cv.GetSize(image)[1], cv.GetSize(image)[0], cv.CV_32FC1)
mapY = cv.CreateMat(cv.GetSize(image)[1], cv.GetSize(image)[0], cv.CV_32FC1)

c = Point(CAMERA_POSITION[0], CAMERA_POSITION[1], CAMERA_POSITION[2])
theta = Point(CAMERA_ROTATION[0], CAMERA_ROTATION[1], CAMERA_ROTATION[2])
d = Point()
e = Point(0, 0, CAMERA_POSITION[2] + SENSOR_OFFSET)

costx = cos(theta.x)
costy = cos(theta.y)
costz = cos(theta.z)

sintx = sin(theta.x)
sinty = sin(theta.y)
sintz = sin(theta.z)


for x in xrange(cv.GetSize(image)[0]):
  for y in xrange(cv.GetSize(image)[1]):

    # 3D scene point: pixel position plus interpolated height
    # (x / 2 is integer division in Python 2)
    a = Point(x, y, heights_top[x / 2] * (cv.GetSize(image)[1] - y) + heights_bottom[x / 2] * y)

    # Rotate the point, relative to the camera position c, into the camera frame
    d.x = costy * (sintz * (a.y - c.y) + costz * (a.x - c.x)) - sinty * (a.z - c.z)
    d.y = sintx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) + costx * (costz * (a.y - c.y) - sintz * (a.x - c.x))
    d.z = costx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) - sintx * (costz * (a.y - c.y) - sintz * (a.x - c.x))

    # Perspective-project onto the sensor plane and store the result as remap coordinates
    mapX[y, x] = x + (d.x - e.x) * (e.z / d.z)
    mapY[y, x] = y + (d.y - e.y) * (e.z / d.z)


print
print 'Remapping original image using map...'

remapped = cv.CreateImage(cv.GetSize(image), 8, 3)
cv.Remap(image, remapped, mapX, mapY, cv.CV_INTER_LINEAR)

This is turning into a huge thread of images and code now... Anyway, this code chunk takes 7 minutes to run on an 18MP camera image; that's way too long, and in the end, this approach does nothing to the image (the offset for each pixel is << 1).

Any ideas?

Comments (3)

櫻之舞 2024-10-26 17:29:58

I ended up implementing my own solution:

for x in xrange(cv.GetSize(image)[0]):
  for y in xrange(cv.GetSize(image)[1]):

    a = Point(x, y, heights_top[x / 2] * (cv.GetSize(image)[1] - y) + heights_bottom[x / 2] * y)
    b = Point()

    d.x = costy * (sintz * (a.y - c.y) + costz * (a.x - c.x)) - sinty * (a.z - c.z)
    d.y = sintx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) + costx * (costz * (a.y - c.y) - sintz * (a.x - c.x))
    d.z = costx * (costy * (a.z - c.z) + sinty * (sintz * (a.y - c.y) + costz * (a.x - c.x))) - sintx * (costz * (a.y - c.y) - sintz * (a.x - c.x))

    mapX[y, x] = x + 100.0 * (d.x - e.x) * (e.z / d.z)
    mapY[y, x] = y + 100.0 * (d.y - e.y) * (e.z / d.z)


print
print 'Remapping original image using map...'

remapped = cv.CreateImage(cv.GetSize(image), 8, 3)
cv.Remap(image, remapped, mapX, mapY, cv.CV_INTER_LINEAR)

This (slowly) remaps each pixel using the cv.Remap function, and this seems to kind of work...
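
Most of the run time is likely spent in the per-pixel Python loop rather than in the remap itself. The following Python 3 / NumPy sketch computes the same two maps with whole-array arithmetic. It assumes, as in the code above, that `e.x` and `e.y` are 0; the function name and the parameters `sensor_z` (standing in for `e.z`) and `gain` (standing in for the `100.0` factor) are hypothetical.

```python
import numpy as np

def build_remap_maps(height, cam_pos, cam_rot, sensor_z, gain=1.0):
    """Vectorized form of the per-pixel loop; height has shape (rows, cols)."""
    rows, cols = height.shape
    ys, xs = np.mgrid[0:rows, 0:cols].astype(np.float64)

    # Scene point minus camera position, for every pixel at once.
    ax = xs - cam_pos[0]
    ay = ys - cam_pos[1]
    az = height - cam_pos[2]

    cx, cy, cz = np.cos(cam_rot)
    sx, sy, sz = np.sin(cam_rot)

    # Same rotation as the scalar code, with shared sub-expressions factored out.
    u = sz * ay + cz * ax
    v = cz * ay - sz * ax
    w = cy * az + sy * u
    dx = cy * u - sy * az
    dy = sx * w + cx * v
    dz = cx * w - sx * v

    # Perspective divide, then offset from the pixel's own position.
    scale = sensor_z / dz
    map_x = (xs + gain * dx * scale).astype(np.float32)
    map_y = (ys + gain * dy * scale).astype(np.float32)
    return map_x, map_y

# Flat surface, unrotated camera 50 units above pixel (3, 2).
map_x, map_y = build_remap_maps(np.zeros((4, 6)), (3.0, 2.0, 50.0), (0.0, 0.0, 0.0), 51.0)
```

With the newer `cv2` bindings, the two float32 arrays could be passed straight to `cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)`.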

∞琼窗梦回ˉ 2024-10-26 17:29:58

Distortion based on distance from the camera only happens with a perspective projection. If you have the (x,y,z) position of a pixel, you can use the projection matrix of the camera to unproject the pixels back into world-space. With that information, you can render the pixels in an orthographic way. However, you may have missing data, due to the original perspective projection.
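
A minimal Python 3 sketch of the unprojection step, under simplifying assumptions that are not in the original post: an ideal pinhole camera looking straight down at the surface, with a hypothetical focal length `focal_px` (in pixels), principal point `(cx, cy)`, and camera height `cam_height`. A real setup would use the calibrated projection matrix instead.

```python
def unproject_pixel(u, v, depth, focal_px, cx, cy):
    """Back-project pixel (u, v) at a known depth to camera-space (X, Y)."""
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return x, y

def orthographic_position(u, v, surface_height, cam_height, focal_px, cx, cy):
    """Where the pixel's surface point lands when viewed orthographically from above."""
    depth = cam_height - surface_height  # distance from camera to the surface point
    return unproject_pixel(u, v, depth, focal_px, cx, cy)

# The same pixel, first on the flat plane and then on a surface raised by 100 units.
flat = orthographic_position(600, 500, 0.0, 1000.0, 1000.0, 500, 500)
raised = orthographic_position(600, 500, 100.0, 1000.0, 1000.0, 500, 500)
```

In this example the raised point sits closer to the optical axis than the same pixel would suggest on the flat plane, which is the perspective shift the answer describes.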

内心激荡 2024-10-26 17:29:58

Separate your scene out as follows:

  • you have an unknown bitmap image I(x,y) -> (r,g,b)
  • you have a known height field H(x,y) -> h
  • you have a camera transform C(x,y,z) -> (u,v) which projects the scene to a screen plane

Note that the camera transform throws information away (you do not get a depth value for each screen pixel). You may also have bits of scene overlap on screen, in which case only the foremost gets shown - the rest is discarded. So in general this is not perfectly reversible.

  • you have a screenshot S(u,v) which is a result of C(x,y,H(x,y)) for x,y in I
  • you want to generate a screenshot S'(u',v') which is a result of C(x,y,0) for x,y in I

There are two obvious ways to approach this; both depend on having accurate values for the camera transform.

  1. Ray-casting: for each pixel in S, cast a ray back into the scene. Find out where it hits the heightfield; this gives you (x,y) in the original image I, and the screen pixel gives you the color at that point. Once you have as much of I as you can recover, re-transform it to find S'.

  2. Double-rendering: for each x,y in I, project to find (u,v) and (u',v'). Take the pixel-color from S(u,v) and copy it to S'(u',v').

Both methods will have sampling problems, which can be helped by super-sampling or interpolation; method 1 will leave empty spaces in occluded areas of the image, method 2 will 'project through' from the first surface.
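
Method 2 can be sketched in Python 3 / NumPy as follows. This is an illustration under stated assumptions, not a definitive implementation: `project` is a hypothetical stand-in for the camera transform C (an ideal pinhole looking straight down from `cam_height`), `xs` and `ys` hold the scene coordinates of each sample of I, and nearest-neighbor rounding is used in place of proper interpolation.

```python
import numpy as np

def project(x, y, z, cam_height, focal_px, cx, cy):
    """Hypothetical camera transform C(x, y, z) -> (u, v)."""
    depth = cam_height - z
    return focal_px * x / depth + cx, focal_px * y / depth + cy

def double_render(S, xs, ys, H, cam_height, focal_px, cx, cy):
    """Copy S(u, v) to S'(u', v') for every scene sample (x, y)."""
    rows, cols = S.shape[:2]
    out = np.zeros_like(S)
    u, v = project(xs, ys, H, cam_height, focal_px, cx, cy)                    # with height
    u2, v2 = project(xs, ys, np.zeros_like(H), cam_height, focal_px, cx, cy)   # flattened
    ui, vi = np.rint(u).astype(int), np.rint(v).astype(int)
    u2i, v2i = np.rint(u2).astype(int), np.rint(v2).astype(int)
    # Keep only samples whose source and destination both fall inside the image.
    ok = ((ui >= 0) & (ui < cols) & (vi >= 0) & (vi < rows) &
          (u2i >= 0) & (u2i < cols) & (v2i >= 0) & (v2i < rows))
    out[v2i[ok], u2i[ok]] = S[vi[ok], ui[ok]]
    return out

# Tiny example: a 3x4 "screenshot" with one raised scene sample at (x=1, y=0).
S = np.arange(12).reshape(3, 4)
ys, xs = np.mgrid[0:3, 0:4].astype(np.float64)
H = np.zeros((3, 4))
H[0, 1] = 50.0
flattened = double_render(S, xs, ys, H, cam_height=100.0, focal_px=100.0, cx=0.0, cy=0.0)
```

In the example, the raised sample is seen at screen column 2, so its color is copied back to column 1 in the flattened output; every other sample maps to itself.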

Edit:

I had presumed you meant a CG-style heightfield, where each pixel in S is directly above the corresponding location in S'; but this is not how a page drapes over a surface. A page is fixed at the spine and is non-stretchy - lifting the center of a page pulls the free edge toward the spine.

Based on your sample image, you'll have to reverse this cumulative pulling - detect the spine centerline location and orientation and work progressively left and right, finding the change in height across the top and bottom of each vertical strip of page, calculating the resulting aspect-narrowing and skew, and reversing it to re-create the original flat page.
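
One simple way to estimate the narrowing along a single row is to accumulate arc length over the height profile, starting at the spine. The Python 3 sketch below assumes the heights are sampled at even horizontal spacing `dx` as seen from directly above, and it ignores perspective and skew; the function name is hypothetical.

```python
import math

def flattened_positions(heights, dx=1.0):
    """Distance along the page surface from the spine to each sample."""
    positions = [0.0]
    for h0, h1 in zip(heights, heights[1:]):
        # Each step covers dx horizontally and (h1 - h0) vertically.
        positions.append(positions[-1] + math.hypot(dx, h1 - h0))
    return positions

# Heights sampled every 4 units, moving away from the spine.
flat = flattened_positions([0.0, 3.0, 3.0, 0.0], dx=4.0)
```

Here the samples span 12 units when viewed from above but 14 units along the page, so the flattened strip has to be stretched accordingly. Doing this for the top and bottom rows separately would also give an estimate of the skew.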
