OpenCV's remap() uses a real-valued index grid to sample a grid of values from an image using bilinear interpolation, and returns the grid of samples as a new image.
To be precise, let:
A = an image
X = a grid of real-valued X coords into the image.
Y = a grid of real-valued Y coords into the image.
B = remap(A, X, Y)
Then for all pixel coordinates i, j,
B[i, j] = A(X[i, j], Y[i, j])
where the round-brace notation A(x, y) denotes using bilinear interpolation to solve for the pixel value of image A at float-valued coords x and y.
My question is: given an index grid X, Y, how can I generate an "inverse grid" X^-1, Y^-1 such that:
X(X^-1[i, j], Y^-1[i, j]) = i
Y(X^-1[i, j], Y^-1[i, j]) = j
and
X^-1(X[i, j], Y[i, j]) = i
Y^-1(X[i, j], Y[i, j]) = j
for all integer pixel coordinates i, j?
FWIW, the image and index maps X and Y are the same shape. However, there is no a priori structure to the index maps X and Y. For example, they're not necessarily affine or rigid transforms. They may even be non-invertible, e.g. if X, Y maps multiple pixels in A to the same exact pixel coordinate in B. I'm looking for ideas for a method that will find a reasonable inverse map if one exists.
The solution need not be OpenCV-based, as I'm not using OpenCV but another library that has a remap() implementation. While any suggestions are welcome, I'm particularly keen on something that's "mathematically correct", i.e. if my map M is perfectly invertible, the method should find the perfect inverse, within some small margin of machine precision.
Iterative solution
Many of the above solutions didn't work for me, failed when the map wasn't invertible, or weren't terribly fast.
I present an alternative, 6-line iterative solution.
How well does it do?
For my use case of inverting a terrain correction map for aerial photography, this method converges comfortably in 10 steps to 1/10th of a pixel. It's also blazingly fast, because all the heavy compute is tucked inside OpenCV.
How does it work?
The approach uses the idea that if
(x', y') = F(x, y)
is a mapping, then the inverse can be approximated with (x, y) = -F(x', y'), as long as the gradient of F is small.
We can continue to refine our mapping; the above gets us our first prediction (I is an "identity mapping"):
G_1 = I - F
Our second prediction can be adapted from that:
G_2 = G_1 + I - F(G_1)
and so on:
G_n+1 = G_n + I - F(G_n)
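The iteration G_n+1 = G_n + I - F(G_n) can be sketched with NumPy alone. This is a minimal illustration and not the answer's original code: `bilinear_sample` is a hypothetical stand-in for remap() (bilinear lookup with border clamping), the map is assumed to be stored as an (H, W, 2) float array with the X map in channel 0 and the Y map in channel 1 (x = column, as in OpenCV), and the iteration is assumed to start from the identity map.

```python
import numpy as np

def bilinear_sample(field, xs, ys):
    """Hypothetical stand-in for remap(): sample an (H, W, C) array at
    float coordinates (xs, ys), clamping coordinates to the image border."""
    h, w = field.shape[:2]
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    fx = (xs - x0)[..., None]
    fy = (ys - y0)[..., None]
    top = field[y0, x0] * (1 - fx) + field[y0, x0 + 1] * fx
    bottom = field[y0 + 1, x0] * (1 - fx) + field[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

def invert_map_iterative(F, n_iter=10):
    """Fixed-point iteration G <- G + I - F(G), started from G = I.

    F[..., 0] is the X map and F[..., 1] the Y map (assumed layout)."""
    h, w = F.shape[:2]
    # Identity map: I[r, c] = (c, r), i.e. (x, y) of each pixel.
    I = np.stack(np.meshgrid(np.arange(w), np.arange(h)), axis=-1).astype(np.float64)
    G = I.copy()
    for _ in range(n_iter):
        G += I - bilinear_sample(F, G[..., 0], G[..., 1])
    return G
```

Each pixel is iterated independently, so pixels whose iterates leave the image are only as good as the border clamping allows; away from the border, a smooth map with a small displacement gradient should satisfy F(G) close to I after a few iterations.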
Proving that G_n converges to the inverse F^-1 is hard, but what we can easily prove is that if G has converged, it will stay converged.
Assume G_n = F^-1. Substituting into G_n+1 = G_n + I - F(G_n), we get:
G_n+1 = F^-1 + I - F(F^-1)
G_n+1 = F^-1 + I - I
G_n+1 = F^-1
Testing script
Well, I just had to solve this remap inversion problem myself, so I'll outline my solution.
Given X, Y for the remap() function that does the following:
B[i, j] = A(X[i, j], Y[i, j])
I computed Xinv, Yinv that can be used by the remap() function to invert the process:
A[x, y] = B(Xinv[x, y], Yinv[x, y])
First, I build a KD-Tree for the 2D point set {(X[i,j], Y[i,j])} so I can efficiently find the N nearest neighbors to a given point (x, y). I use Euclidean distance for my distance metric. I found a great C++ header lib for KD-Trees on GitHub.
Then I loop through all the (x, y) values in A's grid and find the N = 5 nearest neighbors {(X[i_k,j_k], Y[i_k,j_k]) | k = 0 .. N-1} in my point set.
If distance d_k == 0 for some k, then Xinv[x,y] = i_k and Yinv[x,y] = j_k; otherwise...
Use Inverse Distance Weighting (IDW) to compute an interpolated value:
w_k = 1 / pow(d_k, p) (I use p = 2)
Xinv[x,y] = (sum_k w_k * i_k)/(sum_k w_k)
Yinv[x,y] = (sum_k w_k * j_k)/(sum_k w_k)
Note that if B is a W x H image then X and Y are W x H arrays of floats. If A is a w x h image then Xinv and Yinv are w x h arrays of floats. It is important that you are consistent with image and map sizing.
Works like a charm! In my first version I tried brute-forcing the search, and I never even waited for it to finish. After I switched to a KD-Tree, I started to get reasonable run times. If I ever get time, I would like to add this to OpenCV.
The second image below uses remap() to remove the lens distortion from the first image. The third image is the result of inverting the process.
This is an important problem, and I am surprised that it is not better addressed in any standard library (at least to my knowledge).
I wasn't happy with the accepted solution, as it didn't use the implicit smoothness of the transformation. I might be missing important cases, but I cannot imagine mappings that are both invertible in any useful sense and non-smooth at the pixel scale.
Smoothness means that there is no need to compute nearest neighbors: the nearest points are those that are already near on the original grid.
My solution uses the fact that, in the original mapping, a square [(i, j), (i+1, j), (i+1, j+1), (i, j+1)] maps to a quadrilateral [(X[i,j], Y[i,j]), (X[i+1,j], Y[i+1,j]), ...] that has no other points inside. The inverse mapping then only requires interpolation within the quadrilateral. For this I use an inverse bilinear interpolation, which gives exact results at the vertices and for any other affine transform.
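To make the inverse bilinear step concrete, here is a scalar sketch for a single quadrilateral. It is not the vectorized `bilinear_inverse` from this answer; the function below, its argument order and its tolerance `eps` are assumptions made for illustration. It solves p = a + u*(b-a) + v*(d-a) + u*v*(a-b+c-d) for (u, v), where a, b, c, d are the images of the square's corners in the order listed above.

```python
import math

def cross(a, b):
    """2D cross product (z component)."""
    return a[0] * b[1] - a[1] * b[0]

def bilinear_inverse_point(p, a, b, c, d, eps=1e-12):
    """Return (u, v) in [0, 1]^2 such that bilinear interpolation of the
    corners a (0,0), b (1,0), c (1,1), d (0,1) gives p, or None if p lies
    outside the quadrilateral."""
    e = (b[0] - a[0], b[1] - a[1])
    f = (d[0] - a[0], d[1] - a[1])
    g = (a[0] - b[0] + c[0] - d[0], a[1] - b[1] + c[1] - d[1])
    h = (p[0] - a[0], p[1] - a[1])
    # Eliminating u from h = u*e + v*f + u*v*g gives k2*v^2 + k1*v + k0 = 0.
    k2 = cross(g, f)
    k1 = cross(e, f) + cross(h, g)
    k0 = cross(h, e)
    if abs(k2) < eps:                 # parallelogram: the equation is linear
        if abs(k1) < eps:
            return None
        candidates = [-k0 / k1]
    else:
        disc = k1 * k1 - 4.0 * k0 * k2
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        candidates = [(-k1 + root) / (2.0 * k2), (-k1 - root) / (2.0 * k2)]
    for v in candidates:
        # h - v*f = u*(e + v*g): solve along the better-conditioned axis.
        den = (e[0] + g[0] * v, e[1] + g[1] * v)
        num = (h[0] - f[0] * v, h[1] - f[1] * v)
        axis = 0 if abs(den[0]) >= abs(den[1]) else 1
        if abs(den[axis]) < eps:
            continue
        u = num[axis] / den[axis]
        if -eps <= u <= 1.0 + eps and -eps <= v <= 1.0 + eps:
            return u, v
    return None
```

For the quadrilateral that is the image of the square with corner (i, j), a point p inside it then has inverse coordinates (i + u, j + v); the quadratic generally has two roots, and only the one inside the unit square is kept.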
The implementation has no dependency other than numpy. The logic is to run through all quadrilaterals and progressively build the reverse mapping. I copy the code here; hopefully there are enough comments to make the idea clear.
A few comments on the less obvious stuff:
... the vectorized nature of bilinear_inverse, but at each iteration selects only the quadrilaterals for which the coordinates (offset to their bounding box) are valid.
Here's a test example
You can invert the map at known points and interpolate it onto a new grid.
It works fine as long as the distortion is not too large.
Here is a very simple implementation in Python using scipy.interpolate.griddata:
If you use CV_32FC2 for the maps, you can simplify the construction of the points:
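As a rough illustration of the idea (a sketch written for this page, not the code referred to above), the known points are the map values (map_x, map_y), the known values there are the pixel's own column and row, and `scipy.interpolate.griddata` resamples them onto the integer grid. The function name, the choice of `method="linear"` and the assumption that the inverse maps have the same shape as the forward maps are illustrative.

```python
import numpy as np
from scipy.interpolate import griddata

def invert_map_griddata(map_x, map_y, method="linear"):
    """Invert (map_x, map_y) by scattered-data interpolation.

    At the scattered location (map_x[r, c], map_y[r, c]) the inverse map
    is known to be (c, r); griddata interpolates these values onto the
    regular pixel grid. Points outside the convex hull come back as NaN.
    """
    h, w = map_x.shape
    rows, cols = np.mgrid[0:h, 0:w]
    points = np.column_stack([map_x.ravel(), map_y.ravel()])
    inv_x = griddata(points, cols.ravel().astype(np.float64), (cols, rows), method=method)
    inv_y = griddata(points, rows.ravel().astype(np.float64), (cols, rows), method=method)
    return inv_x, inv_y
```

With linear interpolation the result is exact for affine maps; the NaN entries mark pixels that the forward map never reaches and need separate handling before the maps are passed to remap().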
If your map is derived from a homography H, you could invert H and directly create the inverse maps with cv::initUndistortRectifyMap(), e.g. in Python:
The OpenCV documentation states about initUndistortRectifyMap():
In the case you have just been given the maps, you have to do it yourself.
However, interpolation of the new maps' coordinates is not trivial, because the support region for one pixel could be very large.
Here is a simple Python solution which inverts the maps by doing point-to-point mapping. This will probably leave some coordinates unassigned, while others will be updated several times. So there may be holes in the map.
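The point-to-point idea can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the program this answer refers to: the function name is hypothetical, map values are rounded to the nearest pixel, and unassigned entries are marked with NaN so the holes are visible.

```python
import numpy as np

def invert_map_scatter(X, Y, out_shape):
    """Point-to-point inversion: for every pixel (row, col) of the forward
    maps, write (col, row) into the inverse maps at the rounded location
    (X[row, col], Y[row, col]). Unassigned entries stay NaN (holes); if
    several pixels land on the same location, one of them wins."""
    h, w = out_shape
    inv_x = np.full((h, w), np.nan, dtype=np.float32)
    inv_y = np.full((h, w), np.nan, dtype=np.float32)
    rows, cols = np.indices(X.shape)
    tx = np.rint(X).astype(int)
    ty = np.rint(Y).astype(int)
    inside = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    inv_x[ty[inside], tx[inside]] = cols[inside]
    inv_y[ty[inside], tx[inside]] = rows[inside]
    return inv_x, inv_y
```

Counting the NaN entries afterwards gives a quick measure of how many holes would need to be filled by interpolation.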
Here is a small Python program demonstrating both approaches:
Here's an implementation of @wcochran's answer. I was trying to reverse a lens correction produced by lensfunpy.
A KNNRegressor has all the necessary components to invert the grid mapping!
Here you go:
Barycentric Resampling Solution
I used a resampling approach with barycentric interpolation. I optimized it with numba. The result is fast and resilient to distortion, rotation, scaling, symmetry and zoom.
I compared it with the Iterative solution proposed by Hannesh. The Iterative solution is faster than mine but it fails under rotation and symmetry.
Benchmark
I've created a repo, invert_map, for benchmarking the different algorithms I tried. Feel free to add yours.
There is no standard way to do this with OpenCV.
If you are looking for a complete ready-to-use solution, I am not sure that I can help, but I can at least describe a method that I used some years ago for this task.
First of all, you should create remapping maps with the same dimensions as your source image. I created maps with larger dimensions for simpler interpolation and cropped them to the proper size in the final step. Then you should fill them with the values existing in the previous remapping maps (not so difficult: just iterate over them, and if the map coordinates x and y lie within the limits of your image, take their row and column as the new y and x, and place them at the old x and y column and row of the new map). It is a rather simple solution, but it gives rather good results. For a perfect one, you should interpolate the old x and y to integer values using your interpolation method and neighbouring pixels.
After this you should either remap the pixel colors manually, or completely fill your remapping map with pixel coordinates and use the version from OpenCV.
You will then face a rather challenging task: you should interpolate pixels in empty areas. In other words, you should take the distances to the closest non-zero pixel coordinates and mix color (if you remap colors) or coordinate (if you proceed with the full map computation) fractions according to these distances. Actually, this is not so difficult for linear interpolation, and you can even look into the remap() implementation on the OpenCV GitHub page. For NN interpolation it will be much simpler: just take the color/coordinate of the nearest neighbour.
A final task is extrapolation of the areas outside the borders of the remapped pixel area. The algorithm from OpenCV can also be used as a reference here.
OP here. I think I've found an answer. I haven't implemented it yet, and if someone comes up with a less fiddly solution (or finds something wrong with this one), I'll choose their answer instead.
Problem statement
Let A be the source image, B be the destination image, and M be the mapping from A's coords to B's coords, i.e.:
...where square braces indicate array lookup with integer indices, and circular braces indicate bilinear interpolation lookup with floating-point indices. We restate the above using the more economical notation:
We wish to find an inverse mapping N that maps B back to A as best as is possible:
The problem can be stated without reference to A or B:
...where ||*|| indicates the Frobenius norm, and I_n is the identity map with the same dimensions as N, i.e. a map where:
Naive solution
If M's values are all integers, and M is an isomorphism, then you can construct N directly as:
Or in our simplified notation:
...where I_m is the identity map with the same dimensions as M.
There are two problems:
Solution
Construct empty N as a 3D tensor of floats:
For each coordinate [i, j] in A's coordinate space, do:
Compute the homography matrix H that maps those A-coordinates to their corresponding B-coordinates (given by the 2x2 grid's pixel indices).
The potentially expensive step here would be the search in step 1 for the 2x2 grid of A-coordinates in M that encircles [i, j]. A brute-force search would make this whole algorithm O(n*m) where n is the number of pixels in A, and m the number of pixels in B.
To reduce this to O(n), one could instead run a scanline algorithm within each A-coordinate quadrilateral to identify all the integer-valued coordinates [i, j] it contains. This could be precomputed as a hashmap that maps integer-valued A coords [i, j] to the upper-left corner of its encircling quadrilateral's B coords [k, l].
One way to do it is to take the original map, iterate through its entries, and take the floors and ceils of the x and y values. This gives the four nearest integer points around (x, y): (xf, yf), (xc, yf), (xf, yc), and (xc, yc), in the coordinates of the original source image. You can then fill in a structure, indexed by each of these, that contains the pixel value and a weight, and use your preferred irregular grid interpolation with those data.
This is easy to implement with inverse distance interpolation, since the structure can be an image array accumulation and the weights are scalars. F is the original source, G is the warped image, and F' is the restored image. The map is M.
Init F' to 0. Create a 0-initialized weight array W of floats the same size as F'.
Iterate through M. For each entry (x, y) at index (i, j) in M, find the 4 integer pairs and their distances from (x, y). Take the corresponding pixel value from G, weight it by its reciprocal distance, and accumulate it into F' like
F'(xf|c, yf|c) += G(i,j) / sqrt((x-xf|c)^2 + (y-yf|c)^2)
Then accumulate that weight into
W(xf|c, yf|c) += 1. / sqrt((x-xf|c)^2 + (y-yf|c)^2)
After that is finished, normalize F' by iterating through it and dividing each pixel by its corresponding entry in W, if it's non-zero.
At this point, the image is usually nearly complete, but with high downsampling ratios, some pixels in F' may not get filled in. So then you do a couple passes back and forth through W to find 0 weight entries, and interpolate those pixels from their non-empty neighbors. This part could be done with KNN search and interpolate too since there usually aren't many of them.
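The accumulate-and-normalize steps above might be sketched as follows for a single-channel image. The function name, the `eps` clamp on the distance (to avoid dividing by zero when a map value is exactly an integer) and the returned `filled` mask are assumptions for illustration; the hole-filling passes are left out.

```python
import numpy as np

def restore_by_splatting(G, X, Y, out_shape, eps=1e-6):
    """Restore F' from the warped image G and the maps X, Y
    (G[i, j] was sampled from the source at x = X[i, j], y = Y[i, j]).

    Each G[i, j] is accumulated into the four integer neighbours of
    (X[i, j], Y[i, j]) with reciprocal-distance weights, then normalized."""
    h, w = out_shape
    acc = np.zeros((h, w), dtype=np.float64)   # F' before normalization
    W = np.zeros((h, w), dtype=np.float64)     # accumulated weights
    for xi in (np.floor(X), np.ceil(X)):
        for yi in (np.floor(Y), np.ceil(Y)):
            dist = np.sqrt((X - xi) ** 2 + (Y - yi) ** 2)
            weight = 1.0 / np.maximum(dist, eps)
            ix, iy = xi.astype(int), yi.astype(int)
            ok = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
            # np.add.at accumulates correctly when indices repeat.
            np.add.at(acc, (iy[ok], ix[ok]), G[ok] * weight[ok])
            np.add.at(W, (iy[ok], ix[ok]), weight[ok])
    filled = W > 0
    restored = np.zeros_like(acc)
    restored[filled] = acc[filled] / W[filled]
    return restored, filled
```

The `filled` mask identifies the zero-weight entries, so the remaining holes can be interpolated from their non-empty neighbours in a separate pass.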
It's easy to implement and scales a lot better than the KNN approach (though I think that's great for small images). The downside is that inverse distance isn't the greatest interpolation scheme, but it seems to work fairly well if the mapping isn't too clumpy and the original hasn't been downsampled a lot. Of course, if the downsample ratio is high, you're having to infer a lot of lost information, so it's inherently going to give rough results.
If you want to squeeze as much as possible out of the map inversion, you could try to solve the (potentially underdetermined) system of equations defined by the original interpolation scheme; not impossible, but challenging.
Well, to get the distorted image from the undistorted one, maybe you can use OpenCV's undistortPoints function to get the reverse map. With initUndistortRectifyMap you get the map from distort->undistort, and with undistortPoints you can get the map from undistort->distort point by point; then use remap to get the distorted image.
Solution https://stackoverflow.com/a/68706787/4521113 is great, but I was not satisfied with the provided explanation. Here I will contribute my interpretation of that solution, the assumptions I think it makes, and the limitations that arise from those assumptions.
Introductory problem
Assume we have a function f(x) and we want to obtain a value x that produces f(x) = y. As an example, assume f(x) = x² and we want to find the value x that produces f(x) = 4. For this concrete case, we can invert the function and use x = f⁻¹(y) = sqrt(y), so f⁻¹(4) = sqrt(4) = 2, which gives us the solution.
However, a function is not always invertible, or finding the inverse could be non-trivial. In such cases, we can redefine the problem as a minimization problem. Let's define the loss function C(x) = ( y - f(x) )², where y is the value we want to obtain after evaluating f(x). Finding x for which f(x) = y is equivalent to minimizing C(x).
There are plenty of algorithms used to find the minimum of a function. Let's consider gradient descent to solve this problem, just because. In our case, we would iterate on x to find the solution through
Applying this to our particular example starting from x_0 = 1, we observe how the value of x slowly approaches 2, which we know to be the solution to our minimization problem.
Gradient descent to invert a map from pixel to pixel
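Before turning to maps, the one-dimensional example above can be checked numerically. The sketch below assumes the standard gradient descent update x <- x - alpha * dC/dx with C(x) = (y - f(x))², so that dC/dx = -2 (y - f(x)) f'(x); the learning rate alpha = 0.01 and the number of steps are arbitrary choices for illustration, not values taken from this answer.

```python
def invert_by_gradient_descent(f, df, y, x0, alpha=0.01, steps=100):
    """Minimize C(x) = (y - f(x))**2 by gradient descent, starting at x0.

    df is the derivative of f. Returns the final x and all iterates."""
    x = x0
    history = [x]
    for _ in range(steps):
        # x <- x - alpha * dC/dx, with dC/dx = -2 * (y - f(x)) * f'(x)
        x = x + 2.0 * alpha * (y - f(x)) * df(x)
        history.append(x)
    return x, history

# f(x) = x**2, target y = 4, start at x_0 = 1 (as in the example above).
x, history = invert_by_gradient_descent(lambda t: t * t, lambda t: 2.0 * t, y=4.0, x0=1.0)
print(history[:5], x)
```

With these settings the iterates increase monotonically from 1 towards 2; a much larger alpha would overshoot, which anticipates the learning-rate discussion further down.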
Maps X[i,j] and Y[i,j] can be thought of as functions from R² to R, which combined produce a function F = ( X(i,j) , Y(i,j) ) from R² to R²; it maps pixel coordinates in the original image to pixel coordinates in the target image. Inverting the map is equivalent to finding F⁻¹ = ( X⁻¹(i',j') , Y⁻¹(i',j') ), which maps pixel coordinates in the target image to pixel coordinates in the original image. Again, this problem can be reformulated as a minimization problem by defining the cost function C(i,j) = || (i',j') - ( X(i,j) , Y(i,j) ) ||². And again, we can iterate using gradient descent to find the coordinates (i,j) in the original image that are mapped to the coordinates (i',j') in the target image:
where J is the Jacobian of the map:
and we assume (i,j) and (i',j') to be 2d row vectors. This starts to resemble the solution mentioned above.
Assumptions
There are multiple variants of gradient descent. Some of them use a "conjugate direction": a direction different from the gradient, but one that also leads to the minimum. The solution proposed by Hannesh substitutes the identity matrix for the Jacobian J. Hence, the assumption is that the identity matrix times a scale factor alpha' is a valid approximation for the Jacobian times 2 times the "learning rate" alpha: 2 * alpha * J is approximated by alpha' * I.
Finally, alpha' is chosen to be 1.
Introducing these changes in the iterative algorithm, we obtain:
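The update rules in this answer are not shown on this page, but they can be reconstructed from the definitions in the text. The following is such a reconstruction, not the author's original typesetting; it writes p = (i, j) and p' = (i', j') as row vectors (a shorthand introduced here) and uses the standard gradient descent step with learning rate alpha.

```latex
\begin{align}
% one-dimensional case, C(x) = (y - f(x))^2
x_{k+1} &= x_k - \alpha \frac{dC}{dx}(x_k)
         = x_k + 2\alpha\,\bigl(y - f(x_k)\bigr)\, f'(x_k) \\
% map case, C(p) = \| p' - F(p) \|^2 with row vectors p, p'
p_{k+1} &= p_k + 2\alpha\,\bigl(p' - F(p_k)\bigr)\, J(p_k),
\qquad
J = \begin{pmatrix}
      \partial X / \partial i & \partial X / \partial j \\
      \partial Y / \partial i & \partial Y / \partial j
    \end{pmatrix} \\
% approximation 2 \alpha J \approx \alpha' I
p_{k+1} &= p_k + \alpha'\,\bigl(p' - F(p_k)\bigr) \\
% choice \alpha' = 1
p_{k+1} &= p_k + p' - F(p_k)
\end{align}
```

Applied to every target pixel at once, the last line is the iteration G_n+1 = G_n + I - F(G_n) of the solution being discussed, with I playing the role of the image of target coordinates p'.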
Now, we can build an image of target indices (i',j'), and approximate the evaluation of ( X(i,j) , Y(i,j) ) by using the remap function. That would yield the final version of the proposed algorithm.
Practical considerations
I implemented the solution to approximate the inverse map of my camera calibration function. That camera calibration function takes coordinates in the original distorted image:
and transforms them into coordinates in the undistorted image. Here you can see the result of applying the iterative algorithm pixel by pixel, WITHOUT using the remap function but evaluating the current solution with the exact map:
The solution is an approximation
The result provided by the algorithm is an approximation, because the remap function provides an approximation to the evaluation of the map at the current solution. Here you can see the result of applying the algorithm using the remap function, a learning rate of alpha'=1e-2, and iterating 1000 times:
Note the artifacts on the borders of the image, and the lack of mapping in the bottom right corner. The interpolation method used was INTER_LINEAR, but using INTER_CUBIC does not really help either:
Choosing a too small or too big learning rate
The inverse map computed using the exact map was obtained using a learning rate of alpha'=5e-1 and 50 iterations. However, choosing the learning rate inappropriately can also lead to artifacts in the final result.
Find here the result of using alpha'=1e0 and 50 iterations:
Note the artifacts obtained in the corners of the image. That is the consequence of the algorithm not converging because of a too big learning rate.
On the other hand, check the result of using alpha'=1e-2 and 50 iterations:
Note how the "undistorted" image is not totally undistorted, and "straight" lines are still curved. That is the consequence of the algorithm not converging because of a too small learning rate.
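The effect of a too small learning rate can be reproduced on a single point with an exactly known map. The sketch below is illustrative only: `distort` is a hypothetical radial distortion in normalized coordinates (not the camera calibration function used above), and the update p <- p + alpha' * (target - F(p)) is the simplified iteration with the Jacobian replaced by the identity.

```python
import math

def distort(p, k=0.1):
    """Hypothetical exact map: radial distortion in normalized coordinates."""
    x, y = p
    scale = 1.0 + k * (x * x + y * y)
    return (x * scale, y * scale)

def invert_point(F, target, alpha, steps):
    """Iterate p <- p + alpha * (target - F(p)), starting from p = target.

    Returns the final p and the remaining residual |target - F(p)|."""
    p = target
    for _ in range(steps):
        fx, fy = F(p)
        p = (p[0] + alpha * (target[0] - fx), p[1] + alpha * (target[1] - fy))
    fx, fy = F(p)
    return p, math.hypot(target[0] - fx, target[1] - fy)

target = (0.5, 0.3)
_, residual_good = invert_point(distort, target, alpha=0.5, steps=50)
_, residual_small = invert_point(distort, target, alpha=1e-2, steps=50)
print(residual_good, residual_small)
```

For this mild distortion, 50 iterations at alpha' = 0.5 leave a negligible residual, while 50 iterations at alpha' = 1e-2 remove well under half of the initial error, which matches the "not totally undistorted" result described above.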
From what I understand you have an original image, and a transformed image, and you wish to recover the nature of the transform that has been applied without knowing it, but assuming it is something sensible, like a rotation or a fish-eye distort.
What I would try is thresholding the image to convert it to binary, in both the index image and the plain image. Then try to identify objects. Most mappings will at least retain connectivity and Euler number, and usually the largest object in the index image will still be the largest object in the plain image.
Then take moments for your matched image / index pairs and see if you can remove translation, rotation and scaling. That gives you several reverse maps, which you can then try to stitch together. (This is hard if the transform is not simple, but the general problem of reconstituting just any transformation cannot be solved.)
Use KDTree and Inverse Distance Weighting (IDW)
