How can I optimize interpolation over a large number of scattered points?
I am currently working with a set of coordinate points (longitude, latitude, about 60000 of them) and the temperature at each location. I need to interpolate them to compute the values at points with unknown temperature, so as to map certain regions.
To respect the influence the points have on each other, I have converted every (long, lat) point to a unit-sphere point (x, y, z).
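For reference, the conversion can be sketched like this (my own helper, assuming coordinates in degrees; names are mine, not from the book):

```cpp
#include <cassert>
#include <cmath>

// Map (longitude, latitude) in degrees to a point on the unit sphere,
// so that straight-line (chord) distances reflect proximity on the globe.
void lonlat_to_xyz(double lon_deg, double lat_deg,
                   double &x, double &y, double &z)
{
    const double deg = M_PI / 180.0;
    double lon = lon_deg * deg, lat = lat_deg * deg;
    x = cos(lat) * cos(lon);
    y = cos(lat) * sin(lon);
    z = sin(lat);
}
```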
I have started by applying the generalized multidimensional Shepard interpolation from "Numerical Recipes, 3rd Edition":
Doub interp(VecDoub_I &pt)
{
    Doub r, w, sum = 0., sumw = 0.;
    if (pt.size() != dim)
        throw("RBF_interp bad pt size");
    for (Int i = 0; i < n; i++) {
        // Exact hit on a data point: return its sample value directly.
        if ((r = rad(&pt[0], &pts[i][0])) == 0.)
            return vals[i];
        sum += (w = pow(r, pneg));    // weight w = r^(-p); pneg holds -p
        sumw += w * vals[i];
    }
    return sumw / sum;                // weighted average over all n points
}

// Euclidean distance between two dim-dimensional points.
Doub rad(const Doub *p1, const Doub *p2)
{
    Doub sum = 0.;
    for (Int i = 0; i < dim; i++)
        sum += SQR(p1[i] - p2[i]);
    return sqrt(sum);
}
As you can see, to interpolate one point the algorithm computes the distance from that point to every data point and uses its inverse power as the weight in the final value.
Even though this algorithm works, it is much too slow for my needs, since I will be computing many points to map the grid of a certain region.
One way of optimizing this would be to leave out the points that are beyond a certain radius, but that would pose a problem for areas with few or no points.
Another would be to reduce the cost of the distance computations by filling a look-up table once and storing the distances. The problem with this is that it is impossible to store such a large matrix (60000 × 60000 doubles is roughly 29 GB).
The grid of temperatures that is obtained will be used to compute contours for different temperature values.
If anyone knows a way to optimize this algorithm, or can suggest a better one, I would appreciate it.
2 Answers
Radial basis functions with infinite support are probably not what you want to be using if you have a large number of data points and will be taking a large number of interpolation values.
There are variants that use the N nearest neighbours and finite support to reduce the number of points that must be considered for each interpolation value. A variant of this can be found in the first solution mentioned here: Inverse Distance Weighted (IDW) Interpolation with Python. (Though I have a nagging suspicion that this implementation can be discontinuous under certain conditions; there are certainly variants that are fine.)
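A sketch of the nearest-neighbour idea (brute-force selection here for clarity; in practice you would find the neighbours with a k-d tree instead of scanning all n points, and the names are mine, not from Numerical Recipes or the linked answer):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// IDW interpolation restricted to the k nearest data points.
// pts: n points of dimension dim, flattened row-major; vals: n samples; p: power.
double idw_knn(const std::vector<double> &pts, const std::vector<double> &vals,
               int dim, const std::vector<double> &q, int k, double p)
{
    int n = (int)vals.size();
    std::vector<std::pair<double, int>> d(n);   // (distance, point index)
    for (int i = 0; i < n; i++) {
        double s = 0.;
        for (int j = 0; j < dim; j++) {
            double t = q[j] - pts[i * dim + j];
            s += t * t;
        }
        d[i] = {std::sqrt(s), i};
    }
    // Move the k smallest distances to the front (partial selection, O(n)).
    if (k < n)
        std::nth_element(d.begin(), d.begin() + k, d.end());
    int m = std::min(k, n);
    double sum = 0., sumw = 0.;
    for (int i = 0; i < m; i++) {
        if (d[i].first == 0.)                   // exact hit on a data point
            return vals[d[i].second];
        double w = std::pow(d[i].first, -p);
        sum += w;
        sumw += w * vals[d[i].second];
    }
    return sumw / sum;
}
```

With a k-d tree the neighbour search drops from O(n) to roughly O(log n) per query, which is what makes this viable when evaluating a dense grid.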
Your look-up table doesn't have to store every point in the 60k square, only those which are used repeatedly. You can map any coordinate x to int(x * resolution) to improve the hit rate by lowering the resolution. A similar look-up table for the power function might also help.
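A sketch of that power-function table (the bin count and range are assumptions to tune; accuracy degrades as the bins get coarser):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Precomputed table of r^(-p) over quantised distances in (0, rmax].
// Replaces pow() in the inner loop with an array lookup.
struct PowTable {
    std::vector<double> tab;
    double step;
    PowTable(double p, double rmax, int bins) : tab(bins), step(rmax / bins) {
        for (int i = 0; i < bins; i++)
            tab[i] = std::pow((i + 0.5) * step, -p);   // value at bin midpoint
    }
    double operator()(double r) const {
        int i = (int)(r / step);                       // quantise: int(r * resolution)
        if (i >= (int)tab.size()) i = (int)tab.size() - 1;
        return tab[i];
    }
};
```

On the unit sphere every chord distance lies in [0, 2], so rmax = 2 covers all cases; the constructor's binning is exactly the int(x * resolution) trick described above.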