Filling gaps in a numpy array
I just want to interpolate, in the simplest possible terms, a 3D dataset. Linear interpolation, nearest neighbour, all that would suffice (this is to start off some algorithm, so no accurate estimate is required).
In new scipy versions, things like griddata would be useful, but currently I only have scipy 0.8. So I have a "cube" array data[:,:,:] of shape (Ni x Nj x Nk), and an array of flags flags[:,:,:] (True or False) of the same size. I want to interpolate my data for the elements of data where the corresponding element of flags is False, using e.g. the nearest valid datapoint in data, or some linear combination of "close by" points.
There can be large gaps in the dataset in at least two dimensions. Other than coding a full-blown nearest neighbour algorithm using kdtrees or similar, I can't really find a generic, N-dimensional nearest-neighbour interpolator.
Using scipy.ndimage, your problem can be solved with nearest-neighbour interpolation in two lines.
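A sketch of such a two-line nearest-neighbour fill using scipy.ndimage.distance_transform_edt, wrapped in a function with an example of use (the function name and exact details here are assumptions, not necessarily the original answer's code):

```python
import numpy as np
from scipy import ndimage as nd

def fill_nearest(data, invalid):
    """Replace each cell where 'invalid' is True with the value of
    the nearest valid cell (Euclidean distance)."""
    # for every position, the indices of the nearest valid (zero) cell
    ind = nd.distance_transform_edt(invalid,
                                    return_distances=False,
                                    return_indices=True)
    return data[tuple(ind)]

# example of use: punch a hole into a 5x5 array and fill it again
data = np.arange(25, dtype=float).reshape(5, 5)
data[1:3, 1:4] = np.nan
filled = fill_nearest(data, np.isnan(data))
```

This works for any number of dimensions, since distance_transform_edt computes the feature transform (index of the nearest background element) in N dimensions.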
You can set up a crystal-growth-style algorithm, shifting a view alternately along each axis and replacing only data that is flagged False but has a True neighbour. This gives a "nearest-neighbour"-like result (though not in Euclidean or Manhattan distance -- I think it may be nearest-neighbour if you are counting pixels, counting all connected pixels with common corners). This should be fairly efficient with NumPy, as it iterates only over axes and convergence iterations, not over small slices of the data. Crude, fast and stable. I think that's what you were after:
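A sketch of this growth scheme, here using np.roll for the shifts (so values can wrap around the array edges, a simplification of true view-shifting; the function is an illustration and assumes at least one True flag):

```python
import numpy as np

def fill_by_growth(data, flags):
    """Grow the valid regions outward, one cell per pass along each
    axis, until every cell is filled.  'flags' is True where 'data'
    is valid; at least one flag must be True."""
    data = np.asarray(data, dtype=float).copy()
    flags = flags.copy()
    while not flags.all():
        for axis in range(data.ndim):
            for shift in (1, -1):
                # neighbour values/validity, shifted along this axis
                # (np.roll wraps around the edges)
                nbr_data = np.roll(data, shift, axis=axis)
                nbr_flag = np.roll(flags, shift, axis=axis)
                # cells that are invalid but have a valid neighbour
                grow = ~flags & nbr_flag
                data[grow] = nbr_data[grow]
                flags = flags | grow
    return data

# example: a single seed value spreads into the flagged-out cells
data = np.zeros((3, 3))
data[1, 1] = 5.0
flags = np.zeros((3, 3), dtype=bool)
flags[1, 1] = True
out = fill_by_growth(data, flags)
```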
For good measure, here's a visualization (2D) of the zones seeded by the data originally flagged True.
Some time ago I wrote this script for my PhD: https://github.com/Technariumas/Inpainting
An example: http://blog.technariumas.lt/post/117630308826/healing-holes-in-python-arrays
Slow, but it does the job. A Gaussian kernel is the best choice; just check the size/sigma values.
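A minimal sketch of the Gaussian-kernel idea, done as iterative normalized convolution (this illustrates the approach only and is not the linked script; the function name, sigma and iteration count are assumptions):

```python
import numpy as np
from scipy import ndimage as nd

def inpaint_gaussian(data, invalid, sigma=1.0, max_iter=25):
    """Fill invalid cells by normalized convolution with a Gaussian:
    blur the valid values and the validity mask, then divide, so
    information diffuses into the gaps pass by pass."""
    filled = np.where(invalid, np.nan, np.asarray(data, dtype=float))
    for _ in range(max_iter):
        mask = np.isnan(filled)
        if not mask.any():
            break
        num = nd.gaussian_filter(np.where(mask, 0.0, filled), sigma)
        den = nd.gaussian_filter((~mask).astype(float), sigma)
        est = np.where(den > 1e-6, num / np.maximum(den, 1e-6), np.nan)
        # only overwrite the gaps; known values stay untouched
        filled[mask] = est[mask]
    return filled

# example: a constant field with a hole is restored exactly
data = np.ones((5, 5))
invalid = np.zeros((5, 5), dtype=bool)
invalid[2, 2] = True
out = inpaint_gaussian(data, invalid)
```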
You may try to tackle your problem with an iterative, kernel-based scheme.
This actually can be implemented in quite a straightforward manner (especially if performance is not a top concern).
Obviously this is just a heuristic, and you need to experiment with your actual data to find a proper adaptation scheme. If you view kernel adaptation as kernel reweighing, you may like to do it based on how the values have been propagated. For example, the weights of the original supports are 1, and they decay according to the iteration in which they emerged.
Also, determining when this process has actually converged may be tricky. Depending on the application, it may eventually be reasonable to leave some 'gap regions' 'unfilled'.
Update: here is a very simple implementation along the lines described above *):
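A minimal sketch of what such a filler(.) might look like, using scipy.ndimage.uniform_filter to take a nanmean-style average over each cell's valid neighbours (the body, parameters, and demonstration are assumptions, not necessarily the original code):

```python
import numpy as np
from scipy import ndimage as nd

def filler(data, invalid, max_iter=100):
    """Iteratively replace invalid cells with the mean of their valid
    neighbours in a 3x3x... window, growing the filled region on
    each pass until no gaps remain (or max_iter is reached)."""
    filled = np.where(invalid, np.nan, np.asarray(data, dtype=float))
    for _ in range(max_iter):
        mask = np.isnan(filled)
        if not mask.any():
            break
        valid = (~mask).astype(float)
        # neighbourhood sum of valid values and count of valid
        # neighbours; their ratio is the mean over valid cells only
        sums = nd.uniform_filter(np.where(mask, 0.0, filled),
                                 size=3, mode='constant')
        counts = nd.uniform_filter(valid, size=3, mode='constant')
        est = np.where(counts > 1e-12,
                       sums / np.maximum(counts, 1e-12), np.nan)
        filled[mask] = est[mask]
    return filled

# demonstration of filler(.) in action on a 2D array with a hole
data = np.arange(16.0).reshape(4, 4)
invalid = np.zeros_like(data, dtype=bool)
invalid[1:3, 1:3] = True
out = filler(data, invalid)
```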
*) So here nanmean(.) is just used to demonstrate the idea of the adaptation process. Based on this demonstration, it should be quite straightforward to implement a more complex adaptation and decaying weighing scheme. Also note that no attention has been paid to actual execution performance, but it should still be good (with reasonable input shapes).
Maybe what you are looking for is a machine learning algorithm, like a neural network or a support vector machine.
You may check this page, which has some links to SVM packages for python: http://web.media.mit.edu/~stefie10/technical/pythonml.html