Minimizing interpolation error between two data sets
At the top of the diagrams below we can see some value (y-axis) changing over time (x-axis).
As this happens we sample the value at different, unpredictable times, and we alternate the sampling between two data sets, indicated by red and blue.
When computing the value at any time, we expect that both red and blue data sets will return similar values. However, as shown in the three smaller boxes, this is not the case. Viewed over time, the values from each data set (red and blue) will appear to diverge and then converge about the original value.
Initially I used linear interpolation to obtain a value; next I tried Catmull-Rom interpolation. The former results in values that come close together and then drift apart between each data point; the latter results in values that remain closer, but where the average error is greater.
Can anyone suggest another strategy or interpolation method that will provide greater smoothing (perhaps by using a greater number of sample points from each data set)?
Comments (5)
I believe your question does not have a straight answer without further knowledge of the underlying sampled process. By its nature, the value of the function between samples can be just about anything, so I think there is no way to ensure that the interpolations of two sample arrays converge.
That said, if you have prior knowledge of the underlying process, you can choose among several interpolation methods to minimize the errors. For example, if you measure the drag force as a function of the wing velocity, you know the relation is quadratic (a*V^2). You can then choose a 2nd-order polynomial fit and get a pretty good match between the interpolations of the two series.
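To make the drag example concrete, here is a minimal sketch in plain Python. It assumes a noise-free square law with a made-up coefficient (`A_TRUE`), and it simplifies the 2nd-order fit to the single coefficient `a` in `a*V^2`; a full quadratic fit would also estimate the linear and constant terms. The helper names (`fit_square_law`, `lerp`) are mine, not from any library.

```python
import random
from bisect import bisect_right

random.seed(0)
A_TRUE = 0.5  # hypothetical drag coefficient, chosen only for illustration


def drag(v):
    """Assumed ground truth: drag grows with the square of velocity."""
    return A_TRUE * v * v


def fit_square_law(vs, ys):
    """Least-squares estimate of a in the model y = a * v**2."""
    return sum(v * v * y for v, y in zip(vs, ys)) / sum(v ** 4 for v in vs)


def lerp(xs, ys, q):
    """Piecewise-linear interpolation, clamped outside the sampled range."""
    if q <= xs[0]:
        return ys[0]
    if q >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, q)  # xs[i - 1] <= q < xs[i]
    t = (q - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


# Unpredictable sample velocities, assigned alternately to red and blue.
vs = sorted(random.uniform(0.0, 10.0) for _ in range(20))
red_v, blue_v = vs[0::2], vs[1::2]
red_y = [drag(v) for v in red_v]
blue_y = [drag(v) for v in blue_v]

# Fit the same model to each data set independently.
a_red = fit_square_law(red_v, red_y)
a_blue = fit_square_law(blue_v, blue_y)

# Compare how far apart the red and blue estimates are under each method.
queries = [0.1 * i for i in range(101)]
model_gap = max(abs(a_red - a_blue) * q * q for q in queries)
linear_gap = max(
    abs(lerp(red_v, red_y, q) - lerp(blue_v, blue_y, q)) for q in queries
)

print(f"model-based gap: {model_gap:.3g}, linear gap: {linear_gap:.3g}")
```

With real, noisy measurements the two fitted coefficients would differ slightly, so the model-based gap would be small rather than essentially zero.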
Try B-splines: Catmull-Rom interpolates (goes through the data points), whereas a B-spline smooths.
For example, for uniformly spaced data (not your case)
Of course, the interpolated red and blue curves depend on the spacing of the red and blue data points, so they cannot match perfectly.
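The difference between interpolating and smoothing can be seen in a small sketch for the uniformly spaced case. It uses the standard uniform Catmull-Rom and uniform cubic B-spline basis polynomials, treating the samples themselves as B-spline control points; the zigzag sample values and the `evaluate` helper are made up for illustration, and non-uniform spacing would need a different parameterization.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom segment running from p1 (t=0) to p2 (t=1)."""
    return 0.5 * (
        (-t**3 + 2 * t**2 - t) * p0
        + (3 * t**3 - 5 * t**2 + 2) * p1
        + (-3 * t**3 + 4 * t**2 + t) * p2
        + (t**3 - t**2) * p3
    )


def cubic_bspline(p0, p1, p2, p3, t):
    """Uniform cubic B-spline segment with the samples as control points."""
    return (
        (1 - t) ** 3 * p0
        + (3 * t**3 - 6 * t**2 + 4) * p1
        + (-3 * t**3 + 3 * t**2 + 3 * t + 1) * p2
        + t**3 * p3
    ) / 6.0


def evaluate(segment, ys, steps=4):
    """Evaluate a 4-point segment function along ys, including the last knot."""
    out = []
    for i in range(1, len(ys) - 2):
        for k in range(steps):
            out.append(segment(ys[i - 1], ys[i], ys[i + 1], ys[i + 2], k / steps))
    out.append(segment(ys[-4], ys[-3], ys[-2], ys[-1], 1.0))
    return out


# Hypothetical noisy, uniformly spaced samples (a zigzag).
samples = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

cr_curve = evaluate(catmull_rom, samples)
bs_curve = evaluate(cubic_bspline, samples)

# Catmull-Rom reproduces samples[1] at its first knot; the B-spline
# returns the weighted average (p0 + 4*p1 + p2) / 6 instead.
print(cr_curve[0], bs_curve[0])
print(max(cr_curve) - min(cr_curve), max(bs_curve) - min(bs_curve))
```

The B-spline curve swings over a much narrower range than the samples, which is the smoothing effect; the price is that it no longer passes through the data.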
I'd like to quote "Introduction to Catmull-Rom Splines" to suggest not using Catmull-Rom for this interpolation task.
By definition, your red interpolated curve will pass through all red data points and your blue interpolated curve will pass through all blue points. Therefore, you won't get a best fit for both data sets.
You might change your boundary conditions and use data points from both data sets for a piecewise approximation, as shown in these slides.
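I can't reproduce the method from the slides here, but the basic idea of drawing on both data sets can be sketched very simply: merge the red and blue samples by time and interpolate the combined set. The sketch below assumes a sine wave as a stand-in signal and plain piecewise-linear interpolation; `lerp` and `max_error` are illustrative helpers, not part of any library.

```python
import math
from bisect import bisect_right


def lerp(xs, ys, q):
    """Piecewise-linear interpolation, clamped outside the sampled range."""
    if q <= xs[0]:
        return ys[0]
    if q >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, q)  # xs[i - 1] <= q < xs[i]
    t = (q - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


f = math.sin  # stand-in for the unknown underlying signal

red_t = [float(i) for i in range(7)]   # 0, 1, ..., 6
blue_t = [i + 0.5 for i in range(6)]   # 0.5, 1.5, ..., 5.5
merged_t = sorted(red_t + blue_t)      # both data sets, ordered by time


def max_error(ts, queries):
    """Largest absolute interpolation error over the query times."""
    ys = [f(t) for t in ts]
    return max(abs(lerp(ts, ys, q) - f(q)) for q in queries)


queries = [0.5 + 0.05 * k for k in range(101)]  # covers 0.5 to 5.5

red_err = max_error(red_t, queries)
blue_err = max_error(blue_t, queries)
merged_err = max_error(merged_t, queries)

print(red_err, blue_err, merged_err)
```

Merging gives a single curve, so there is no red/blue disagreement left to see, and the denser sampling also lowers the error. Whether this is acceptable depends on why the two data sets are kept separate in the first place.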
I agree with ysap that this question cannot be answered as you may be expecting. There may be better interpolation methods, depending on your model dynamics; like ysap, I recommend methods that exploit the underlying dynamics, if known.
Regarding the red/blue samples, I think you have made a good observation about sampled and interpolated data sets, and I would challenge your original expectation that "both red and blue data sets will return similar values."
I do not expect this. If you assume that you cannot interpolate perfectly, and particularly if the interpolation error is large compared to the errors in the samples, then you are certain to have a continuous error function whose largest errors occur farthest (in time) from your sample points. Two data sets with differing sample points should therefore exhibit the behaviour you see, because points that are far (in time) from red sample points may be near (in time) to blue sample points and vice versa; if the points are staggered as yours are, this is sure to be true. Thus I would expect what you show, that "the values from each data set (red and blue) will appear to diverge and then converge about the original value."
(If you have no information about the underlying dynamics other than frequency content, then Giacomo's points on sampling are key. However, you need not interpolate if you are looking at information below Nyquist.)
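A small numeric sketch of that argument, assuming a sine wave as the signal, perfectly staggered sample times, and linear interpolation (all of which are illustrative choices, not the asker's actual data):

```python
import math
from bisect import bisect_right


def lerp(xs, ys, q):
    """Piecewise-linear interpolation, clamped outside the sampled range."""
    if q <= xs[0]:
        return ys[0]
    if q >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, q)  # xs[i - 1] <= q < xs[i]
    t = (q - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


f = math.sin  # stand-in for the unknown underlying signal

red_t = [float(i) for i in range(7)]   # red samples at 0, 1, ..., 6
blue_t = [i + 0.5 for i in range(6)]   # blue samples staggered by half a step
red_y = [f(t) for t in red_t]
blue_y = [f(t) for t in blue_t]


def signed_gap(q):
    """Red estimate minus blue estimate at time q."""
    return lerp(red_t, red_y, q) - lerp(blue_t, blue_y, q)


# At a red sample time, red is exact and blue is at its worst (mid-interval).
gap_at_red_sample = signed_gap(1.0)
# At a blue sample time, the roles are reversed.
gap_at_blue_sample = signed_gap(1.5)

print(gap_at_red_sample, gap_at_blue_sample)
```

The gap changes sign between the two sample times, so somewhere in between the red and blue estimates cross: that is the "diverge and then converge" pattern, and it follows directly from the staggering.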
When sampling the original continuous function, the sampling frequency should comply with the Nyquist-Shannon sampling theorem; otherwise the sampling process introduces an error (also known as aliasing). Because this error differs between the two data sets, they yield different values when you interpolate.
Therefore, you need to know the highest frequency B of the original function and then collect samples at a frequency of at least 2B. If your function contains very high frequencies and you cannot sample that fast, you should at least try to filter them out before sampling.
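As an illustration of aliasing, the sketch below samples a 9 Hz sine at 10 Hz, well under the 18 Hz that 2B would call for. The frequencies are made up for the example. At the sample instants the 9 Hz tone is indistinguishable from an inverted 1 Hz tone, so no interpolation method could recover the original from those samples alone.

```python
import math

B = 9.0    # highest frequency in the signal, in Hz (hypothetical)
FS = 10.0  # sampling rate in Hz, below the 2 * B = 18 Hz required


def signal(t):
    """The true 9 Hz tone."""
    return math.sin(2.0 * math.pi * B * t)


def alias(t):
    """An inverted 1 Hz tone: what the 9 Hz tone looks like at 10 Hz."""
    return -math.sin(2.0 * math.pi * (FS - B) * t)


sample_times = [n / FS for n in range(20)]

# At every sample instant the two tones agree (up to rounding)...
worst_mismatch = max(abs(signal(t) - alias(t)) for t in sample_times)

# ...but between samples they are very different signals.
between_samples = abs(signal(0.025) - alias(0.025))

print(worst_mismatch, between_samples)
```

With irregular sample times, as in the question, the effect is less tidy than this, but the point stands: content above half the effective sampling rate shows up as error that differs from one set of sample times to another.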