Finding the best linear section of data
I have some scientific data and wish to find the best region in which to fit a straight line. Theoretically, the data should have a constant gradient, but other influences affect the data such that there are non-linear sections, as shown below.
So far I've tried taking the second derivative and locating regions where it is zero, and I've tried fitting a moving window of 100 points and selecting the window with the minimum chi-square. However, neither approach has selected the region correctly. What is a method to select the best region of the data to fit with a straight line?
This is not, perhaps, an answer but was too long for a comment.
Here's a couple of ideas I've used for similar tasks.
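I'm supposing we have data (x[i], y[i]) for i = 1..N.
The least-squares best-fit line, x -> a*x + b, based on the data with indices in S (a subset of {1, ..., N}) and taking all points with equal weight, is

$$a = \frac{C_{xy}(S)}{C_{xx}(S)}, \qquad b = \bar{y}_S - a\,\bar{x}_S,$$

where

$$\bar{x}_S = \frac{1}{|S|}\sum_{i \in S} x_i, \qquad \bar{y}_S = \frac{1}{|S|}\sum_{i \in S} y_i,$$

$$C_{xx}(S) = \frac{1}{|S|}\sum_{i \in S} (x_i - \bar{x}_S)^2, \qquad C_{xy}(S) = \frac{1}{|S|}\sum_{i \in S} (x_i - \bar{x}_S)(y_i - \bar{y}_S), \qquad C_{yy}(S) = \frac{1}{|S|}\sum_{i \in S} (y_i - \bar{y}_S)^2.$$

Moreover, the 'average chisq' (the mean squared residual of that fit) is

$$\overline{\chi^2}(S) = \frac{1}{|S|}\sum_{i \in S} (y_i - a x_i - b)^2 = C_{yy}(S) - \frac{C_{xy}(S)^2}{C_{xx}(S)}.$$

The fit for a union of two disjoint intervals, S and T say, can be calculated from the fits for S and T. With n = |S| + |T|,

$$\bar{x}_{S \cup T} = \frac{|S|\,\bar{x}_S + |T|\,\bar{x}_T}{n},$$

$$C_{xy}(S \cup T) = \frac{|S|\,C_{xy}(S) + |T|\,C_{xy}(T)}{n} + \frac{|S|\,|T|}{n^2}\,(\bar{x}_S - \bar{x}_T)(\bar{y}_S - \bar{y}_T),$$

and likewise for the mean of y and for C_xx and C_yy (replace y by x, or x by y, throughout).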
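The quantities above can be sketched in Python as follows. This is only an illustration under stated assumptions: every point has equal weight, and the names `FitStats`, `stats_of` and `merge` are made up for this example rather than taken from the answer. Each subset is summarised by its size, means and centred second moments, which is enough to get the line, the average chisq, and the summary of a union.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class FitStats:
    """Summary of a subset S (hypothetical helper, equal weights assumed)."""
    n: int        # number of points in S
    mx: float     # mean of x over S
    my: float     # mean of y over S
    cxx: float    # mean of (x - mx)^2 over S
    cxy: float    # mean of (x - mx)*(y - my) over S
    cyy: float    # mean of (y - my)^2 over S

    def line(self):
        # Least-squares slope and intercept of y = a*x + b.
        a = self.cxy / self.cxx
        b = self.my - a * self.mx
        return a, b

    def avg_chisq(self):
        # Mean squared residual of the least-squares line.
        return self.cyy - self.cxy ** 2 / self.cxx


def stats_of(x, y):
    """Compute the summary directly from the data of one subset."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    dx, dy = x - mx, y - my
    return FitStats(len(x), mx, my,
                    np.mean(dx * dx), np.mean(dx * dy), np.mean(dy * dy))


def merge(s, t):
    """Summary of the union of two disjoint subsets, without revisiting the data."""
    n = s.n + t.n
    ws, wt = s.n / n, t.n / n
    dx, dy = s.mx - t.mx, s.my - t.my
    return FitStats(
        n,
        ws * s.mx + wt * t.mx,
        ws * s.my + wt * t.my,
        ws * s.cxx + wt * t.cxx + ws * wt * dx * dx,
        ws * s.cxy + wt * t.cxy + ws * wt * dx * dy,
        ws * s.cyy + wt * t.cyy + ws * wt * dy * dy,
    )


# Example with synthetic data: merge two halves and read off the combined fit.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)
merged = merge(stats_of(x[:80], y[:80]), stats_of(x[80:], y[80:]))
a, b = merged.line()
```

The merged summary should agree, up to rounding, with what `np.polyfit(x, y, 1)` gives on the full data, which is a convenient sanity check.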
So, for example, if you follow Sembei's suggestion, combining adjacent (or other) fits can be done very efficiently.
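As one illustration of why cheap merging helps, here is a sketch that is not necessarily the suggestion referred to above. It cuts the data into small blocks, stores the raw sums for each block, and grows runs of adjacent blocks by adding those sums. The function names, the block size and the tolerance `tol` are all assumptions made for this example; `tol` in particular would have to be chosen from the noise level of the real data.

```python
import numpy as np


def block_sums(x, y, block):
    """Rows of raw sums (n, Sx, Sy, Sxx, Sxy, Syy), one row per block of points.

    Trailing points that do not fill a whole block are ignored.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nb = len(x) // block
    xb = x[:nb * block].reshape(nb, block)
    yb = y[:nb * block].reshape(nb, block)
    return np.stack([
        np.full(nb, float(block)),
        xb.sum(axis=1), yb.sum(axis=1),
        (xb * xb).sum(axis=1), (xb * yb).sum(axis=1), (yb * yb).sum(axis=1),
    ], axis=1)


def avg_chisq(s):
    """Mean squared residual of the least-squares line, from raw sums."""
    n, sx, sy, sxx, sxy, syy = s
    cxx = sxx / n - (sx / n) ** 2
    cxy = sxy / n - (sx / n) * (sy / n)
    cyy = syy / n - (sy / n) ** 2
    return cyy - cxy ** 2 / cxx


def longest_linear_run(x, y, block=20, tol=0.01):
    """Return (start, stop) point indices of the longest run of adjacent blocks
    whose combined straight-line fit keeps the average chisq at or below tol."""
    sums = block_sums(x, y, block)
    best = (0, 0)  # in block units
    for i in range(len(sums)):
        acc = np.zeros(6)
        for j in range(i, len(sums)):
            acc = acc + sums[j]  # raw sums of disjoint sets simply add
            if avg_chisq(acc) > tol:
                break  # the run stopped looking linear; try the next start
            if j + 1 - i > best[1] - best[0]:
                best = (i, j + 1)
    return best[0] * block, best[1] * block
```

Raw sums are used here only because they make the merge a single addition; for data with large offsets the centred form is numerically safer. The data are assumed to be sorted by x, so that adjacent blocks cover adjacent intervals.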
All of the formulae above can be derived with straightforward (though tedious) algebra.