Finding the best linear section of data

Posted on 2025-01-12 15:39:08

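I have some scientific data and want to find the best region to fit a straight line to. In theory the data should have a constant gradient, but other effects distort it, so there are nonlinear sections, as shown below.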

[image not available]

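So far I have tried two things: taking the second derivative and locating the regions where it is zero, and fitting a moving window of 100 points and selecting the region with the smallest chi-squared. Neither selects the region correctly. What is a good method for choosing the best region of the data to fit a straight line to?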

谁许谁一生繁华 2025-01-19 15:39:09

This is not, perhaps, an answer but was too long for a comment.

Here are a couple of ideas I've used for similar tasks.
I'm supposing we have data (x[i], y[i]) for i = 1..N.

The least squares best fit line, x -> a*x+b, based on the data with indices in S (a subset of {1,...,N}) is

a = r; b = my(S) - r*mx(S)

where

r = C(S)/Vx(S)
mx(S) = Sum{ i in S | x[i] }/|S|
my(S) = Sum{ i in S | y[i] }/|S|
Vx(S) = Sum{ i in S | square(x[i]-mx(S)) }/|S|
Vy(S) = Sum{ i in S | square(y[i]-my(S)) }/|S|
C(S)  = Sum{ i in S | (x[i]-mx(S))*(y[i]-my(S)) }/|S|

Moreover, the 'average chisq' is

Sum{ i in S | square(y[i]-(a*x[i]+b)) }/|S| = Vy(S) - C(S)*C(S)/Vx(S)
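
As a minimal sketch, the definitions above can be written in Python as follows. The names `fit_stats`, `fit_line` and `mean_chisq` are made up for this illustration and are not from the answer; the code assumes S has at least two distinct x values, so that Vx(S) is nonzero.

```python
def fit_stats(x, y):
    """Return (n, mx, my, vx, vy, c) for the points (x[i], y[i])."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((xi - mx) ** 2 for xi in x) / n
    vy = sum((yi - my) ** 2 for yi in y) / n
    c = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    return n, mx, my, vx, vy, c


def fit_line(s):
    """Slope a and intercept b of the least squares line."""
    n, mx, my, vx, vy, c = s
    a = c / vx  # r = C(S)/Vx(S); requires vx != 0
    b = my - a * mx
    return a, b


def mean_chisq(s):
    """Average squared residual, computed from the statistics alone."""
    n, mx, my, vx, vy, c = s
    return vy - c * c / vx


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [1.0, 3.1, 4.9, 7.2, 9.0]
    s = fit_stats(x, y)
    a, b = fit_line(s)
    # Compare the shortcut with the residuals computed point by point.
    direct = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y)) / len(x)
    print(a, b, mean_chisq(s), direct)
```

The point of the shortcut is that once the six numbers are known, the quality of the fit can be read off without revisiting the data.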

The fit for the union of two disjoint intervals, S and T say, can be calculated from the fits for S and T:

mx(S union T) = (|S|*mx(S) + |T|*mx(T))/(|S|+|T|)
Vx(S union T) =  (|S|/(|S|+|T|))*Vx(S)
                +(|T|/(|S|+|T|))*Vx(T)
                +(|S|*|T|/square(|S|+|T|))*square(mx(S)-mx(T))
my(S union T) = (|S|*my(S) + |T|*my(T))/(|S|+|T|)
Vy(S union T) =  (|S|/(|S|+|T|))*Vy(S)
                +(|T|/(|S|+|T|))*Vy(T)
                +(|S|*|T|/square(|S|+|T|))*square(my(S)-my(T))
C(S union T)  =  (|S|/(|S|+|T|))*C(S)
                +(|T|/(|S|+|T|))*C(T)
                +(|S|*|T|/square(|S|+|T|))*(mx(S)-mx(T))*(my(S)-my(T))
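
The union formulas can be checked numerically with a short Python sketch. The helper names `fit_stats` and `merge` are assumptions made for this illustration; the check merges the statistics of two disjoint pieces and compares the result with statistics computed directly on the concatenated data.

```python
def fit_stats(x, y):
    """Return (n, mx, my, vx, vy, c) for the points (x[i], y[i])."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((xi - mx) ** 2 for xi in x) / n
    vy = sum((yi - my) ** 2 for yi in y) / n
    c = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    return n, mx, my, vx, vy, c


def merge(s, t):
    """Statistics of S union T from the statistics of disjoint S and T."""
    ns, mxs, mys, vxs, vys, cs = s
    nt, mxt, myt, vxt, vyt, ct = t
    n = ns + nt
    ws, wt = ns / n, nt / n   # |S|/(|S|+|T|) and |T|/(|S|+|T|)
    cross = ws * wt           # |S|*|T|/square(|S|+|T|)
    dx, dy = mxs - mxt, mys - myt
    return (
        n,
        ws * mxs + wt * mxt,
        ws * mys + wt * myt,
        ws * vxs + wt * vxt + cross * dx * dx,
        ws * vys + wt * vyt + cross * dy * dy,
        ws * cs + wt * ct + cross * dx * dy,
    )


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0], [0.5, 2.1, 3.9]
    xt, yt = [3.0, 4.0, 5.0, 6.0], [6.2, 7.8, 10.1, 12.0]
    merged = merge(fit_stats(xs, ys), fit_stats(xt, yt))
    direct = fit_stats(xs + xt, ys + yt)
    # The two tuples should agree up to floating-point rounding.
    print(merged)
    print(direct)
```

Merging costs a fixed number of arithmetic operations regardless of how many points S and T contain.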

So, for example, if you follow Sembei's suggestion, combining adjacent (or other) fits can be done very efficiently.
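
Sembei's suggestion is not reproduced in this thread, so the following is only one guess at how such combining might be used: cut the data into fixed-size blocks, compute the statistics of each block once, then merge adjacent blocks and keep the longest run whose average chisq stays below a tolerance. The function `longest_linear_run` and its `block` and `tol` parameters are hypothetical, and the acceptance rule is an assumption rather than the answer's method.

```python
def fit_stats(x, y):
    """Return (n, mx, my, vx, vy, c) for the points (x[i], y[i])."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((xi - mx) ** 2 for xi in x) / n
    vy = sum((yi - my) ** 2 for yi in y) / n
    c = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    return n, mx, my, vx, vy, c


def merge(s, t):
    """Statistics of S union T from those of disjoint S and T."""
    ns, mxs, mys, vxs, vys, cs = s
    nt, mxt, myt, vxt, vyt, ct = t
    n = ns + nt
    ws, wt = ns / n, nt / n
    cross, dx, dy = ws * wt, mxs - mxt, mys - myt
    return (n, ws * mxs + wt * mxt, ws * mys + wt * myt,
            ws * vxs + wt * vxt + cross * dx * dx,
            ws * vys + wt * vyt + cross * dy * dy,
            ws * cs + wt * ct + cross * dx * dy)


def mean_chisq(s):
    n, mx, my, vx, vy, c = s
    return vy - c * c / vx


def longest_linear_run(x, y, block=10, tol=1e-3):
    """Return (start, stop) of the longest run of whole blocks whose
    merged average chisq is <= tol, or None if no block qualifies.
    A trailing partial block is ignored (an assumption of this sketch)."""
    blocks = [fit_stats(x[i:i + block], y[i:i + block])
              for i in range(0, len(x) - block + 1, block)]
    best = None
    for i in range(len(blocks)):
        acc = blocks[i]
        for j in range(i, len(blocks)):
            if j > i:
                acc = merge(acc, blocks[j])  # extend the run by one block
            if mean_chisq(acc) <= tol and (best is None or j - i > best[1] - best[0]):
                best = (i, j)
    if best is None:
        return None
    return best[0] * block, (best[1] + 1) * block


if __name__ == "__main__":
    # Synthetic data: exactly linear for 20 <= x < 40, curved outside.
    x = [float(i) for i in range(60)]
    y = [2 * xi + 1 + 0.05 * (min(xi - 20, 0) ** 2 + max(xi - 39, 0) ** 2)
         for xi in x]
    print(longest_linear_run(x, y, block=10, tol=1e-9))
```

Every candidate run is evaluated with merges only, so the raw points are touched once, when the per-block statistics are built. Choosing `tol` still needs some estimate of the noise level in the data.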

All of the formulae above can be derived with straightforward (though tedious) algebra.
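
For instance, here is a sketch of two of those derivations, writing $m_x, m_y, V_x, V_y, C$ for the quantities above, $n = |S|+|T|$, and $m$ for the mean of $x$ over $S \cup T$ (the symbols $n$ and $m$ are shorthand introduced here).

```latex
% Average chisq, using b = m_y - a m_x and a = C / V_x:
\begin{align*}
\frac{1}{|S|}\sum_{i\in S}\bigl(y_i-(a x_i+b)\bigr)^2
  &= \frac{1}{|S|}\sum_{i\in S}\bigl((y_i-m_y)-a\,(x_i-m_x)\bigr)^2 \\
  &= V_y - 2aC + a^2 V_x
   = V_y - \frac{C^2}{V_x}.
\end{align*}

% Variance of a union, using
% m_x(S) - m = (|T|/n)(m_x(S) - m_x(T)) and
% m_x(T) - m = -(|S|/n)(m_x(S) - m_x(T)):
\begin{align*}
V_x(S\cup T)
  &= \frac{1}{n}\Bigl[\,|S|\,V_x(S) + |S|\,(m_x(S)-m)^2
       + |T|\,V_x(T) + |T|\,(m_x(T)-m)^2\Bigr] \\
  &= \frac{|S|}{n}V_x(S) + \frac{|T|}{n}V_x(T)
     + \frac{|S|\,|T|^2 + |T|\,|S|^2}{n^3}\,\bigl(m_x(S)-m_x(T)\bigr)^2 \\
  &= \frac{|S|}{n}V_x(S) + \frac{|T|}{n}V_x(T)
     + \frac{|S|\,|T|}{n^2}\,\bigl(m_x(S)-m_x(T)\bigr)^2 .
\end{align*}
```

The formulas for Vy and C of a union follow the same pattern, with the squared difference of means replaced by the matching product.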
