Is there a linear trend in the data?

Posted on 2025-01-06 07:50:23

I have continuously incoming data represented by an array of integers x = [x1,...,xn], n < 1 000 000. Every pair of consecutive elements satisfies x[i] < x[i + 1].

I need to detect, as quickly as possible, the breakpoint where the linear trend of the data ends and turns into a quadratic trend. The data always starts with a linear trend.

I tried computing

k = (x[i+1] - x[i]) / (x[i] - x[i-1])

but this test is not very reliable. Maybe there is a simpler and more efficient statistical test? Computing a regression line is too slow in this case.

Comments (4)

小耗子 2025-01-13 07:50:23

Keep track of the first and second derivatives. That is, keep the running mean and variance of x[i]-x[i-1], and the running sum and variance of (x[i+1]-x[i]) - (x[i]-x[i-1]).

For a linear trend, the mean of the first derivative should be constant; if you observe a deviation from the mean (which you can judge using the variance), something has changed. The mean of the second derivative should be 0.

For a quadratic trend, the mean of the first derivative increases, so you will find many samples that deviate strongly from the mean. The second derivative then behaves the way the first derivative does in the linear case.
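As a rough illustration of the bookkeeping described above, the sketch below keeps a running mean and variance of the first difference with Welford's online update and reports the first step that falls outside a tolerance band. The names `RunningStats` and `first_deviation_index`, and the `warmup` and `z` parameters, are hypothetical choices for this example, not part of the answer. Only the first difference is tracked here to keep it short; the second difference could be fed through the same class.

```python
import math


class RunningStats:
    """Welford's online mean/variance: O(1) work and memory per sample."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until two samples are available.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def first_deviation_index(x, warmup=50, z=4.0):
    """Return the first index i whose step x[i] - x[i-1] lies more than
    z standard deviations from the running mean of the earlier steps,
    or None if no step does. `warmup` and `z` are assumed tuning knobs."""
    stats = RunningStats()
    for i in range(1, len(x)):
        step = x[i] - x[i - 1]
        if stats.n >= warmup and abs(step - stats.mean) > z * stats.std():
            return i
        stats.push(step)
    return None
```

A single outlying step triggers this check, so on noisy data it would probably need to be combined with a persistence requirement such as the sign-run rule in the algorithm that follows.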

An algorithm (using just the second derivative):

  1. For each input, calculate the sign (+ve or -ve) of the second derivative.
  2. Keep track of how many consecutive identical signs you have seen most recently (e.g. if the sequence is -+-++++, the answer is 4).
  3. If that run of identical signs is longer than a threshold (say, 40?), mark it as the beginning of the quadratic sequence.
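The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `quadratic_start` is hypothetical, and treating a zero second difference as breaking the current run is a choice made for this example that the answer does not specify.

```python
def quadratic_start(x, threshold=40):
    """Return the index at which a run of more than `threshold` same-signed
    second differences began, or None if no such run occurs."""
    run_sign = 0  # sign of the current run: -1, 0 or +1
    run_len = 0   # number of consecutive second differences with that sign
    for i in range(1, len(x) - 1):
        d2 = (x[i + 1] - x[i]) - (x[i] - x[i - 1])
        sign = (d2 > 0) - (d2 < 0)
        if sign != 0 and sign == run_sign:
            run_len += 1
        else:
            # Assumption: a zero second difference resets the run.
            run_sign = sign
            run_len = 1 if sign != 0 else 0
        if run_len > threshold:
            return i - run_len + 1
    return None
```

The breakpoint is only reported once the run has exceeded `threshold` samples, so the detection lags the true change by about that many inputs; a smaller threshold reacts faster but is more likely to fire on noise.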

浮生未歇 2025-01-13 07:50:23

Actually, what you are calculating is a derivative of the function. You could use more points to compute it, e.g. 5; see the five-point stencil.
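For reference, here is a small sketch of the standard five-point stencil formulas for the first and second derivatives, assuming the samples are evenly spaced with unit step. The function name `stencil_derivatives` is hypothetical.

```python
def stencil_derivatives(x, i):
    """Five-point stencil estimates of the first and second derivative
    at index i, assuming unit spacing between samples.
    Requires 2 <= i <= len(x) - 3."""
    d1 = (x[i - 2] - 8 * x[i - 1] + 8 * x[i + 1] - x[i + 2]) / 12
    d2 = (-x[i - 2] + 16 * x[i - 1] - 30 * x[i]
          + 16 * x[i + 1] - x[i + 2]) / 12
    return d1, d2
```

On exactly linear data the second estimate is 0, and on an exact parabola it equals the constant second derivative, so a sustained nonzero `d2` would be the signal to look for.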

一紙繁鸢 2025-01-13 07:50:23

You can use a running window regression here.

Computing the linear regression coefficients on W points involves sums of terms of the form X[i], i·X[i] and X[i]^2. If you store these sums, you can easily shift the window by one point by subtracting the terms for the leftmost point and adding the terms for the rightmost point (each i·X[i] becomes (i+1)·X[i], i.e. i·X[i] + X[i]). Because your data values are integers, the sums are exact and no round-off error accumulates.

That said, you can compute the running regression in constant time for every W consecutive points and detect a drop in the correlation coefficient.
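A possible sketch of this sliding update is shown below. It assumes window-relative indices 0..W-1, so sliding forward lowers the index of every retained point by one, which is handled by subtracting the stored sum of values from the stored sum of index-times-value. The function name `running_r2` is hypothetical, and the squared correlation coefficient is reported instead of the coefficient itself to avoid a square root.

```python
from collections import deque


def running_r2(x, w):
    """Yield (end_index, r_squared) for every window of w consecutive
    points, with O(1) work per new point. Requires w >= 2."""
    # Sums over the window-relative indices 0..w-1 never change.
    sx = w * (w - 1) // 2
    sxx = (w - 1) * w * (2 * w - 1) // 6
    var_x = w * sxx - sx * sx
    window = deque()
    sy = syy = sxy = 0  # exact integer sums of y, y^2 and index*y
    for i, y in enumerate(x):
        if len(window) == w:
            old = window.popleft()  # had index 0, so it adds nothing to sxy
            sy -= old
            syy -= old * old
            sxy -= sy               # every retained index drops by one
        sxy += len(window) * y      # new point takes the next free index
        sy += y
        syy += y * y
        window.append(y)
        if len(window) == w:
            cov = w * sxy - sx * sy
            var_y = w * syy - sy * sy
            yield i, (cov * cov / (var_x * var_y) if var_y else 1.0)
```

Over a short window a parabola is still well approximated by a line, so the drop in `r_squared` can be small; any threshold would likely need to sit very close to 1 and be tuned on real data.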

白况 2025-01-13 07:50:23

For an ultra-fast solution, you may consider a test like:

| X[i + s] - 2 X[i] + X[i - s] | > k (X[i + s] - X[i - s])

for well-chosen s and k.

Have a look at a plot of | X[i + s] - 2 X[i] + X[i - s] | / (X[i + s] - X[i - s]) as a function of i, for increasing values of s.
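The test above translates almost directly into code. In this sketch the function names `curvature_ratio` and `first_breakpoint`, as well as the default values of `s` and `k`, are placeholders for illustration; suitable values would have to come from inspecting the plot suggested above.

```python
def curvature_ratio(x, i, s):
    """|X[i+s] - 2 X[i] + X[i-s]| / (X[i+s] - X[i-s]), the quantity to plot.
    The denominator is positive because the data is strictly increasing."""
    return abs(x[i + s] - 2 * x[i] + x[i - s]) / (x[i + s] - x[i - s])


def first_breakpoint(x, s=8, k=0.05):
    """Return the first center index i at which the test fires, or None.
    The defaults for s and k are arbitrary assumptions, not recommendations."""
    for i in range(s, len(x) - s):
        if abs(x[i + s] - 2 * x[i] + x[i - s]) > k * (x[i + s] - x[i - s]):
            return i
    return None
```

Because the test at index `i` needs `x[i + s]`, it can only be evaluated `s` samples after `i` has arrived; a larger `s` averages over more data but adds that much latency.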
