用于计算滚动窗口与给定线函数的平方和距离的算法
给定一个线函数y = a*x + b
(a
和b
是以前已知的常数),很容易计算总和:直线与样本窗口之间的平方距离 (1, Y1), (2, Y2), ..., (n, Yn)
(其中 Y1
是最旧的样本,Yn
是最新的):
sum((Yx - (a*x + b))^2 for x in 1,...,n)
我需要一个快速算法来计算滚动窗口(长度为 n )的该值 - 我无法在每次新样本到达时重新扫描窗口中的所有样本。
显然,应该为进入窗口的每个新样本和离开窗口的每个旧样本保存和更新某些状态。
请注意,当样本离开窗口时,其余样本的不定数也会发生变化 - 每个 Yx 都变为 Y(x-1)。因此,当样本离开窗口时,窗口中的每个其他样本都会为新的总和贡献不同的值:(Yx - (a*(x-1) + b))^2
而不是 <代码>(Yx - (a*x + b))^2。
是否有已知的算法来计算这个?如果没有,你能想到一个吗? (由于一阶线性近似,出现一些错误是可以的)。
Given a line function y = a*x + b
(a
and b
are previously known constants), it is easy to calculate the sum-of-squares distance between the line and a window of samples (1, Y1), (2, Y2), ..., (n, Yn)
(where Y1
is the oldest sample and Yn
is the newest):
sum((Yx - (a*x + b))^2 for x in 1,...,n)
I need a fast algorithm for calculating this value for a rolling window (of length n
) - I cannot rescan all the samples in the window every time a new sample arrives.
Obviously, some state should be saved and updated for every new sample that enters the window and every old sample leaves the window.
Notice that when a sample leaves the window, the indecies of the rest of the samples change as well - every Yx becomes Y(x-1). Therefore when a sample leaves the window, every other sample in the window contribute a different value to the new sum: (Yx - (a*(x-1) + b))^2
instead of (Yx - (a*x + b))^2
.
Is there a known algorithm for calculating this? If not, can you think of one? (It is ok to have some mistakes due to first-order linear approximations).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
难道直接的方法就不能解决问题吗?……
我所说的“直接”是指维护一个样本队列。一旦新样本到达,您将:
至于时间,这里的所有内容都是 O(1) 如果队列被实现为链表或类似的东西,您也希望将距离与样本存储在队列中,因此您只计算一次。因此,每个样本的内存使用量为 3 个浮点数 - O(n)。
Won't a straightforward approach do the trick?...
By 'straightforward' I mean maintaining a queue of samples. Once a new sample arrives, you would:
As for time, everything here is O(1) if the queue is implemented as linked list or something similar, You would want to store the distance with your samples in queue, too, so you calculate it only once. The memory usage is thus 3 floats per sample - O(n).
如果展开术语
(Yx - (a*x + b))^2
,这些术语将分为三个部分:a
,x
的术语代码>和<代码>b。当对n
求和时,它们会产生一些常量,并且可以忽略。Yx
和b
项。这些可以按照 @Xion 描述的 Boxcar 积分器的方式进行处理。-2*Yx*a*x
中的一项。-2*a
是一个常量,因此请忽略该部分。考虑部分和S = Y1*1 + Y2*2 + Y3*3 ... Yn*n
。给定Y1
和运行总和R = Y1 + Y2 + ... + Yn
,您可以找到消除Y1 的
并减少其他每一项,得到S - R
*1Y2*1 + Y3*2 + ... + Yn*(n-1)
。现在,通过减去Y1
并加上Y(n+1)
,更新运行总和R
,如 (2) 所示。将新的Yn*n
项添加到S
中。现在只需将所有这些部分项相加即可。
If you expand the term
(Yx - (a*x + b))^2
the terms break into three parts:a
,x
andb
. These produce some constant when summed overn
and can be ignored.Yx
andb
. These can be handled in the style of a boxcar integrator as @Xion described.-2*Yx*a*x
. The-2*a
is a constant so ignore that part. Consider the partial sumS = Y1*1 + Y2*2 + Y3*3 ... Yn*n
. GivenY1
and a running sumR = Y1 + Y2 + ... + Yn
you can findS - R
which eliminatesY1*1
and reduces each of the other terms, leaving you withY2*1 + Y3*2 + ... + Yn*(n-1)
. Now update the running sumR
as for (2) by subtracting offY1
and addingY(n+1)
. Add the newYn*n
term toS
.Now just add up all those partial terms.