Rolling median with subsets built over a time window

Posted 2024-12-20 18:14:49

I would like to compute a variant of the rolling median on my dataset that builds each subset not by going k observations forward and backward, but by taking into account all observations that fall within a given time window.

A straightforward implementation could look like this:

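windowwidth <- 30
# median of all observations whose time lies within windowwidth/2 of x
median.window <- function(x) median(mydata[time <= x + windowwidth / 2 & time >= x - windowwidth / 2])
# one window median per observation time
vapply(time, median.window, numeric(1))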

However, as you can imagine, this is not very efficient for large datasets. Do you see a possible improvement, or know of a package providing an optimized implementation? Note that the observations cannot be assumed to be evenly spaced in time.

zoo provides rollmedian, but that function selects the window by observation count, not by time.
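
One possible direction, sketched here under assumptions: the naive version scans the whole time vector for every observation. If the times are sorted first, the bounds of each window can be found by binary search with base R's findInterval. The names time_window_median, times, values and width are illustrative and not from the original post, and the window is taken to be closed on both sides, as in the code above.

```r
# Sketch: time-window rolling median using binary search on sorted times.
# `time_window_median`, `times`, `values`, `width` are hypothetical names.
time_window_median <- function(times, values, width) {
  ord <- order(times)
  ts <- times[ord]
  vs <- values[ord]
  half <- width / 2
  # hi[i]: index of the last sorted time <= ts[i] + half
  hi <- findInterval(ts + half, ts)
  # lo[i]: index of the first sorted time >= ts[i] - half
  lo <- findInterval(ts - half, ts, left.open = TRUE) + 1
  med <- vapply(seq_along(ts),
                function(i) median(vs[lo[i]:hi[i]]),
                numeric(1))
  # return the medians in the original (unsorted) observation order
  out <- numeric(length(times))
  out[ord] <- med
  out
}

# Irregularly spaced example data (made up for illustration)
time_window_median(times = c(1, 2, 4, 7, 8),
                   values = c(3, 4, 2, 6, 1),
                   width = 2)
```

This still computes one median per window, so it is not a true incremental rolling median, but it avoids building a full-length logical mask for every observation.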

Comments (1)

无言温柔 2024-12-27 18:14:49

Ok, try this:

Rgames: timeseq<-1:5 
Rgames: winmat <- outer(timeseq,timeseq,FUN=function(x,y) y>=x &y<=x+2) 
Rgames: winmat 
      [,1]  [,2]  [,3]  [,4]  [,5] 
[1,]  TRUE  TRUE  TRUE FALSE FALSE 
[2,] FALSE  TRUE  TRUE  TRUE FALSE 
[3,] FALSE FALSE  TRUE  TRUE  TRUE 
[4,] FALSE FALSE FALSE  TRUE  TRUE 
[5,] FALSE FALSE FALSE FALSE  TRUE 
Rgames: winmat %*% timeseq 
     [,1] 
[1,]    6 
[2,]    9 
[3,]   12 
[4,]    9 
[5,]    5 

Replace the condition in that function with one based on your window width and I think you'll be all set.
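
As a rough illustration of that substitution, here is a sketch with a window centered on each observation and irregularly spaced times. The values in obstime and the width of 2 are made up for the example. Note that outer builds an n-by-n matrix, so this only suits moderately sized data.

```r
# Hypothetical irregular observation times and window width
obstime <- c(1, 2, 4, 7, 8)
windowwidth <- 2

# winmat[i, j] is TRUE when observation j lies within
# windowwidth / 2 of observation i (centered, closed window)
winmat <- outer(obstime, obstime,
                FUN = function(x, y) abs(y - x) <= windowwidth / 2)

# number of observations that fall into each window
window_counts <- rowSums(winmat)
window_counts
```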
Edit: In response to Thilo's query, it looks like in the general case you should use apply. Given the setup above, call your observation values "timval":

Rgames: timval<-c(3,4,2,6,1)
Rgames: valmat<-timval*t(winmat)
Rgames: valmat
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    0    0    0    0
[2,]    4    4    0    0    0
[3,]    2    2    2    0    0
[4,]    0    6    6    6    0
[5,]    0    0    1    1    1
Rgames: apply(valmat,2,median)
[1] 2 2 1 0 0

Edit again: clearly I was asleep there: nobody wants a median based on all those zeroes. I should think more before posting. Add this:

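# mark out-of-window cells as NA (note: this also blanks observations that are genuinely 0)
valmat[valmat == 0] <- NA
apply(valmat, 2, median, na.rm = TRUE)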
[1] 3.0 4.0 2.0 3.5 1.0

And I'm sure there's a cleaner way of 'building' valmat than this, but the final result is the "filter matrix" you want to apply any function to.
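
One candidate for such a cleaner way, as a sketch only: skip valmat entirely and use each row of winmat as a logical index into the observations. That way no zero placeholders are needed and genuine zero observations survive. The helper name window_apply is hypothetical and not part of any package.

```r
timeseq <- 1:5
timval <- c(3, 4, 2, 6, 1)
winmat <- outer(timeseq, timeseq, FUN = function(x, y) y >= x & y <= x + 2)

# Apply f to the observations flagged by each row of the filter matrix.
# `window_apply` is an illustrative helper; f must return a single number.
window_apply <- function(winmat, values, f) {
  vapply(seq_len(nrow(winmat)),
         function(i) f(values[winmat[i, ]]),
         numeric(1))
}

window_apply(winmat, timval, median)
```

For the example data this reproduces the medians shown above, and the same helper works with mean, max or any other summary function.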
