Rolling median with subsets built continuously over time
I would like to compute a variant of the rolling median on my dataset that builds each subset not from the k observations before and after a point, but from all observations that fall within a given time window.
A straightforward implementation could look like this:
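windowwidth <- 30
# Median of all observations within half a window width of time x
median.window <- function(x) median(mydata[time <= x + windowwidth / 2 & time >= x - windowwidth / 2])
# Evaluate the window median at every observation time
vapply(time, median.window, numeric(1))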
However, as you can imagine, this is not very efficient for large datasets. Do you see a possible improvement, or know of a package that provides an optimized implementation? You cannot expect the observations to be distributed evenly over time.
zoo provides rollmedian, but that function selects the window by observation count, not by time.
Comments (1)
Ok, try this:
Replace that function with your window width and I think you'll be all set.
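A window test of the kind this answer refers to might look something like the sketch below. It is only an illustration under stated assumptions: the sample times, the helper name in.window, and the use of outer to build a logical matrix named winmat are hypothetical and are not taken from the answer.

```r
# Hypothetical, irregularly spaced observation times (not from the post).
time <- c(1, 2, 10, 25, 26, 40, 70)
windowwidth <- 30

# Hypothetical helper: TRUE when y lies within half a window width of x.
in.window <- function(x, y) abs(y - x) <= windowwidth / 2

# winmat[i, j] is TRUE when observation j falls inside the window
# centred on observation i. outer() calls in.window once on vectors,
# so the helper must be vectorised (abs and <= both are).
winmat <- outer(time, time, FUN = in.window)
```

Changing windowwidth, or the condition inside in.window, changes which observations each row selects.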
Edit: In response to Thilo's query, it looks like in the general case you should use apply. Given the stuff above, call your observation values "timval", as
Edit again: clearly I was asleep there: nobody wants a median based on all those zeroes. I should think more before posting. Add this:
And I'm sure there's a cleaner way of 'building' valmat than this, but the final result is the "filter matrix" you want to apply any function to.
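As a rough sketch of the "filter matrix" idea, assuming hypothetical sample data and a logical window matrix winmat built with outer (neither is taken from the answer): copy the values into a square matrix, mark everything outside each row's window as NA rather than zero, and apply median row by row.

```r
# Hypothetical sample data: irregular times and one value per time.
time   <- c(1, 2, 10, 25, 26, 40, 70)
timval <- c(5, 3, 8, 1, 9, 4, 7)
windowwidth <- 30

# Assumed window matrix: winmat[i, j] is TRUE when observation j lies
# within half a window width of observation i.
winmat <- outer(time, time, function(x, y) abs(y - x) <= windowwidth / 2)

# Every row of valmat holds the full vector of observation values.
n <- length(timval)
valmat <- matrix(timval, nrow = n, ncol = n, byrow = TRUE)

# Mask values outside each row's window with NA (not 0), so they are
# dropped from the median instead of pulling it towards zero.
valmat[!winmat] <- NA

# One median per observation time, ignoring the masked entries.
rolling.median <- apply(valmat, 1, median, na.rm = TRUE)
```

Any other summary function that accepts na.rm, such as mean, can be passed to apply in place of median. Note that the matrices hold n * n entries, so this trades memory for simplicity and may not suit very large datasets.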