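R rolling mean on a non-continuous time series
I want to compute a rolling mean over the last X days. `rollmean()` does that using rows. Because the loggers I use sometimes fail, and the data were also cleaned, the time series is not continuous (rows do not necessarily represent a constant time difference).
A colleague suggested the solution below, which works great, except that my data need to be grouped (in the example, by treatment). For each day, I want the rolling mean of the last X days for each treatment.
Thanks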
library(dplyr)  # provides %>% and the dplyr verbs used below

# making some example data
# vector with days since the beginning of experiment
days <- 0:30
# random values for treatment "a"
df1 <- tibble::tibble(
  days_since_beginning = days,
  value_to_used = rnorm(length(days)),
  treatment = sample(letters[1], 31, replace = TRUE))
# random values for treatment "b"
df2 <- tibble::tibble(
  days_since_beginning = days,
  value_to_used = rnorm(length(days)),
  treatment = sample(letters[2], 31, replace = TRUE))
df <- dplyr::full_join(df1, df2)
# how long the averaging period should be, in days
time_period <- 10 # calculate for last 10 days
df_mean <- df %>% dplyr::mutate(
  # calculate rolling mean
  # note: the lambda filters the whole `df`, so both treatments are pooled
  roll_mean = purrr::map_dbl(
    .x = days_since_beginning,
    .f = ~ df %>%
      # select only data for the last `time_period` days (both ends inclusive)
      dplyr::filter(days_since_beginning >= .x - time_period &
                      days_since_beginning <= .x) %>%
      purrr::pluck("value_to_used") %>%
      mean()
  ))
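A minimal base-R sketch of a grouped version, assuming the goal is the same inclusive window `[day - time_period, day]` used in the colleague's filter, evaluated separately within each treatment. `time_window_mean()` and `grouped_time_window_mean()` are hypothetical helper names, and the small data frame is made up for illustration.

```r
# Hypothetical helper: for each day d, average the `values` whose `days`
# fall in [d - time_period, d] (both ends inclusive, like the filter above).
time_window_mean <- function(days, values, time_period) {
  vapply(days, function(d) {
    mean(values[days >= d - time_period & days <= d])
  }, numeric(1))
}

# Hypothetical helper: apply time_window_mean() separately within each group.
grouped_time_window_mean <- function(days, values, group, time_period) {
  out <- numeric(length(values))
  for (idx in split(seq_along(values), group)) {
    out[idx] <- time_window_mean(days[idx], values[idx], time_period)
  }
  out
}

# Made-up example: day 2 is missing, two treatments
example <- data.frame(
  days_since_beginning = c(0, 1, 3, 4, 0, 1, 3, 4),
  value_to_used        = c(1, 2, 3, 4, 10, 20, 30, 40),
  treatment            = rep(c("a", "b"), each = 4)
)
example$roll_mean <- grouped_time_window_mean(
  example$days_since_beginning, example$value_to_used,
  example$treatment, time_period = 2
)
print(example)
```

With dplyr loaded, the single-group helper can also be used as `df %>% dplyr::group_by(treatment) %>% dplyr::mutate(roll_mean = time_window_mean(days_since_beginning, value_to_used, time_period))`, because inside a grouped `mutate()` the column names refer to the current group's rows. Simply adding `group_by()` to the colleague's code would not be enough, since its lambda refers to `df` directly.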
This takes the mean over the last 10 days by treatment. The `width` argument includes a computation of how many rows back to use, so that the window corresponds to 10 days rather than 10 rows. This relies on the fact that `width` can be a vector.
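The answer's code is not reproduced on this page. As a sketch of the idea it describes, assuming the common idiom `seq_along(days) - findInterval(days - k, days)` for turning a day window into a per-row count: for each row, count how many rows fall within the last `k` days, then average that many rows. `days_to_widths()` and `rolling_mean_by_width()` are hypothetical names; the second is a base-R stand-in for a right-aligned rolling mean such as `zoo::rollapplyr()`, which accepts one width per observation. `findInterval()` needs `days` sorted in increasing order.

```r
# Assumed idiom: for each row of a day-sorted series, the number of rows
# whose day lies in (d - k, d], i.e. a window of k days ending at day d.
days_to_widths <- function(days, k) {
  seq_along(days) - findInterval(days - k, days)
}

# Base-R stand-in for a right-aligned rolling mean with a per-row width:
# row i averages the last w[i] rows, up to and including row i.
rolling_mean_by_width <- function(values, w) {
  vapply(seq_along(values), function(i) {
    mean(values[(i - w[i] + 1):i])
  }, numeric(1))
}

days   <- c(0, 1, 3, 4, 8)   # made-up series with gaps at days 2, 5, 6, 7
values <- c(1, 2, 3, 4, 5)
w <- days_to_widths(days, k = 3)
print(w)                                  # 1 2 2 2 1
print(rolling_mean_by_width(values, w))   # 1.0 1.5 2.5 3.5 5.0
```

For grouped data, the widths can be computed within each treatment, e.g. `ave(df$days_since_beginning, df$treatment, FUN = function(d) days_to_widths(d, 10))`, after sorting `df` by treatment and then by day, so that no window reaches into another treatment's rows.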
The same colleague came up with his own solution:
I compared the results with G. Grothendieck's idea, and they only match if I use `time_period` in my colleague's code and `time_period + 1` in G. Grothendieck's code. So `time_period` is used differently in the two, and I am confused about why this happens.
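The colleague's filter `days_since_beginning >= .x - time_period & days_since_beginning <= .x` is inclusive at both ends, so on a gap-free daily series it covers `time_period + 1` days (for `time_period <- 10` and day 20, days 10 through 20). Assuming G. Grothendieck's width counts `k` rows ending at the current row (which equals `k` days when no days are missing), the two windows only agree when `k = time_period + 1`. A quick base-R check on the example's `0:30` days, where `rows_for_width()` is a hypothetical helper:

```r
days <- 0:30          # continuous daily series, as in the example data
time_period <- 10
d <- 20               # any day with at least time_period days of history

# Colleague's condition: both ends inclusive
in_window <- days >= d - time_period & days <= d
n_days_filter <- sum(in_window)
print(n_days_filter)                              # 11 (days 10 through 20)

# Hypothetical helper: the last k rows up to and including day d
rows_for_width <- function(k) days[(match(d, days) - k + 1):match(d, days)]
print(length(rows_for_width(time_period)))        # 10 (days 11 through 20)
print(length(rows_for_width(time_period + 1)))    # 11
print(identical(rows_for_width(time_period + 1), days[in_window]))  # TRUE
```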