R rolling mean on a non-continuous time series

Posted on 2025-01-14 05:07:29

I want to compute a rolling mean over the last X days. rollmean() does this over a fixed number of rows. Because my loggers sometimes fail, and the data have also been cleaned, the time series is not continuous (rows do not necessarily represent a constant time difference).

A colleague suggested the solution below, which works well, except that my data need to be grouped (by treatment in the example). For each day, I want the rolling mean of the last X days within each treatment.

Thanks

 library(dplyr)  # provides %>% and bind_rows()

 # making some example data
 # vector with days since the beginning of experiment
 days <- 0:30

 # random values, one data frame per treatment
 df1 <- tibble::tibble(
     days_since_beginning = days,
     value_to_used = rnorm(length(days)),
     treatment = "a"
 )

 df2 <- tibble::tibble(
     days_since_beginning = days,
     value_to_used = rnorm(length(days)),
     treatment = "b"
 )

 # stack the two treatments into one data frame
 df <- dplyr::bind_rows(df1, df2)

 # how long should be the period for mean
 time_period <- 10 # calculate for last 10 days

 df_mean <- df %>%
   dplyr::mutate(
     # calculate rolling mean (not yet grouped by treatment)
     roll_mean = purrr::map_dbl(
       .x = days_since_beginning,
       .f = ~ df %>%
         # select only data for the last `time_period`
         dplyr::filter(days_since_beginning >= .x - time_period &
                         days_since_beginning <= .x) %>%
         purrr::pluck("value_to_used") %>%
         mean()
     )
   )


Comments (2)

甜`诱少女 2025-01-21 05:07:29

This takes the mean over the last 10 days by treatment. The width argument includes a computation of how many rows back to use so that it corresponds to 10 days rather than 10 rows. This uses the fact that width can be a vector.

library(dplyr)
library(zoo)

df %>%
  group_by(treatment) %>%
  mutate(roll = rollapplyr(value_to_used, 
    seq_along(days_since_beginning) - findInterval(days_since_beginning - 10, days_since_beginning), 
    mean)) %>%
  ungroup()

江心雾 2025-01-21 05:07:29

The same colleague came up with his own solution:

df_mean <- 
  df %>%
  dplyr::group_by(treatment) %>% 
  tidyr::nest() %>% 
  dplyr::mutate(
    data_with_mean = purrr::map(
      .x = data,
      .f = ~ {
        dataset <- .x
        
        dataset %>% 
          dplyr::mutate(
            # calculate rolling mean 
            roll_mean = purrr::map_dbl(
              .x = days_since_beginning,
              .f = ~ dataset %>% 
                # keep only rows from the last `time_period` days (both ends inclusive)
                dplyr::filter(days_since_beginning >= .x - time_period &
                                days_since_beginning <= .x) %>% 
                purrr::pluck("value_to_used") %>% 
                mean()
            ))
        
      }
    )) %>% 
  dplyr::select(-data) %>% 
  tidyr::unnest(data_with_mean) %>% 
  dplyr::ungroup()

I compared the results with G. Grothendieck's idea, and it only matches if I use time_period in my colleague's code and time_period + 1 in G. Grothendieck's code. So there is a difference in how the time_period is used, and I am confused about why it happens.
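
One likely explanation for the difference is the lower bound of the window. The filter() version keeps days in the closed interval [d - time_period, d], while findInterval(d - time_period, days) counts every day up to and including d - time_period, so the resulting width covers only (d - time_period, d]. With integer days, that is 11 days against 10. Below is a minimal base-R sketch of the row counts each approach would use. The day stamps are made up for illustration, and left.open = TRUE is offered as one possible way to align the two, not as the answer's original method.

```r
# Hypothetical irregular day stamps (with gaps on purpose)
days <- c(0, 1, 2, 5, 10, 11, 12, 20)
time_period <- 10

# Width as computed in the zoo answer: rows with day in (d - time_period, d]
width_open <- seq_along(days) - findInterval(days - time_period, days)

# Width with a closed lower bound: rows with day in [d - time_period, d]
# left.open = TRUE makes findInterval count only days strictly below the bound
width_closed <- seq_along(days) -
  findInterval(days - time_period, days, left.open = TRUE)

# Reference: count rows directly with the inclusive filter condition
width_filter <- vapply(
  days,
  function(d) sum(days >= d - time_period & days <= d),
  integer(1)
)

print(rbind(width_open, width_closed, width_filter))
```

If this reasoning holds, passing width_closed as the width to rollapplyr() should reproduce the filter() results without the + 1, and it should also behave consistently when the day values are not whole numbers.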
