计算移动平均线

发布于 2024-07-16 19:47:44 字数 166 浏览 9 评论 0 原文

我正在尝试使用 R 计算矩阵中一系列值的移动平均值。 R 中似乎没有 内置函数 可以让我计算移动平均线。 有任何套餐提供吗? 还是我需要自己写?

I'm trying to use R to calculate the moving average over a series of values in a matrix. There doesn't seem to be a built-in function in R that will allow me to calculate moving averages. Do any packages provide one? Or do I need to write my own?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(18

不回头走下去 2024-07-23 19:47:45

如果您希望系列的两端不是 NA 而是递归计算移动平均线,则另一个有用的函数:

smoothing = function(x, k=1) {
  sapply(seq_along(x), function(i) {
    i.min = max(i-k, 1)
    i.max = min(i+k, length(x))
    mean(x[i.min:i.max], na.rm=TRUE)
  })
}

示例:

x = 1:10/2

[1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

smoothing(x, 2)

[1] 1.00 1.25 1.50 2.00 2.50 3.00 3.50 4.00 4.25 4.50

Another useful function if you want the two ends of series not to be NA but to be recursively calculated moving averages:

smoothing = function(x, k=1) {
  sapply(seq_along(x), function(i) {
    i.min = max(i-k, 1)
    i.max = min(i+k, length(x))
    mean(x[i.min:i.max], na.rm=TRUE)
  })
}

Example:

x = 1:10/2

[1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

smoothing(x, 2)

[1] 1.00 1.25 1.50 2.00 2.50 3.00 3.50 4.00 4.25 4.50

や三分注定 2024-07-23 19:47:44

或者你可以简单地使用filter来计算它,这是我使用的函数:

ma <- function(x, n = 5){filter(x, rep(1 / n, n), sides = 2)}

如果你使用dplyr,请小心在上面的函数中指定stats::filter

Or you can simply calculate it using filter, here's the function I use:

ma <- function(x, n = 5){filter(x, rep(1 / n, n), sides = 2)}

If you use dplyr, be careful to specify stats::filter in the function above.

阿楠 2024-07-23 19:47:44
  • zoo 包中的滚动均值/最大值/中位数 (rollmean)
  • TTR
  • 位于 预测
  • Rolling Means/Maximums/Medians in the zoo package (rollmean)
  • MovingAverages in TTR
  • ma in forecast
放低过去 2024-07-23 19:47:44

使用 cumsum 应该足够且高效。 假设您有一个向量 x 并且需要 n 个数字的运行总和

cx <- c(0,cumsum(x))
rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n

正如 @mzuther 的评论中所指出的,这假设数据中没有 NA。 要处理这些问题,需要将每个窗口除以非 NA 值的数量。 这是一种方法,结合@Ricardo Cruz 的评论:

cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]
rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]
rsum <- rx / rn

这仍然存在问题,如果窗口中的所有值都是 NA,那么将会出现被零除错误。

Using cumsum should be sufficient and efficient. Assuming you have a vector x and you want a running sum of n numbers

cx <- c(0,cumsum(x))
rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n

As pointed out in the comments by @mzuther, this assumes that there are no NAs in the data. to deal with those would require dividing each window by the number of non-NA values. Here's one way of doing that, incorporating the comment from @Ricardo Cruz:

cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]
rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]
rsum <- rx / rn

This still has the issue that if all the values in the window are NAs then there will be a division by zero error.

寒尘 2024-07-23 19:47:44

data.table 1.12.0 中添加了新的 frollmean 函数来计算快速且精确的滚动平均值,仔细处理 NANaN< /code> 和 +Inf-Inf 值。

由于问题中没有可重现的示例,因此这里没有更多需要解决的问题。

您可以在手册中找到有关 ?frollmean 的更多信息,也可以在线访问 ?frollmean

以下手册中的示例:

library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))

# rollmean of single vector and single window
frollmean(d[, V1], 3)

# multiple columns at once
frollmean(d, 3)

# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))

# multiple columns and multiple windows at once
frollmean(d, c(3, 4))

## three above are embarrassingly parallel using openmp

In data.table 1.12.0 new frollmean function has been added to compute fast and exact rolling mean carefully handling NA, NaN and +Inf, -Inf values.

As there is no reproducible example in the question there is not much more to address here.

You can find more info about ?frollmean in manual, also available online at ?frollmean.

Examples from manual below:

library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))

# rollmean of single vector and single window
frollmean(d[, V1], 3)

# multiple columns at once
frollmean(d, 3)

# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))

# multiple columns and multiple windows at once
frollmean(d, c(3, 4))

## three above are embarrassingly parallel using openmp
浅语花开 2024-07-23 19:47:44

caTools 包具有非常快速的滚动平均值/最小值/最大值/标准差和一些其他功能。 我只使用过 runmean 和 runsd,它们是迄今为止提到的任何其他软件包中最快的。

The caTools package has very fast rolling mean/min/max/sd and few other functions. I've only worked with runmean and runsd and they are the fastest of any of the other packages mentioned to date.

白云悠悠 2024-07-23 19:47:44

以下示例代码展示了如何使用 zoo 包。

library(tidyverse)
library(zoo)

some_data = tibble(day = 1:10)
# cma = centered moving average
# tma = trailing moving average
some_data = some_data %>%
    mutate(cma = rollmean(day, k = 3, fill = NA)) %>%
    mutate(tma = rollmean(day, k = 3, fill = NA, align = "right"))
some_data
#> # A tibble: 10 x 3
#>      day   cma   tma
#>    <int> <dbl> <dbl>
#>  1     1    NA    NA
#>  2     2     2    NA
#>  3     3     3     2
#>  4     4     4     3
#>  5     5     5     4
#>  6     6     6     5
#>  7     7     7     6
#>  8     8     8     7
#>  9     9     9     8
#> 10    10    NA     9

Here is example code showing how to compute a centered moving average and a trailing moving average using the rollmean function from the zoo package.

library(tidyverse)
library(zoo)

some_data = tibble(day = 1:10)
# cma = centered moving average
# tma = trailing moving average
some_data = some_data %>%
    mutate(cma = rollmean(day, k = 3, fill = NA)) %>%
    mutate(tma = rollmean(day, k = 3, fill = NA, align = "right"))
some_data
#> # A tibble: 10 x 3
#>      day   cma   tma
#>    <int> <dbl> <dbl>
#>  1     1    NA    NA
#>  2     2     2    NA
#>  3     3     3     2
#>  4     4     4     3
#>  5     5     5     4
#>  6     6     6     5
#>  7     7     7     6
#>  8     8     8     7
#>  9     9     9     8
#> 10    10    NA     9
記憶穿過時間隧道 2024-07-23 19:47:44

您可以使用 RcppRoll 来实现用 C++ 编写的非常快速的移动平均线。 只需调用 roll_mean 函数即可。 可以在此处找到文档。

否则,这个(较慢的)for循环应该可以解决问题:

ma <- function(arr, n=15){
  res = arr
  for(i in n:length(arr)){
    res[i] = mean(arr[(i-n):i])
  }
  res
}

You could use RcppRoll for very quick moving averages written in C++. Just call the roll_mean function. Docs can be found here.

Otherwise, this (slower) for loop should do the trick:

ma <- function(arr, n=15){
  res = arr
  for(i in n:length(arr)){
    res[i] = mean(arr[(i-n):i])
  }
  res
}
晨敛清荷 2024-07-23 19:47:44

事实上,RcppRoll 非常好。

cantdutchthis 发布的代码必须在第四行中更正以修复窗口:

ma <- function(arr, n=15){
  res = arr
  for(i in n:length(arr)){
    res[i] = mean(arr[(i-n+1):i])
  }
  res
}

另一种处理缺失的方法是此处给出。

第三种方法是改进 cantdutchthis 代码来计算部分平均值,如下所示:

  ma <- function(x, n=2,parcial=TRUE){
  res = x #set the first values

  if (parcial==TRUE){
    for(i in 1:length(x)){
      t<-max(i-n+1,1)
      res[i] = mean(x[t:i])
    }
    res

  }else{
    for(i in 1:length(x)){
      t<-max(i-n+1,1)
      res[i] = mean(x[t:i])
    }
    res[-c(seq(1,n-1,1))] #remove the n-1 first,i.e., res[c(-3,-4,...)]
  }
}

In fact RcppRoll is very good.

The code posted by cantdutchthis must be corrected in the fourth line to the window be fixed:

ma <- function(arr, n=15){
  res = arr
  for(i in n:length(arr)){
    res[i] = mean(arr[(i-n+1):i])
  }
  res
}

Another way, which handles missings, is given here.

A third way, improving cantdutchthis code to calculate partial averages or not, follows:

  ma <- function(x, n=2,parcial=TRUE){
  res = x #set the first values

  if (parcial==TRUE){
    for(i in 1:length(x)){
      t<-max(i-n+1,1)
      res[i] = mean(x[t:i])
    }
    res

  }else{
    for(i in 1:length(x)){
      t<-max(i-n+1,1)
      res[i] = mean(x[t:i])
    }
    res[-c(seq(1,n-1,1))] #remove the n-1 first,i.e., res[c(-3,-4,...)]
  }
}
心的位置 2024-07-23 19:47:44

您可以通过以下方式计算窗口宽度为 k 的向量 x 的移动平均值:

apply(embed(x, k), 1, mean)

You may calculate the moving average of a vector x with a window width of k by:

apply(embed(x, k), 1, mean)
北城挽邺 2024-07-23 19:47:44

为了补充 cantdutchthis罗德里戈·雷梅迪奥;

moving_fun <- function(x, w, FUN, ...) {
  # x: a double vector
  # w: the length of the window, i.e., the section of the vector selected to apply FUN
  # FUN: a function that takes a vector and return a summarize value, e.g., mean, sum, etc.
  # Given a double type vector apply a FUN over a moving window from left to the right, 
  #    when a window boundary is not a legal section, i.e. lower_bound and i (upper bound) 
  #    are not contained in the length of the vector, return a NA_real_
  if (w < 1) {
    stop("The length of the window 'w' must be greater than 0")
  }
  output <- x
  for (i in 1:length(x)) {
     # plus 1 because the index is inclusive with the upper_bound 'i'
    lower_bound <- i - w + 1
    if (lower_bound < 1) {
      output[i] <- NA_real_
    } else {
      output[i] <- FUN(x[lower_bound:i, ...])
    }
  }
  output
}

# example
v <- seq(1:10)

# compute a MA(2)
moving_fun(v, 2, mean)

# compute moving sum of two periods
moving_fun(v, 2, sum)

In order to complement the answer of cantdutchthis and Rodrigo Remedio;

moving_fun <- function(x, w, FUN, ...) {
  # x: a double vector
  # w: the length of the window, i.e., the section of the vector selected to apply FUN
  # FUN: a function that takes a vector and return a summarize value, e.g., mean, sum, etc.
  # Given a double type vector apply a FUN over a moving window from left to the right, 
  #    when a window boundary is not a legal section, i.e. lower_bound and i (upper bound) 
  #    are not contained in the length of the vector, return a NA_real_
  if (w < 1) {
    stop("The length of the window 'w' must be greater than 0")
  }
  output <- x
  for (i in 1:length(x)) {
     # plus 1 because the index is inclusive with the upper_bound 'i'
    lower_bound <- i - w + 1
    if (lower_bound < 1) {
      output[i] <- NA_real_
    } else {
      output[i] <- FUN(x[lower_bound:i, ...])
    }
  }
  output
}

# example
v <- seq(1:10)

# compute a MA(2)
moving_fun(v, 2, mean)

# compute moving sum of two periods
moving_fun(v, 2, sum)
来日方长 2024-07-23 19:47:44

滑块包可用于此目的。 它有一个专门设计的界面,感觉类似于 purrr。 它接受任何任意函数,并且可以返回任何类型的输出。 数据帧甚至逐行迭代。 pkgdown 网站位于此处

library(slider)

x <- 1:3

# Mean of the current value + 1 value before it
# returned as a double vector
slide_dbl(x, ~mean(.x, na.rm = TRUE), .before = 1)
#> [1] 1.0 1.5 2.5


df <- data.frame(x = x, y = x)

# Slide row wise over data frames
slide(df, ~.x, .before = 1)
#> [[1]]
#>   x y
#> 1 1 1
#> 
#> [[2]]
#>   x y
#> 1 1 1
#> 2 2 2
#> 
#> [[3]]
#>   x y
#> 1 2 2
#> 2 3 3

slider 和 data.table 的 frollapply() 的开销应该相当低(比 Zoo 快得多)。 对于这里的这个简单示例,frollapply() 看起来要快一些,但请注意,它只接受数字输入,并且输出必须是标量数值。 滑块函数是完全通用的,您可以返回任何数据类型。

library(slider)
library(zoo)
library(data.table)

x <- 1:50000 + 0L

bench::mark(
  slider = slide_int(x, function(x) 1L, .before = 5, .complete = TRUE),
  zoo = rollapplyr(x, FUN = function(x) 1L, width = 6, fill = NA),
  datatable = frollapply(x, n = 6, FUN = function(x) 1L),
  iterations = 200
)
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 slider      19.82ms   26.4ms     38.4    829.8KB     19.0
#> 2 zoo        177.92ms  211.1ms      4.71    17.9MB     24.8
#> 3 datatable    7.78ms   10.9ms     87.9    807.1KB     38.7

The slider package can be used for this. It has an interface that has been specifically designed to feel similar to purrr. It accepts any arbitrary function, and can return any type of output. Data frames are even iterated over row wise. The pkgdown site is here.

library(slider)

x <- 1:3

# Mean of the current value + 1 value before it
# returned as a double vector
slide_dbl(x, ~mean(.x, na.rm = TRUE), .before = 1)
#> [1] 1.0 1.5 2.5


df <- data.frame(x = x, y = x)

# Slide row wise over data frames
slide(df, ~.x, .before = 1)
#> [[1]]
#>   x y
#> 1 1 1
#> 
#> [[2]]
#>   x y
#> 1 1 1
#> 2 2 2
#> 
#> [[3]]
#>   x y
#> 1 2 2
#> 2 3 3

The overhead of both slider and data.table's frollapply() should be pretty low (much faster than zoo). frollapply() looks to be a little faster for this simple example here, but note that it only takes numeric input, and the output must be a scalar numeric value. slider functions are completely generic, and you can return any data type.

library(slider)
library(zoo)
library(data.table)

x <- 1:50000 + 0L

bench::mark(
  slider = slide_int(x, function(x) 1L, .before = 5, .complete = TRUE),
  zoo = rollapplyr(x, FUN = function(x) 1L, width = 6, fill = NA),
  datatable = frollapply(x, n = 6, FUN = function(x) 1L),
  iterations = 200
)
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 slider      19.82ms   26.4ms     38.4    829.8KB     19.0
#> 2 zoo        177.92ms  211.1ms      4.71    17.9MB     24.8
#> 3 datatable    7.78ms   10.9ms     87.9    807.1KB     38.7
纵性 2024-07-23 19:47:44

编辑:非常乐意添加 side 参数,例如 Date 的过去 7 天的移动平均值(或总和,或...)向量。


对于只想自己计算的人来说,无非是:

# x = vector with numeric data
# w = window length
y <- numeric(length = length(x))

for (i in seq_len(length(x))) {
  ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
  ind <- ind[ind %in% seq_len(length(x))]
  y[i] <- mean(x[ind])
}

y

但是使其独立于 mean() 会很有趣,因此您可以计算任何“移动”函数!

# our working horse:
moving_fn <- function(x, w, fun, ...) {
  # x = vector with numeric data
  # w = window length
  # fun = function to apply
  # side = side to take, (c)entre, (l)eft or (r)ight
  # ... = parameters passed on to 'fun'
  y <- numeric(length(x))
  for (i in seq_len(length(x))) {
    if (side %in% c("c", "centre", "center")) {
      ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
    } else if (side %in% c("l", "left")) {
      ind <- c((i - floor(w) + 1):i)
    } else if (side %in% c("r", "right")) {
      ind <- c(i:(i + floor(w) - 1))
    } else {
      stop("'side' must be one of 'centre', 'left', 'right'", call. = FALSE)
    }
    ind <- ind[ind %in% seq_len(length(x))]
    y[i] <- fun(x[ind], ...)
  }
  y
}

# and now any variation you can think of!
moving_average <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = mean, side = side, na.rm = na.rm)
}

moving_sum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = sum, side = side, na.rm = na.rm)
}

moving_maximum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = max, side = side, na.rm = na.rm)
}

moving_median <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = median, side = side, na.rm = na.rm)
}

moving_Q1 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.25)
}

moving_Q3 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.75)
}

EDIT: took great joy in adding the side parameter, for a moving average (or sum, or ...) of e.g. the past 7 days of a Date vector.


For people just wanting to calculate this themselves, it's nothing more than:

# x = vector with numeric data
# w = window length
y <- numeric(length = length(x))

for (i in seq_len(length(x))) {
  ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
  ind <- ind[ind %in% seq_len(length(x))]
  y[i] <- mean(x[ind])
}

y

But it gets fun to make it independent of mean(), so you can calculate any 'moving' function!

# our working horse:
moving_fn <- function(x, w, fun, ...) {
  # x = vector with numeric data
  # w = window length
  # fun = function to apply
  # side = side to take, (c)entre, (l)eft or (r)ight
  # ... = parameters passed on to 'fun'
  y <- numeric(length(x))
  for (i in seq_len(length(x))) {
    if (side %in% c("c", "centre", "center")) {
      ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
    } else if (side %in% c("l", "left")) {
      ind <- c((i - floor(w) + 1):i)
    } else if (side %in% c("r", "right")) {
      ind <- c(i:(i + floor(w) - 1))
    } else {
      stop("'side' must be one of 'centre', 'left', 'right'", call. = FALSE)
    }
    ind <- ind[ind %in% seq_len(length(x))]
    y[i] <- fun(x[ind], ...)
  }
  y
}

# and now any variation you can think of!
moving_average <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = mean, side = side, na.rm = na.rm)
}

moving_sum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = sum, side = side, na.rm = na.rm)
}

moving_maximum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = max, side = side, na.rm = na.rm)
}

moving_median <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = median, side = side, na.rm = na.rm)
}

moving_Q1 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.25)
}

moving_Q3 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.75)
}
叹沉浮 2024-07-23 19:47:44

虽然有点慢,但您也可以使用zoo::rollapply 对矩阵执行计算。

reqd_ma <- rollapply(x, FUN = mean, width = n)

其中x是数据集,FUN =mean是函数; 您还可以将其更改为 min、max、sd 等,宽度是滚动窗口。

Though a bit slow but you can also use zoo::rollapply to perform calculations on matrices.

reqd_ma <- rollapply(x, FUN = mean, width = n)

where x is the data set, FUN = mean is the function; you can also change it to min, max, sd etc and width is the rolling window.

小霸王臭丫头 2024-07-23 19:47:44

可以使用 runner 包来移动函数。 在本例中,mean_run 函数。 cummean 的问题在于它不处理 NA 值,但 mean_run 可以。 runner 包还支持不规则时间序列,并且窗口可以依赖于日期:

library(runner)
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA,5), rnorm(15)), 15, replace = TRUE)
date <- Sys.Date() + cumsum(sample(1:3, 15, replace = TRUE))

mean_run(x1)
#>  [1] -0.5910311 -0.2822184 -0.6936633 -0.8609108 -0.4530308 -0.5332176
#>  [7] -0.2679571 -0.1563477 -0.1440561 -0.2300625 -0.2844599 -0.2897842
#> [13] -0.3858234 -0.3765192 -0.4280809

mean_run(x2, na_rm = TRUE)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7] -0.13873536 -0.14571604 -0.12596067 -0.11116961 -0.09881996 -0.08871569
#> [13] -0.05194292 -0.04699909 -0.05704202

mean_run(x2, na_rm = FALSE )
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7]          NA          NA          NA          NA          NA          NA
#> [13]          NA          NA          NA

mean_run(x2, na_rm = TRUE, k = 4)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.10546063 -0.16299272
#>  [7] -0.21203756 -0.39209010 -0.13274756 -0.05603811 -0.03894684  0.01103493
#> [13]  0.09609256  0.09738460  0.04740283

mean_run(x2, na_rm = TRUE, k = 4, idx = date)
#> [1] -0.187600111 -0.090220655 -0.004349696  0.168349653 -0.206571573 -0.494335093
#> [7] -0.222969541 -0.187600111 -0.087636571  0.009742884  0.009742884  0.012326968
#> [13]  0.182442234  0.125737145  0.059094786

还可以指定其他选项,例如 lag,并且仅在 at 特定索引滚动。 更多内容请参见 package函数 文档。

One can use runner package for moving functions. In this case mean_run function. Problem with cummean is that it doesn't handle NA values, but mean_run does. runner package also supports irregular time series and windows can depend on date:

library(runner)
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA,5), rnorm(15)), 15, replace = TRUE)
date <- Sys.Date() + cumsum(sample(1:3, 15, replace = TRUE))

mean_run(x1)
#>  [1] -0.5910311 -0.2822184 -0.6936633 -0.8609108 -0.4530308 -0.5332176
#>  [7] -0.2679571 -0.1563477 -0.1440561 -0.2300625 -0.2844599 -0.2897842
#> [13] -0.3858234 -0.3765192 -0.4280809

mean_run(x2, na_rm = TRUE)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7] -0.13873536 -0.14571604 -0.12596067 -0.11116961 -0.09881996 -0.08871569
#> [13] -0.05194292 -0.04699909 -0.05704202

mean_run(x2, na_rm = FALSE )
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7]          NA          NA          NA          NA          NA          NA
#> [13]          NA          NA          NA

mean_run(x2, na_rm = TRUE, k = 4)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.10546063 -0.16299272
#>  [7] -0.21203756 -0.39209010 -0.13274756 -0.05603811 -0.03894684  0.01103493
#> [13]  0.09609256  0.09738460  0.04740283

mean_run(x2, na_rm = TRUE, k = 4, idx = date)
#> [1] -0.187600111 -0.090220655 -0.004349696  0.168349653 -0.206571573 -0.494335093
#> [7] -0.222969541 -0.187600111 -0.087636571  0.009742884  0.009742884  0.012326968
#> [13]  0.182442234  0.125737145  0.059094786

One can also specify other options like lag, and roll only at specific indexes. More in package and function documentation.

烟─花易冷 2024-07-23 19:47:44

下面是一个带有 filter 的简单函数,演示了一种通过填充处理开始和结束 NA 的方法,并使用自定义权重计算加权平均值(由 filter 支持):

wma <- function(x) { 
  wts <- c(seq(0.5, 4, 0.5), seq(3.5, 0.5, -0.5))
  nside <- (length(wts)-1)/2
  # pad x with begin and end values for filter to avoid NAs
  xp <- c(rep(first(x), nside), x, rep(last(x), nside)) 
  z <- stats::filter(xp, wts/sum(wts), sides = 2) %>% as.vector 
  z[(nside+1):(nside+length(x))]
}

Here is a simple function with filter demonstrating one way to take care of beginning and ending NAs with padding, and computing a weighted average (supported by filter) using custom weights:

wma <- function(x) { 
  wts <- c(seq(0.5, 4, 0.5), seq(3.5, 0.5, -0.5))
  nside <- (length(wts)-1)/2
  # pad x with begin and end values for filter to avoid NAs
  xp <- c(rep(first(x), nside), x, rep(last(x), nside)) 
  z <- stats::filter(xp, wts/sum(wts), sides = 2) %>% as.vector 
  z[(nside+1):(nside+length(x))]
}
深者入戏 2024-07-23 19:47:44
vector_avg <- function(x){
  sum_x = 0
  for(i in 1:length(x)){
    if(!is.na(x[i]))
      sum_x = sum_x + x[i]
  }
  return(sum_x/length(x))
}
vector_avg <- function(x){
  sum_x = 0
  for(i in 1:length(x)){
    if(!is.na(x[i]))
      sum_x = sum_x + x[i]
  }
  return(sum_x/length(x))
}
猥︴琐丶欲为 2024-07-23 19:47:44

我使用聚合以及由rep()创建的向量。 这样做的优点是可以使用 cbind() 一次聚合数据框中多于 1 列。 下面是长度为 1000 的向量 (v) 的移动平均值为 60 的示例:

v=1:1000*0.002+rnorm(1000)
mrng=rep(1:round(length(v)/60+0.5), length.out=length(v), each=60)
aggregate(v~mrng, FUN=mean, na.rm=T)

请注意,rep 中的第一个参数只是根据向量的长度和要计算的量获取移动极差的足够唯一值。平均; 第二个参数保持长度等于向量长度,最后一个参数重复第一个参数的值的次数与平均周期相同。

总的来说,您可以使用多个函数(中值、最大值、最小值) - 例如所示的平均值。 同样,可以使用带有 cbind 的公式对数据框中的多个(或所有)列执行此操作。

I use aggregate along with a vector created by rep(). This has the advantage of using cbind() to aggregate more than 1 column in your dataframe at time. Below is an example of a moving average of 60 for a vector (v) of length 1000:

v=1:1000*0.002+rnorm(1000)
mrng=rep(1:round(length(v)/60+0.5), length.out=length(v), each=60)
aggregate(v~mrng, FUN=mean, na.rm=T)

Note the first argument in rep is to simply get enough unique values for the moving range, based on the length of the vector and the amount to be averaged; the second argument keeps the length equal to the vector length, and the last repeats the values of the first argument the same number of times as the averaging period.

In aggregate you could use several functions (median, max, min) - mean shown for example. Again, could could use a formula with cbind to do this on more than one (or all) columns in a dataframe.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文