I'm trying to use R to calculate the moving average over a series of values in a matrix. There doesn't seem to be a built-in function in R for calculating moving averages. Do any packages provide one, or do I need to write my own?
Another useful function if you want the two ends of the series to be recursively calculated moving averages rather than NA:
Example:
[1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
[1] 1.00 1.25 1.50 2.00 2.50 3.00 3.50 4.00 4.25 4.50
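The example output above is consistent with a centered window of width 5 that shrinks near the two ends of the series. A minimal sketch under that assumption follows; the name `ma_shrink` and its parameters are hypothetical and not taken from the answer.

```r
# Hypothetical helper: centered moving average whose window shrinks at the ends,
# so no NA values are produced. Assumes n is odd.
ma_shrink <- function(x, n = 5) {
  half <- (n - 1) %/% 2
  vapply(seq_along(x), function(i) {
    lo <- max(1, i - half)          # clip the window at the start
    hi <- min(length(x), i + half)  # clip the window at the end
    mean(x[lo:hi])
  }, numeric(1))
}

x <- seq(0.5, 5, by = 0.5)
ma_shrink(x, n = 5)
```

With `x` as above, the first element averages only the first three values and the second averages the first four, which matches the 1.00 and 1.25 shown in the example.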
Or you can simply calculate it using filter; here's the function I use:
If you use dplyr, be careful to specify stats::filter in the function above, because dplyr::filter masks it.
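A minimal sketch of a filter-based moving average, assuming a centered window with equal weights; the name `ma` and the default `n = 5` are illustrative choices rather than the answer's exact code.

```r
# Hypothetical helper: centered moving average of width n via a convolution filter.
# stats::filter is spelled out so the call still works when dplyr is attached.
ma <- function(x, n = 5) {
  stats::filter(x, rep(1 / n, n), sides = 2)
}

x <- 1:10
ma(x, n = 3)  # the first and last values are NA because the window is incomplete
```

Using `sides = 1` instead would give a trailing average that uses only past values.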
Using cumsum should be sufficient and efficient. Assuming you have a vector x and you want a running sum of n numbers:
As pointed out in the comments by @mzuther, this assumes that there are no NAs in the data. Dealing with those requires dividing each window by its number of non-NA values. Here's one way of doing that, incorporating the comment from @Ricardo Cruz:
This still has the issue that if all the values in a window are NA, the count for that window is zero and the division yields NaN (0/0).
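The cumsum idea can be sketched as follows, assuming NAs are treated as zero in the running sum and each window is divided by its count of non-NA values. The function name `running_mean` is hypothetical.

```r
# Hypothetical helper: running mean over windows of n values using cumulative sums.
running_mean <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  ok <- !is.na(x)
  cx <- c(0, cumsum(ifelse(ok, x, 0)))  # cumulative sum, NA counted as 0
  cn <- c(0, cumsum(as.numeric(ok)))    # cumulative count of non-NA values
  hi <- (n + 1):length(cx)
  lo <- 1:(length(cx) - n)
  (cx[hi] - cx[lo]) / (cn[hi] - cn[lo]) # window sum / window count
}

running_mean(c(1, 2, 3, 4, 5), n = 2)
running_mean(c(1, NA, 3, 4), n = 2)
running_mean(c(NA, NA, 3), n = 2)       # first window is all NA, so it is NaN
```

The result has `length(x) - n + 1` elements, one per complete window.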
In data.table 1.12.0 a new frollmean function was added to compute fast and exact rolling means, carefully handling NA, NaN, +Inf and -Inf values.
As there is no reproducible example in the question, there is not much more to address here.
You can find more information in the manual under ?frollmean, which is also available online.
Examples from the manual below:
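As a rough illustration (a sketch written for this page, not the manual's own examples), a basic frollmean call might look like this. It assumes data.table 1.12.0 or later is installed; the input vectors are made up.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5)
# Right-aligned window of 3; incomplete leading windows are filled with NA.
frollmean(x, 3)

x_na <- c(1, NA, 3, 4)
# By default a window containing NA yields NA ...
frollmean(x_na, 2)
# ... while na.rm = TRUE averages the non-NA values in each window.
frollmean(x_na, 2, na.rm = TRUE)
```

The `align` argument ("right", "left" or "center") controls where the window sits relative to each observation.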
The caTools package has very fast rolling mean/min/max/sd and a few other functions. I've only worked with runmean and runsd, and they are the fastest of all the packages mentioned so far.
Here is example code showing how to compute a centered moving average and a trailing moving average using the rollmean function from the zoo package.
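A minimal sketch of the two rollmean variants, assuming the zoo package is installed; the input vector and the window width of 3 are made up for illustration.

```r
library(zoo)

x <- c(1, 2, 3, 4, 5, 6)

# Centered moving average: each value is the mean of itself and its neighbours.
centered <- rollmean(x, k = 3, fill = NA, align = "center")

# Trailing moving average: each value is the mean of itself and the 2 previous values.
trailing <- rollmean(x, k = 3, fill = NA, align = "right")

centered
trailing
```

Without `fill = NA`, rollmean drops the incomplete windows and returns a shorter vector.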
You could use RcppRoll for very quick moving averages written in C++. Just call the roll_mean function. See the RcppRoll documentation for details.
Otherwise, this (slower) for loop should do the trick:
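A plain for loop for a trailing moving average could be sketched like this. The name `ma_loop` is hypothetical, and the window here is exactly `n` values wide, ending at the current position.

```r
# Hypothetical helper: trailing moving average with a simple loop.
ma_loop <- function(x, n = 3) {
  res <- rep(NA_real_, length(x))    # positions without a full window stay NA
  if (n > length(x)) return(res)
  for (i in n:length(x)) {
    res[i] <- mean(x[(i - n + 1):i]) # the n values ending at position i
  }
  res
}

ma_loop(c(1, 2, 3, 4, 5), n = 3)
```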
In fact, RcppRoll is very good.
The code posted by cantdutchthis must be corrected in the fourth line so that the window is correct:
Another way, which handles missing values, is given here.
A third way, improving cantdutchthis's code so that it can optionally calculate partial averages, follows:
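One way to sketch the optional partial averages is a flag that decides whether the first `n - 1` positions get the mean of the values seen so far or NA. The name `ma_trailing` and the `partial` argument are hypothetical, not the answer's own code.

```r
# Hypothetical helper: trailing moving average with optional partial windows.
ma_trailing <- function(x, n = 2, partial = TRUE) {
  res <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    start <- max(i - n + 1, 1)       # never index before the first element
    if (partial || i >= n) {
      res[i] <- mean(x[start:i])
    }
  }
  res
}

ma_trailing(c(1, 2, 3, 4), n = 2)                   # partial windows at the start
ma_trailing(c(1, 2, 3, 4), n = 2, partial = FALSE)  # NA until the window is full
```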
You may calculate the moving average of a vector x with a window width of k by:
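One base R way to do this, offered as a sketch rather than the answer's exact expression, is to use embed() to lay out every window of width k as a matrix row and then take row means.

```r
x <- c(1, 2, 3, 4, 5)
k <- 3

# embed(x, k) returns one row per complete window (values in reverse order,
# which does not affect the mean).
windows <- embed(x, k)
moving_avg <- rowMeans(windows)
moving_avg
```

The result has `length(x) - k + 1` elements, one per complete window.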
To complement the answers of cantdutchthis and Rodrigo Remedio:
The slider package can be used for this. It has an interface that has been specifically designed to feel similar to purrr. It accepts any arbitrary function and can return any type of output. Data frames are even iterated over row-wise. The package also has a pkgdown site.
The overhead of both slider and data.table's frollapply() should be pretty low (much faster than zoo). frollapply() looks to be a little faster for this simple example, but note that it only takes numeric input, and the output must be a scalar numeric value. slider functions are completely generic, and you can return any data type.
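A small sketch of the slider interface, assuming the slider package is installed; the input vector and window size are made up for illustration.

```r
library(slider)

x <- c(1, 2, 3, 4, 5)

# Trailing window of 3 values (the current one plus 2 before it).
# .complete = TRUE returns NA where the window is not yet full.
slide_dbl(x, mean, .before = 2, .complete = TRUE)

# slide() returns a list, so the function may return any type,
# here a length-2 vector per window.
slide(x, range, .before = 2)
```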
EDIT: took great joy in adding the side parameter, for a moving average (or sum, or ...) of, e.g., the past 7 days of a Date vector.
For people just wanting to calculate this themselves, it's nothing more than:
But it gets fun to make it independent of mean(), so you can calculate any 'moving' function!
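A generic 'moving' function along these lines could be sketched as follows. The name `moving_apply`, its arguments, and the meaning given to each `side` value are assumptions made for illustration; windows are simply truncated at the ends of the vector.

```r
# Hypothetical helper: apply FUN over a moving window of width w.
# side = "centre" looks both ways, "left" looks back, "right" looks ahead.
moving_apply <- function(x, w, FUN = mean,
                         side = c("centre", "left", "right"), ...) {
  side <- match.arg(side)
  n <- length(x)
  offsets <- switch(side,
    centre = seq(-floor((w - 1) / 2), ceiling((w - 1) / 2)),
    left   = seq(-(w - 1), 0),
    right  = seq(0, w - 1)
  )
  vapply(seq_len(n), function(i) {
    idx <- i + offsets
    idx <- idx[idx >= 1 & idx <= n]  # drop positions outside the vector
    FUN(x[idx], ...)
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 5)
moving_apply(x, 3)                               # centred moving average
moving_apply(x, 3, side = "left")                # mean of the current and 2 previous values
moving_apply(x, 3, FUN = max, side = "right")    # moving maximum looking ahead
```

FUN must return a single number per window for this sketch to work.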
Though a bit slow, you can also use zoo::rollapply to perform calculations on matrices.
where x is the data set, FUN = mean is the function (you can also change it to min, max, sd, etc.), and width is the rolling window.
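A minimal sketch of rollapply on a matrix, assuming the zoo package is installed; the matrix `m` is made up for illustration. By default the function is applied to each column separately.

```r
library(zoo)

m <- matrix(1:10, ncol = 2)  # two columns: 1..5 and 6..10

# Rolling mean of width 3 down each column; only complete windows are returned.
r <- rollapply(m, width = 3, FUN = mean)
r

# Same idea with another function, padded with NA to keep the original row count.
rollapply(m, width = 3, FUN = max, fill = NA, align = "right")
```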
One can use the runner package for moving functions, in this case the mean_run function. The problem with cummean is that it doesn't handle NA values, but mean_run does. The runner package also supports irregular time series, and windows can depend on date:
One can also specify other options like lag, and roll only at specific indexes. More in the package and function documentation.
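A small sketch of mean_run on a vector containing an NA, assuming the runner package is installed and that its default na_rm = TRUE behaviour applies. The input is made up, and the date-dependent window variant is not shown here.

```r
library(runner)

x <- c(1, 2, NA, 4, 5)

# Running mean over a window of up to 3 values ending at each position;
# NA values inside a window are skipped.
res <- mean_run(x, k = 3)
res
```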
Here is a simple function with filter demonstrating one way to take care of beginning and ending NAs with padding, and computing a weighted average (supported by filter) using custom weights:
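One possible shape for such a function is sketched below. It assumes the series is padded by repeating its first and last values, and that the weights are symmetric and odd in number; the name `weighted_ma` and the default weights are hypothetical.

```r
# Hypothetical helper: centered weighted moving average without NA at the ends.
weighted_ma <- function(x, weights = c(1, 2, 3, 2, 1)) {
  stopifnot(length(weights) %% 2 == 1)
  w <- weights / sum(weights)            # normalise so the weights sum to 1
  half <- (length(w) - 1) / 2
  # Pad both ends by repeating the edge values so every window is complete.
  padded <- c(rep(x[1], half), x, rep(x[length(x)], half))
  # stats::filter applies the weights in reverse order, which does not
  # matter for symmetric weights.
  out <- stats::filter(padded, w, sides = 2)
  as.numeric(out[(half + 1):(half + length(x))])
}

x <- c(1, 2, 3, 4, 5)
weighted_ma(x)                       # default triangular weights
weighted_ma(x, weights = c(1, 1, 1)) # plain 3-point average with padded ends
```

Repeating the edge values is only one padding choice; it pulls the end estimates toward the first and last observations.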
I use aggregate along with a vector created by rep(). This has the advantage that cbind() can be used to aggregate more than one column of your data frame at a time. Below is an example of a moving average of 60 for a vector (v) of length 1000:
Note that the first argument in rep simply provides enough unique values for the moving range, based on the length of the vector and the number of values to be averaged; the second argument keeps the length equal to the vector length, and the last repeats the values of the first argument the same number of times as the averaging period.
In aggregate you could use several functions (median, max, min); mean is shown as an example. Again, you could use a formula with cbind to do this on more than one (or all) columns in a data frame.
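The aggregate and rep combination described above can be sketched as follows; the variable names are illustrative. Note that, as written, this yields one average per consecutive block of 60 values rather than a window that slides by one position.

```r
v <- 1:1000
period <- 60

# One group id per block of `period` values; length.out trims the ids to length(v).
groups <- rep(seq_len(ceiling(length(v) / period)),
              each = period, length.out = length(v))

# Mean of each block; swap FUN for median, max, min, and so on.
agg <- aggregate(v, by = list(block = groups), FUN = mean)
head(agg)
```

The last block holds only the remaining 40 values, so its mean is based on fewer observations.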