Is there a faster way to get percent change?

Posted on 2024-12-11 06:01:59

I have a data frame with around 25000 records and 10 columns. I am computing each value in a column (NewVal) from the previous value in that same column, using the percent change stored in another column (y).

x=c(1:25000)
y=rpois(25000,2)
z=data.frame(x,y)
z[1,'NewVal']=z[1,'x']

So I ran this:

for(i in 2:nrow(z)){z$NewVal[i]=z$NewVal[i-1]+(z$NewVal[i-1]*(z$y[i]/100))}

This takes considerably longer than I expected it to. Granted I may be an impatient person - as a scathing letter drafted to me once said - but I am trying to escape the world of Excel (after I read http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html, which is causing me more problems as I have begun to mistrust data - that letter also mentioned my trust issues).

I would like to do this without using any of the functions from packages as I would like to know what the formula for creating the values is - or if you will, I am a demanding control freak according to that friendly missive.

I would also like to know how to get a moving average just like rollmean in caTools. Either that or how do I figure out what their formula is? I tried entering rollmean and I think it refers to another function (I am new to R). This should probably be another question - but as that letter said, I don't ever make the right decisions in my life.
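On the moving-average part: as far as I know, rollmean is provided by the zoo package (caTools has runmean), and typing its name only shows a generic; methods(rollmean) lists the methods that do the actual work. For a package-free version, here is a minimal sketch built on cumsum. The function name moving_average and the trailing-window convention (NA until a full window is available) are my own assumptions, not what rollmean does by default.

```r
# Trailing moving average of window size k, base R only.
# moving_average is a hypothetical helper name, not a zoo or caTools function.
moving_average <- function(v, k) {
  n <- length(v)
  if (k < 1 || k > n) stop("k must be between 1 and length(v)")
  cs <- c(0, cumsum(v))            # cs[j + 1] holds the sum of v[1:j]
  out <- rep(NA_real_, n)          # positions without a full window stay NA
  # sum of v[(i - k + 1):i] is cs[i + 1] - cs[i - k + 1]
  out[k:n] <- (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k
  out
}

moving_average(c(1, 2, 3, 4, 5), 3)
# [1] NA NA  2  3  4
```

Each output value is the mean of the current element and the k - 1 elements before it, so the formula stays fully visible.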

凡尘雨 2024-12-18 06:01:59

The secret in R is to vectorise. In your example you can use cumprod to do the heavy lifting:

z$NewVal2 <- z$x[1] * cumprod(c(1, 1 + z$y[-1] / 100))

all.equal(z$NewVal, z$NewVal2)
[1] TRUE

head(z, 10)
    x y   NewVal  NewVal2
1  25 4 25.00000 25.00000
2  24 3 25.75000 25.75000
3  23 0 25.75000 25.75000
4  22 1 26.00750 26.00750
5  21 3 26.78773 26.78773
6  20 2 27.32348 27.32348
7  19 2 27.86995 27.86995
8  18 3 28.70605 28.70605
9  17 4 29.85429 29.85429
10 16 2 30.45138 30.45138

On my machine, the loop takes just less than 3 minutes to run, while the cumprod statement is virtually instantaneous.
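Why this works: the loop computes NewVal[i] = NewVal[i-1] * (1 + y[i]/100), so each value is the starting value times the running product of the growth factors. A small sketch to check that on ten rows; the seed and the names start, loop_val and vec_val are arbitrary choices for illustration.

```r
set.seed(1)                # arbitrary seed, only for reproducibility
y <- rpois(10, 2)          # percent changes, as in the question
start <- 1                 # assumed starting value (x[1] in the question)

# Recurrence from the question, run on a plain numeric vector
loop_val <- numeric(length(y))
loop_val[1] <- start
for (i in 2:length(y)) {
  loop_val[i] <- loop_val[i - 1] * (1 + y[i] / 100)
}

# Same values as a running product of growth factors;
# the first factor is 1 because row 1 is the starting point
vec_val <- start * cumprod(c(1, 1 + y[-1] / 100))

all.equal(loop_val, vec_val)
# [1] TRUE
```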

烟沫凡尘 2024-12-18 06:01:59

I got about an 800-fold improvement with Reduce:

> system.time(z[, "NewVal"] <- Reduce("*", c(1, 1 + z$y[-1] / 100), accumulate = TRUE))
   user  system elapsed 
  0.139   0.008   0.148 

> head(z, 10)
    x y NewVal
1   1 1  1.000
2   2 1  1.010
3   3 1  1.020
4   4 5  1.071
5   5 1  1.082
6   6 2  1.103
7   7 2  1.126
8   8 3  1.159
9   9 0  1.159
10 10 1  1.171
> system.time(for(i in 2:nrow(z)){z$NewVal[i]=z$NewVal[i-1]+
                                              (z$NewVal[i-1]*(z$y[i]/100))})
   user  system elapsed 
  37.29  106.38  143.16
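Most of the loop's cost likely comes from assigning into the data frame on every iteration, rather than from the arithmetic. If you want to keep an explicit loop so the formula stays visible, a sketch along these lines works on a plain vector and writes the column once at the end. The names new_val and reduce_val are mine, and I have not timed this version.

```r
z <- data.frame(x = 1:25000, y = rpois(25000, 2))

y <- z$y                          # pull the column out once
new_val <- numeric(nrow(z))       # preallocate the result vector
new_val[1] <- z$x[1]
for (i in 2:nrow(z)) {
  # previous value grown by y[i] percent
  new_val[i] <- new_val[i - 1] * (1 + y[i] / 100)
}
z$NewVal <- new_val               # single assignment into the data frame

# Cross-check against the Reduce approach, started from z$x[1]
reduce_val <- Reduce(`*`, c(z$x[1], 1 + z$y[-1] / 100), accumulate = TRUE)
all.equal(z$NewVal, reduce_val)
# [1] TRUE
```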
