Is there a faster way to get percent change?
I have a data frame with around 25000 records and 10 columns. I am using code to determine the change to the previous value in the same column (NewVal) based on another column (y) with a percent change already in it.
x=c(1:25000)
y=rpois(25000,2)
z=data.frame(x,y)
z[1,'NewVal']=z[1,'x']
So I ran this:
for(i in 2:nrow(z)){z$NewVal[i]=z$NewVal[i-1]+(z$NewVal[i-1]*(z$y[i]/100))}
This takes considerably longer than I expected. Granted, I may be an impatient person - as a scathing letter drafted to me once said - but I am trying to escape the world of Excel (after reading http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html, which is causing me more problems, as I have begun to mistrust data - that letter also mentioned my trust issues).
I would like to do this without using any of the functions from packages as I would like to know what the formula for creating the values is - or if you will, I am a demanding control freak according to that friendly missive.
I would also like to know how to get a moving average just like rollmean in caTools. Either that or how do I figure out what their formula is? I tried entering rollmean and I think it refers to another function (I am new to R). This should probably be another question - but as that letter said, I don't ever make the right decisions in my life.
Comments (2)
The secret in R is to vectorise. In your example you can use `cumprod` to do the heavy lifting.
On my machine, the loop takes just under 3 minutes to run, while the `cumprod` statement is virtually instantaneous.
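The answer's original code is not preserved here, so the following is only a sketch of how the `cumprod` idea could be applied to the question's data. It relies on the observation that each step of the loop multiplies the previous value by `1 + y[i]/100`. The `set.seed` call and the helper name `growth` are assumptions added for illustration.

```r
# Same setup as in the question, with a seed so the run is reproducible
set.seed(1)
x <- 1:25000
y <- rpois(25000, 2)
z <- data.frame(x, y)

# The loop computes NewVal[i] = NewVal[i-1] * (1 + y[i]/100),
# starting from NewVal[1] = x[1].
# The first factor is 1 so that the starting value is left unchanged.
growth <- c(1, 1 + z$y[-1] / 100)

# A running product of the factors replaces the explicit loop
z$NewVal <- z$x[1] * cumprod(growth)

head(z)
```

Because the whole column is computed in one vectorised call, the data frame is not modified row by row, and that row-by-row assignment is where the loop spends most of its time.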
I got about an 800-fold improvement with `Reduce`.
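The code for this answer is also missing, so what follows is a guess at how `Reduce` might be used, not the answerer's actual solution. It keeps the question's formula intact inside a small function; the names `step` and `start` and the `set.seed` call are assumptions for illustration.

```r
# Same percent-change column as in the question, seeded for reproducibility
set.seed(1)
y <- rpois(25000, 2)
start <- 1  # corresponds to z[1, 'x'] in the question

# One step of the recurrence, written exactly as in the original loop:
# 'prev' is the previous NewVal, 'pct' is the current percent change.
step <- function(prev, pct) prev + prev * (pct / 100)

# accumulate = TRUE keeps every intermediate value, not just the final one;
# init supplies NewVal[1], so y[1] is skipped just as the loop starts at i = 2.
NewVal <- Reduce(step, y[-1], accumulate = TRUE, init = start)

head(NewVal)
```

This form may appeal if you want the update rule spelled out in plain sight, since `step` can be swapped for any other two-argument recurrence.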