For loop alternative for progressive operations

Posted on 2024-10-19 11:07:36

I have to apply a regression function progressively to time series data (vectors "time" and "tm"), and I'm using a for loop as follows:

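top <- length(time)
slope <- rep(NA_real_, top)   # preallocate; slope[1] stays NA
for (k in 2:top) {
    lin.regr <- lm(tm[1:k] ~ log(time[1:k]))   # fit on the first k points
    slope[k] <- coef(lin.regr)[2]              # keep only the slope
}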

But for vectors of length around 10k it becomes very slow.
Is there a faster alternative (maybe using an apply function)?

A simpler version of the problem: if I have a vector like x <- c(1:10), how can I build a vector y containing (for example) the running sum of the x values?
Like:

x
1 2 3 4 5 6 7 8 9 10
y
1  3  6 10 15 21 28 36 45 55

Comments (2)

陪我终i 2024-10-26 11:07:36

Well, there is no fast alternative to a loop unless you can vectorize. In some circumstances functions like ave, aggregate, ddply, tapply, ... can give you a substantial win, but often the trick lies in using faster functions, like cumsum (cf. the answer of user615147).

To illustrate:

top <- 1000
tm <- rnorm(top,10)   
time <- rnorm(top,10)

> system.time(
+ results <- sapply(2:top,function (k) coef(lm(tm[1:k] ~ log(time[1:k])))[2])
+ )
   user  system elapsed 
   4.26    0.00    4.27 

> system.time(
+ results <- lapply(2:top,function (k) coef(lm(tm[1:k] ~ log(time[1:k])))[2])
+ )
   user  system elapsed 
   4.25    0.00    4.25 

> system.time(
+ results <- for(k in 2:top) coef(lm(tm[1:k] ~ log(time[1:k])))[2]
+ )
   user  system elapsed 
   4.25    0.00    4.25 

> system.time(
+ results <- for(k in 2:top) lm.fit(cbind(1, log(time[1:k])),
+                                 tm[1:k])$coefficients[2]
+ )
   user  system elapsed 
   0.43    0.00    0.42 

The only faster solution is lm.fit(). Make no mistake: the timings differ a bit every time you run the analysis, so a difference of 0.02 seconds is not meaningful. sapply, for and lapply are all equally fast here. The trick is to use lm.fit().
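One detail worth checking when switching from lm() to lm.fit(): lm.fit() takes the design matrix exactly as given and does not add an intercept, so a column of ones has to be supplied for the second coefficient to be the slope. The following is a minimal sketch of that, storing the slopes in a preallocated vector and comparing them with the lm() results. The names X, slope_fit and slope_lm and the seed are illustrative choices, not part of the original answer.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
top  <- 200
tm   <- rnorm(top, 10)
time <- rnorm(top, 10)           # mean 10, sd 1, so log(time) is defined

X <- cbind(1, log(time))         # explicit intercept column plus the predictor

slope_fit <- rep(NA_real_, top)  # preallocate; entry 1 stays NA
for (k in 2:top) {
  slope_fit[k] <- lm.fit(X[1:k, , drop = FALSE], tm[1:k])$coefficients[2]
}

# Reference values from the formula interface
slope_lm <- rep(NA_real_, top)
for (k in 2:top) {
  slope_lm[k] <- coef(lm(tm[1:k] ~ log(time[1:k])))[2]
}

all.equal(slope_fit, slope_lm)   # compares the two vectors up to numerical tolerance
```

Building X once outside the loop also avoids recomputing log(time[1:k]) on every iteration.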

If you have a data frame called Data, you could use something like:

Data <- data.frame(Y=rnorm(top),X1=rnorm(top),X2=rnorm(top))

mf <- model.matrix(Y~X1+X2,data=Data)
results <- sapply(2:top, function(k)
  lm.fit(mf[1:k,],Data$Y[1:k])$coefficients[2]
)

as a more general solution.
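Going the other way, for the single-predictor case in the question the loop can be dropped entirely. This is not the method used above but the textbook closed form for a simple regression slope, slope = (k*Sxy - Sx*Sy) / (k*Sxx - Sx^2), where each sum over the first k points comes from cumsum. A sketch, assuming a hypothetical helper name expanding_slope:

```r
# Expanding-window OLS slope via cumulative sums.
# expanding_slope is a hypothetical helper name, not a base R function.
expanding_slope <- function(x, y) {
  # Centre both series to limit cancellation error; shifting x or y
  # by a constant does not change the slope.
  x <- x - mean(x)
  y <- y - mean(y)
  k   <- seq_along(x)
  sx  <- cumsum(x)
  sy  <- cumsum(y)
  sxy <- cumsum(x * y)
  sxx <- cumsum(x * x)
  slope <- (k * sxy - sx * sy) / (k * sxx - sx^2)
  slope[1] <- NA_real_   # one point does not determine a slope (0/0)
  slope
}

set.seed(1)              # arbitrary seed, only for reproducibility
top  <- 200
tm   <- rnorm(top, 10)
time <- rnorm(top, 10)

slope_vec <- expanding_slope(log(time), tm)

# Spot check against lm() for a few window sizes
check_k   <- c(10, 50, 200)
slope_ref <- sapply(check_k, function(k) coef(lm(tm[1:k] ~ log(time[1:k])))[2])
all.equal(slope_vec[check_k], unname(slope_ref))
```

The whole vector of slopes is computed in one pass, which matters at lengths around 10k. The closed form is less numerically robust than the QR decomposition behind lm(), so for badly scaled data or nearly constant predictors it is worth comparing a few values against lm() as done here.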

£冰雨忧蓝° 2024-10-26 11:07:36
results <- sapply(2:top,function (k) coef(lm(tm[1:k] ~ log(time[1:k])))[2])

The apply family of functions is a compact way to iterate in R (though not inherently faster than a well-written for loop).

You can also look at using lm.fit() to speed up your regression a bit.

cumsum(1:10)

is how to do the second question
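Applied to the vector from the question, a short sketch looks like this (the sapply line is only there to show what cumsum replaces):

```r
x <- 1:10
y <- cumsum(x)   # running total: y[k] equals sum(x[1:k])
y
# [1]  1  3  6 10 15 21 28 36 45 55

# The explicit, slower equivalent that cumsum replaces
y_loop <- sapply(seq_along(x), function(k) sum(x[1:k]))
```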
