Why are loops slow in R?

Posted 2024-12-01 02:58:14

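I know that loops are slow in R and that I should try to do things in a vectorised way instead.

But why? Why are loops slow while apply is fast? apply calls several sub-functions, which doesn't seem fast.

Update: Sorry, the question was ill-posed. I was confusing vectorisation with apply. My question should have been:

"Why is vectorisation faster?"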

断舍离 2024-12-08 02:58:14

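It is not always the case that loops are slow and apply is fast. There is a nice discussion of this in the May 2008 issue of R News:

Uwe Ligges and John Fox. R Help Desk: How can I avoid this loop or
make it faster? R News, 8(1):46-50, May 2008.

In the section "Loops!" (starting on page 48), they say:

Many comments about R state that using loops is a particularly bad idea. This is not necessarily true. In certain cases, it is difficult to write vectorized code, or vectorized code may consume a huge amount of memory.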

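They further suggest:

Initialize new objects to full length before the loop, rather than increasing their size within the loop.
Do not do things in a loop that can be done outside the loop.
Do not avoid loops simply for the sake of avoiding loops.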
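The first suggestion can be sketched as follows. This is a minimal illustration, not the example from the R News article: the function names grow and prealloc and the size n are made up for the purpose. Both functions compute the same squares; one extends the result on every iteration, the other allocates it once and fills it in.

```r
n <- 10000  # arbitrary size chosen for illustration

# Growing the result inside the loop: each c() call allocates a new,
# longer vector and copies the old contents into it.
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# Preallocating to full length: the loop only fills in existing slots.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

# Both agree with each other and with the vectorised expression.
stopifnot(isTRUE(all.equal(grow(n), prealloc(n))))
stopifnot(isTRUE(all.equal(prealloc(n), seq_len(n)^2)))

# Timings depend on the machine and R version, so none are quoted here.
system.time(grow(n))
system.time(prealloc(n))
```

On most setups the gap between the two timings widens as n grows, since the growing version copies ever longer vectors.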

They have a simple example where a for loop takes 1.3 seconds but apply runs out of memory.

不如归去 2024-12-08 02:58:14

Loops in R are slow for the same reason any interpreted language is slow: every
operation carries around a lot of extra baggage.

Look at R_execClosure in eval.c (this is the function called to call a
user-defined function). It's nearly 100 lines long and performs all sorts of
operations -- creating an environment for execution, assigning arguments into
the environment, etc.

Think how much less happens when you call a function in C (push args on to
stack, jump, pop args).

So that is why you get timings like these (as joran pointed out in the comment,
it's not actually apply that's being fast; it's the internal C loop in mean
that's being fast. apply is just regular old R code):

A = matrix(as.numeric(1:100000))

Using a loop: 0.342 seconds:

system.time({
    Sum = 0
    for (i in seq_along(A)) {
        Sum = Sum + A[[i]]
    }
    Sum
})

Using sum: unmeasurably small:

sum(A)

It's a little disconcerting because, asymptotically, the loop is just as good
as sum; there's no inherent reason it should be slow; it's just doing extra
work on each iteration.

So consider:

# 0.370 seconds
system.time({
    I = 0
    while (I < 100000) {
        10
        I = I + 1
    }
})

# 0.743 seconds -- double the time just adding parentheses
system.time({
    I = 0
    while (I < 100000) {
        ((((((((((10))))))))))
        I = I + 1
    }
})

(That example was discovered by Radford Neal)

The slowdown occurs because ( in R is an operator like any other, and it requires a name lookup every time you use it:

> `(` = function(x) 2
> (3)
[1] 2

Or, in general, interpreted operations (in any language) have more steps. Of course, those steps provide benefits as well: you couldn't do that ( trick in C.
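One way to see this from inside R is to inspect quoted expressions. The sketch below uses only base R functions; the variable names are arbitrary. It checks that parentheses, arithmetic operators and even for appear in the parse tree as calls to named functions, which the interpreter then has to look up and dispatch.

```r
# Arithmetic operators are ordinary functions that can be called by name.
stopifnot(`+`(1, 2) == 3)

# The parsed form of (3) is a call whose function is the symbol `(`.
expr <- quote((3))
stopifnot(is.call(expr))
stopifnot(identical(expr[[1]], as.name("(")))

# Control-flow constructs are calls too: `for` heads this expression.
loop <- quote(for (i in 1:3) i)
stopifnot(identical(loop[[1]], as.name("for")))

# Each extra pair of parentheses adds one more nested call to evaluate.
nested <- quote(((10)))
depth <- 0
while (is.call(nested) && identical(nested[[1]], as.name("("))) {
  depth <- depth + 1
  nested <- nested[[2]]
}
depth
```

Here depth counts the three nested ( calls wrapped around the constant 10.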

溺渁∝ 2024-12-08 02:58:14

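The only answer to the question as posed is: loops are not slow if what you need to do is iterate over a set of data performing some function, and that function or operation is not vectorised. In general, a for() loop will be as quick as apply(), but possibly a little slower than an lapply() call. The last point is well covered on SO, for example in this Answer, and it applies when the code involved in setting up and operating the loop is a significant part of the loop's overall computational burden.

Many people think for() loops are slow because they, the users, write bad code. In general (though there are several exceptions), if you need to expand or grow an object, that also involves copying, so you pay the overhead of both copying and growing the object. This is not restricted to loops, but if you copy or grow at each iteration, the loop is of course going to be slow because you incur many copy/grow operations.

The general idiom for using for() loops in R is to allocate the storage you require before the loop starts and then fill in the object thus allocated. If you follow that idiom, loops will not be slow. This is what apply() manages for you, but it is hidden from view.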
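As a rough sketch of the bookkeeping an apply-style function does on your behalf, here is a hand-written stand-in for lapply(). The name my_lapply is hypothetical and this is not how the real lapply() is implemented; it only shows the allocate-then-fill idiom, and it assumes f never returns NULL (assigning NULL to a list slot would remove that slot).

```r
# my_lapply is a hypothetical helper, not the real implementation of lapply().
# Assumption: f never returns NULL.
my_lapply <- function(x, f, ...) {
  out <- vector("list", length(x))   # allocate the full result up front
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)       # fill slot i of the allocated list
  }
  names(out) <- names(x)             # carry the input names over
  out
}

x <- list(a = 1:3, b = 4:6)
stopifnot(identical(my_lapply(x, sum), lapply(x, sum)))
```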

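Of course, if a vectorised function exists for the operation you are implementing with the for() loop, don't use the loop. Likewise, don't use apply() etc. if a vectorised function exists (e.g. apply(foo, 2, mean) is better performed via colMeans(foo)).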
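The colMeans() point can be checked directly. In this sketch foo is a made-up random matrix and the seed is arbitrary; the three approaches should agree up to floating-point tolerance, and the timing calls at the end are there to run locally rather than to report any particular numbers.

```r
set.seed(1)  # arbitrary seed so the example is reproducible
foo <- matrix(rnorm(20000), nrow = 2000, ncol = 10)

# Explicit loop over columns, with the result preallocated.
via_loop <- numeric(ncol(foo))
for (j in seq_len(ncol(foo))) {
  via_loop[j] <- mean(foo[, j])
}

via_apply    <- apply(foo, 2, mean)   # R-level loop hidden inside apply()
via_colMeans <- colMeans(foo)         # loop runs in compiled code

stopifnot(isTRUE(all.equal(via_loop, via_apply)))
stopifnot(isTRUE(all.equal(via_apply, via_colMeans)))

system.time(for (k in 1:200) apply(foo, 2, mean))
system.time(for (k in 1:200) colMeans(foo))
```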

失眠症患者 2024-12-08 02:58:14

Just as a comparison (don't read too much into it!): I ran a (very) simple for loop in R and in JavaScript in Chrome and IE 8.
Note that Chrome does compilation to native code, and R with the compiler package compiles to bytecode.

# In R 2.13.1, this took 500 ms
f <- function() { sum<-0.5; for(i in 1:1000000) sum<-sum+i; sum }
system.time( f() )

# And the compiled version took 130 ms
library(compiler)
g <- cmpfun(f)
system.time( g() )

@Gavin Simpson: Btw, it took 1162 ms in S-Plus...

And the "same" code as JavaScript:

// In IE8, this took 282 ms
// In Chrome 14.0, this took 4 ms
function f() {
    var sum = 0.5;
    for(i=1; i<=1000000; ++i) sum = sum + i;
    return sum;
}

var start = new Date().getTime();
f();
time = new Date().getTime() - start;
