Why are loops slow in R?

Posted 2024-12-01 02:58:14

I know that loops are slow in R and that I should try to do things in a vectorised manner instead.

But, why? Why are loops slow and apply is fast? apply calls several sub-functions -- that doesn't seem fast.

Update: I'm sorry, the question was ill-posed. I was confusing vectorisation with apply. My question should have been,

"Why is vectorisation faster?"

Comments (4)

断舍离 2024-12-08 02:58:14

It's not always the case that loops are slow and apply is fast. There's a nice discussion of this in the May, 2008, issue of R News:

Uwe Ligges and John Fox. R Help Desk: How can I avoid this loop or
make it faster? R News, 8(1):46-50, May 2008.

In the section "Loops!" (starting on pg 48), they say:

Many comments about R state that using loops is a particularly bad idea. This is not necessarily true. In certain cases, it is difficult to write vectorized code, or vectorized code may consume a huge amount of memory.

They further suggest:

  • Initialize new objects to full length before the loop, rather
    than increasing their size within the loop.

  • Do not do things in a
    loop that can be done outside the loop.

  • Do not avoid loops simply
    for the sake of avoiding loops.

They have a simple example where a for loop takes 1.3 sec but apply runs out of memory.
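
The first two suggestions can be sketched as follows. This is only a minimal illustration, not the example from the article; the names n, x, scale_factor, and result are made up for this sketch.

```r
n <- 10000
x <- seq_len(n)

# Computed once, outside the loop, because it does not depend on i.
scale_factor <- 1 / sqrt(n)

# Allocated at full length before the loop instead of being grown with c().
result <- numeric(n)
for (i in seq_len(n)) {
    result[i] <- x[i] * scale_factor
}

# The vectorised form of the same computation, for comparison.
vectorised <- x * scale_factor
```

Here the loop is easy to vectorise, so the last line is what you would normally write; the loop form matters when the body cannot be expressed that way.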

不如归去 2024-12-08 02:58:14

Loops in R are slow for the same reason any interpreted language is slow: every
operation carries around a lot of extra baggage.

Look at R_execClosure in eval.c (this is the function that runs whenever a
user-defined function is called). It's nearly 100 lines long and performs all
sorts of operations: creating an environment for execution, assigning
arguments into the environment, etc.

Think how much less happens when you call a function in C (push args on to
stack, jump, pop args).
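
One rough way to see this per-call overhead is to time the same loop with and without a call to a trivial user-defined function. This is a sketch only: add_one is a hypothetical helper, and the timings depend heavily on the machine and the R version, so no numbers are claimed here.

```r
# Hypothetical helper: a trivial closure, so most of its cost is call overhead.
add_one <- function(x) x + 1

n <- 100000

# Loop body does the arithmetic inline.
inline_time <- system.time({
    acc_inline <- 0
    for (i in seq_len(n)) acc_inline <- acc_inline + 1
})

# Loop body does the same arithmetic through a closure call.
call_time <- system.time({
    acc_call <- 0
    for (i in seq_len(n)) acc_call <- add_one(acc_call)
})

# Both loops compute the same value; only the elapsed times differ.
print(rbind(inline = inline_time, call = call_time))
```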

So that is why you get timings like these (as joran pointed out in the comment,
it's not actually apply that's being fast; it's the internal C loop in mean
that's being fast. apply is just regular old R code):

A = matrix(as.numeric(1:100000))

Using a loop: 0.342 seconds:

system.time({
    Sum = 0
    for (i in seq_along(A)) {
        Sum = Sum + A[[i]]
    }
    Sum
})

Using sum: unmeasurably small:

sum(A)

It's a little disconcerting because, asymptotically, the loop is just as good
as sum; there's no practical reason it should be slow. It is simply doing
extra work on each iteration.

So consider:

# 0.370 seconds
system.time({
    I = 0
    while (I < 100000) {
        10
        I = I + 1
    }
})

# 0.743 seconds -- double the time just adding parentheses
system.time({
    I = 0
    while (I < 100000) {
        ((((((((((10))))))))))
        I = I + 1
    }
})

(That example was discovered by Radford Neal)

The reason is that ( in R is an operator, and it actually requires a name lookup every time you use it:

> `(` = function(x) 2
> (3)
[1] 2

Or, in general, interpreted operations (in any language) have more steps. Of course, those steps provide benefits as well: you couldn't do that ( trick in C.
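
If you want to try the ( redefinition without breaking your session, one option is to shadow it inside local(), so the replacement disappears when the block ends. This is a small sketch; the variable names shadowed and normal are made up here.

```r
# Shadow `(` only inside a local environment; the global session is untouched.
shadowed <- local({
    `(` <- function(x) 2
    (3)
})

# Outside that environment, `(` resolves to the base version again.
normal <- (3)

print(c(shadowed = shadowed, normal = normal))
```

The first value comes from the shadowing function and the second from base R's (, which shows that the name is looked up afresh at each use.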

溺渁∝ 2024-12-08 02:58:14

The only answer to the question posed is: loops are not slow if what you need to do is iterate over a set of data performing some function, and that function or operation is not vectorized. In general, a for() loop will be as quick as apply(), but possibly a little slower than an lapply() call. The last point is well covered on SO, for example in this Answer, and applies if the code involved in setting up and operating the loop is a significant part of the overall computational burden of the loop.

Many people think for() loops are slow because they, the users, are writing bad code. In general (though there are several exceptions), if you need to expand/grow an object, that will also involve copying, so you incur the overhead of both copying and growing the object. This is not restricted to loops, but if you copy/grow at each iteration of a loop, the loop will of course be slow because you are incurring many copy/grow operations.

The general idiom for using for() loops in R is that you allocate the storage you require before the loop starts, and then fill in the object thus allocated. If you follow that idiom, loops will not be slow. This is what apply() manages for you, but it is just hidden from view.
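
As an illustration of that idiom, here is the same row-wise computation written as an explicit pre-allocated loop and via apply() and vapply(), which take care of the result storage themselves. The matrix m and the range computation are made up for this sketch.

```r
m <- matrix(as.numeric(1:12), nrow = 3)   # hypothetical 3 x 4 input

# Explicit loop: allocate the result first, then fill it in.
row_ranges <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
    row_ranges[i] <- max(m[i, ]) - min(m[i, ])
}

# apply() and vapply() handle the allocation of the result for you.
via_apply  <- apply(m, 1, function(r) max(r) - min(r))
via_vapply <- vapply(seq_len(nrow(m)),
                     function(i) max(m[i, ]) - min(m[i, ]),
                     numeric(1))
```

All three produce the same vector; the choice between them is mostly about readability rather than speed.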

Of course, if a vectorised function exists for the operation you are implementing with the for() loop, use it instead of the loop. Likewise, don't use apply() etc. if a vectorised function exists (e.g. apply(foo, 2, mean) is better performed via colMeans(foo)).
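
A quick way to convince yourself that the two forms agree, assuming foo is a numeric matrix (the random data here is only for illustration):

```r
set.seed(1)
foo <- matrix(rnorm(200), nrow = 50, ncol = 4)   # hypothetical data

col_means_apply <- apply(foo, 2, mean)   # R-level loop over columns
col_means_fast  <- colMeans(foo)         # dedicated vectorised function

# Compare with a tolerance, since the two may differ in the last few bits.
print(all.equal(col_means_apply, col_means_fast))
```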

失眠症患者 2024-12-08 02:58:14

Just as a comparison (don't read too much into it!): I ran a (very) simple for loop in R and in JavaScript in Chrome and IE 8.
Note that Chrome does compilation to native code, and R with the compiler package compiles to bytecode.

# In R 2.13.1, this took 500 ms
f <- function() { sum<-0.5; for(i in 1:1000000) sum<-sum+i; sum }
system.time( f() )

# And the compiled version took 130 ms
library(compiler)
g <- cmpfun(f)
system.time( g() )

@Gavin Simpson: Btw, it took 1162 ms in S-Plus...

And the "same" code as JavaScript:

// In IE8, this took 282 ms
// In Chrome 14.0, this took 4 ms
function f() {
    var sum = 0.5;
    for(i=1; i<=1000000; ++i) sum = sum + i;
    return sum;
}

var start = new Date().getTime();
f();
time = new Date().getTime() - start;