What are the advantages of the "apply" functions? When are they better to use than "for" loops, and when are they not?
Possible Duplicate:
Is R's apply family more than syntactic sugar
Just what the title says. Stupid question, perhaps, but my understanding has been that when using an "apply" function, the iteration is performed in compiled code rather than in the R parser. This would seem to imply that lapply, for instance, is only faster than a "for" loop if there are a great many iterations and each operation is relatively simple. For instance, if a single call to a function wrapped up in lapply takes 10 seconds, and there are only, say, 12 iterations of it, I would imagine that there's virtually no difference at all between using "for" and "lapply".
Now that I think of it, if the function inside the "lapply" has to be parsed anyway, why should there be ANY performance benefit from using "lapply" instead of "for" unless you're doing something that there are compiled functions for (like summing or multiplying, etc)?
Thanks in advance!
Josh
Comments (2)
There are several reasons why one might prefer an `apply` family function over a `for` loop, or vice versa.

Firstly, `for()`, `apply()` and `sapply()` will generally be just as quick as each other if executed correctly. `lapply()` does more of its work in compiled code within the R internals than the others, so it can be faster than those functions. The speed advantage appears to be greatest when the act of "looping" over the data is a significant part of the compute time; in many general day-to-day uses you are unlikely to gain much from the inherently quicker `lapply()`. In the end, these will all be calling R functions, so those functions still need to be interpreted and then run.

`for()` loops can often be easier to implement, especially if you come from a programming background where loops are prevalent. Working in a loop may be more natural than forcing the iterative computation into one of the `apply` family functions. However, to use `for()` loops properly, you need to do some extra work to set up storage and manage plugging the output of the loop back together again. The `apply` functions do this for you automagically. E.g., an example built on `>` is a silly one, as `>` is a vectorised operator, but I wanted something to make a point, namely that you have to manage the output. The main thing is that with `for()` loops, you always allocate sufficient storage to hold the outputs before you start the loop. If you don't know how much storage you will need, then allocate a reasonable chunk of storage, check inside the loop whether you have exhausted it, and if so bolt on another big chunk of storage.

The main reason, in my mind, for using one of the `apply` family of functions is more elegant, readable code. Rather than managing the output storage and setting up the loop (as described above), we can let R handle that and succinctly ask R to run a function on subsets of our data. Speed usually does not enter into the decision, for me at least. I use the function that suits the situation best and will result in simple, easy-to-understand code, because if I can't remember what the code is doing a day or a week or more later, I'm likely to waste far more time than I ever saved by always choosing the fastest function!

The `apply` family lend themselves to scalar or vector operations. A `for()` loop will often lend itself to doing multiple iterated operations using the same index `i`. For example, I have written code that uses `for()` loops to do k-fold or bootstrap cross-validation on objects. I probably would never entertain doing that with one of the `apply` family, as each CV iteration needs multiple operations, access to lots of objects in the current frame, and fills in several output objects that hold the output of the iterations.

As to the last point, about why `lapply()` can possibly be faster than `for()` or `apply()`, you need to realise that the "loop" can be performed in interpreted R code or in compiled code. Yes, both will still be calling R functions that need to be interpreted, but if you are doing the looping and calling directly from compiled C code (e.g. `lapply()`), then that is where the performance gain can come from over, say, `apply()`, which boils down to a `for()` loop in actual R code. See the source for `apply()` to see that it is a wrapper around a `for()` loop, and then look at the code for `lapply()` (type `lapply` at the R prompt to print it), and you should see why there can be a difference in speed between `lapply()` and `for()` and the other `apply` family functions. The `.Internal()` call is one of R's ways of calling compiled C code used by R itself. Apart from a manipulation, and a sanity check on `FUN`, the entire computation is done in C, calling the R function `FUN`. Compare that with the source for `apply()`.
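A minimal sketch of the two points made in the answer above: managing output storage yourself in a `for()` loop versus letting `sapply()` do it, and where the looping actually happens in `lapply()` and `apply()`. The vector `x`, the threshold 5 and all variable names are illustrative assumptions, not the code from the original answer, and the source inspection at the end assumes the function definitions found in current R releases.

```r
# Illustrative data; `x` and the threshold are assumptions for this sketch.
x <- c(3, 8, 1, 9, 5, 7)

# for() version: allocate the output first, then fill it in by index.
out_for <- logical(length(x))
for (i in seq_along(x)) {
  out_for[i] <- x[i] > 5
}

# sapply() version: R collects the per-element results into a vector for us.
out_sapply <- sapply(x, function(v) v > 5)

# The vectorised form, which is what you would really use for `>`.
out_vec <- x > 5

# Inspect the R-level source of both functions as text.
lapply_src <- deparse(body(lapply))
apply_src  <- deparse(body(apply))

# lapply() is expected to hand its loop to compiled code via .Internal(),
# while apply() is expected to contain an ordinary R-level for() loop.
uses_internal <- any(grepl(".Internal(lapply", lapply_src, fixed = TRUE))
has_r_loop    <- any(grepl("for (i in", apply_src, fixed = TRUE))

print(uses_internal)
print(has_r_loop)
```

The loop and the `sapply()` call compute the same thing; the difference is only in who manages the storage, which is the readability argument made above rather than a speed argument.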
From Burns' R Inferno (pdf), p25: