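Why does converting to a list improve the performance of lapply?
I am surprised to see that the first line runs much slower than the second one, which is suspiciously close in performance to the vectorized version. If processing a list is so much faster than processing a numeric(n) vector, why doesn't R convert its input to a list automatically?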
> system.time(lapply(1:10^7, sqrt))
   user  system elapsed
  4.445   0.204   4.692
> system.time(lapply(list(1:10^7), sqrt))
   user  system elapsed
  0.048   0.015   0.062
> system.time(sqrt(1:10^7))
   user  system elapsed
   0.04    0.00    0.04
Here is the version information:
$ R --version
R version 4.1.3 (2022-03-10) -- "One Push-Up"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin21.4.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.
$ sw_vers
ProductName: macOS
ProductVersion: 12.3.1
BuildVersion: 21E258
Comments (1)
The reason is that the second expression is just a list of length 1, so lapply calls sqrt once on the whole vector, which is basically the same as applying sqrt directly. To apply sqrt to each element as a separate list entry, it would require as.list instead of list, i.e. lapply(as.list(1:10^7), sqrt).

Converting from vector to list is unnecessary if the intention is to loop over each element of the vector. In a vector, each element is a unit (the same holds for a matrix, which is a vector that only adds a dim attribute), but in a data.frame/tibble/data.table, each unit is a column. Thus, lapply loops over the columns of a data.frame, whereas it loops over the single elements of a vector. When we wrap a vector with list, the whole vector is encapsulated as a single list element.

As sqrt is a vectorized function, when we apply it by looping over the list created with list, it loops only once, but over the one created with as.list, it loops once per element. Thus, the as.list version gets timings similar to lapply on the plain vector (of course, some extra time goes into converting the vector to a list with as.list).

A faster option would be to use vapply (if we are applying non-vectorized functions in a loop).
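
To make the difference concrete, here is a minimal sketch on a small vector. It is an illustration rather than a benchmark: the variable names and the helper clipped_root are hypothetical and are not part of the original question or answer.

```r
x <- 1:10

# list() wraps the whole vector as ONE element;
# as.list() creates one list element per value.
wrapped  <- list(x)
split_up <- as.list(x)

length(wrapped)   # 1
length(split_up)  # 10

# lapply() calls FUN once per list element.
res_wrapped <- lapply(wrapped, sqrt)   # a single vectorized call on all of x
res_split   <- lapply(split_up, sqrt)  # ten separate calls on single values
res_direct  <- lapply(x, sqrt)         # a plain vector is also iterated element by element

# For a data.frame the unit is the column, so FUN runs once per column.
df <- data.frame(a = 1:3, b = 4:6)
col_sums <- lapply(df, sum)
length(col_sums)  # 2

# clipped_root is a hypothetical non-vectorized function: `if` needs a
# single TRUE/FALSE, so it has to be applied one element at a time.
clipped_root <- function(v) if (v > 5) sqrt(v) else 0

# vapply() returns an atomic vector and checks that every result
# matches the template numeric(1).
res_vapply <- vapply(x, clipped_root, numeric(1))
```

Comparing length(res_wrapped) with length(res_split) shows why the timings in the question differ so much: the first is a single call to sqrt, the second is one call per element.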