How to convert specific columns of a data.frame to factors in R using apply
I have a data.frame called mydata and a vector ids containing indices of the columns in the data.frame that I would like to convert to factors. Now the following code solves the problem
for (i in ids) mydata[, i] <- as.factor(mydata[, i])
Now I wanted to clean this code up by using apply instead of an explicit for loop.
mydata[, ids] <- apply(mydata[, ids], 2, as.factor)
However, the last statement gives me a data.frame where the types are character instead of factors. I fail to see the distinction between these two lines of code. Why do they not produce the same result?
Kind regards,
Michael
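The question does not show the actual data. The sketch below is a minimal reproduction of the difference it describes, built on a small hypothetical mydata and ids. The for loop leaves factor columns, while the apply version leaves character columns, including the column that was originally numeric.

```r
# Hypothetical example data (the real mydata is not shown in the question).
mydata <- data.frame(a = c("x", "y", "x"),
                     b = c(1, 2, 1),
                     c = c(TRUE, FALSE, TRUE),
                     stringsAsFactors = FALSE)
ids <- c(1, 2)  # hypothetical indices of the columns to convert

# Version 1: explicit for loop, one column assignment per iteration
loop_data <- mydata
for (i in ids) loop_data[, i] <- as.factor(loop_data[, i])

# Version 2: apply over the columns (MARGIN = 2)
apply_data <- mydata
apply_data[, ids] <- apply(apply_data[, ids], 2, as.factor)

sapply(loop_data, class)   # a and b are "factor", c stays "logical"
sapply(apply_data, class)  # a and b are "character", c stays "logical"
```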
The result of apply is a vector, an array, or a list of values (see ?apply). For your problem, you should use lapply instead.
Notice that this is one place where lapply will be much faster than a for loop. In general a loop and lapply have similar performance, but the [<-.data.frame operation is very slow. By using lapply, you avoid that assignment in every iteration and replace it with a single assignment. This is much faster.
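The answer's own code did not survive extraction. The following is therefore a hedged sketch of the lapply approach it recommends, reusing the same hypothetical mydata and ids as above rather than the asker's data. lapply over a data frame iterates over its columns and returns a plain list, so every element keeps its factor class. The whole list is then written back in one assignment.

```r
# Hypothetical data; stringsAsFactors = FALSE keeps column a as character at first.
mydata <- data.frame(a = c("x", "y", "x"),
                     b = c(1, 2, 1),
                     c = c(TRUE, FALSE, TRUE),
                     stringsAsFactors = FALSE)
ids <- c(1, 2)  # hypothetical column indices

# lapply() returns a list of factors; assigning it replaces all selected
# columns with a single call to the data frame assignment method.
mydata[ids] <- lapply(mydata[ids], as.factor)

sapply(mydata, class)  # a and b are "factor", c is still "logical"

# List-style indexing also works when only one column is selected,
# whereas mydata[, 2] would drop to a plain vector.
single <- data.frame(a = c("x", "y"), b = c("u", "v"), stringsAsFactors = FALSE)
single[2] <- lapply(single[2], as.factor)
```

Writing mydata[ids] instead of mydata[, ids] is a design choice in this sketch. Both forms behave the same when ids has two or more entries, but only the list-style form keeps a data frame when ids has length one.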
That is because apply() works completely differently. It coerces the data frame to a matrix first and then carries out as.factor on each column in a local environment. It collects the results and tries to merge them into an array, not a data frame. In your case this array is a matrix. R encounters several factors and has no way to combine them other than converting them to character first. That character matrix is then used to fill up your data frame.
You can use lapply for that (see Andrie's answer) or colwise from the plyr package.
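The sketch below inspects the intermediate steps this answer describes, using a small hypothetical data frame. apply coerces a data frame with mixed column types to a character matrix. Each as.factor call still returns a factor, but building an array from factor data drops the factor class. The internals are paraphrased here, so consult ?apply for the authoritative description.

```r
# Hypothetical two-column data frame with mixed types.
df <- data.frame(a = c("x", "y"), b = c(1, 2), stringsAsFactors = FALSE)

# Step 1: apply() coerces a data frame to a matrix via as.matrix();
# with a character column present, the whole matrix becomes character.
m <- as.matrix(df)
typeof(m)  # "character"

# Step 2: as.factor() on each column does return factors.
per_col <- lapply(seq_len(ncol(m)), function(j) as.factor(m[, j]))
sapply(per_col, is.factor)  # TRUE TRUE

# Step 3: simplifying the results into an array loses the factor class.
f <- as.factor(c("x", "y"))
typeof(array(f, c(2, 1)))  # "character"

# Net effect: apply() hands back a character matrix.
res <- apply(df, 2, as.factor)
is.matrix(res)  # TRUE
typeof(res)     # "character"
```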