Faster ways to calculate frequencies and cast from long to wide
I am trying to obtain counts of each combination of levels of two variables, "week" and "id". I'd like the result to have "id" as rows, and "week" as columns, and the counts as the values.
Example of what I've tried so far (I also tried a bunch of other things, including adding a dummy variable = 1 and then using fun.aggregate = sum over that):
library(plyr)
ddply(data, .(id), dcast, id ~ week, value.var = "id",
      fun.aggregate = length, fill = 0, .parallel = TRUE)
However, I must be doing something wrong because this function is not finishing. Is there a better way to do this?
Input:
id week
1 1
1 2
1 3
1 1
2 3
Output:
  1 2 3
1 2 1 1
2 0 0 1
You could just use the table command. If "id" and "week" are the only columns in your data frame, you can simply call table() on the whole data frame.
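A minimal sketch of the table approach, assuming the question's example data is stored in a data frame called data. Both calls build the same id-by-week contingency table of counts.

```r
# Example data from the question (assumed name: data)
data <- data.frame(id = c(1, 1, 1, 1, 2), week = c(1, 2, 3, 1, 3))

# Cross-tabulate: one row per id, one column per week, cells are counts;
# combinations that never occur are reported as 0
counts <- with(data, table(id, week))
print(counts)

# If id and week are the only columns, table() on the whole data frame
# produces the same table
counts_all <- table(data)
```

The result is a table object; if a plain matrix-shaped data frame is preferred, as.data.frame.matrix(counts) converts it while keeping the weeks as columns.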
You don't need ddply for this. The dcast from reshape2 is sufficient on its own.
Edit: For a base R solution (other than table, as posted by Joshua Ulrich), try xtabs.
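A sketch of both suggestions in this answer, assuming the reshape2 package is installed and the question's example data is in a data frame called data. With fun.aggregate = length, dcast counts the rows falling in each id/week cell, and cells with no rows get length() of an empty vector, i.e. 0. The xtabs call is the base R equivalent using a one-sided formula.

```r
library(reshape2)

# Example data from the question (assumed name: data)
data <- data.frame(id = c(1, 1, 1, 1, 2), week = c(1, 2, 3, 1, 3))

# reshape2::dcast: id as rows, week as columns, counts as values
wide <- dcast(data, id ~ week, value.var = "week", fun.aggregate = length)
print(wide)

# Base R: xtabs() with no response on the left-hand side counts combinations
wide_xt <- xtabs(~ id + week, data = data)
print(wide_xt)
```

Note that dcast returns a data frame with id as an ordinary column, whereas xtabs returns a contingency table with id in the row names.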
The reason ddply is taking so long is that the splitting by group is not run in parallel (only the computations on the splits are), so with a large number of groups it will be slow, and .parallel = TRUE will not help.
An approach using data.table::dcast (data.table version >= 1.9.2) should be extremely efficient in time and memory. In this case, we can rely on the default argument values, or set the arguments explicitly.
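A sketch of both forms, assuming a reasonably recent data.table and the question's example data in a data frame called data. In the default form, dcast guesses the last column (week) as the value column and, because several rows share a cell, falls back to length as the aggregate; it prints messages saying so. The explicit form spells out the same choices and stays quiet.

```r
library(data.table)

# Example data from the question (assumed name: data), converted to a data.table
data <- data.frame(id = c(1, 1, 1, 1, 2), week = c(1, 2, 3, 1, 3))
DT <- as.data.table(data)

# Relying on defaults: value.var is guessed and fun.aggregate defaults to length
wide_default <- dcast(DT, id ~ week)

# Same result with the arguments set explicitly; empty cells become
# length(integer(0)), i.e. 0
wide <- dcast(DT, id ~ week, value.var = "week", fun.aggregate = length)
print(wide)
```

If data can be modified in place, setDT(data) converts it without copying, which matters for large inputs.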
For pre-data.table 1.9.2 alternatives, see the edit history of this answer.
A tidyverse option could be using only pivot_wider.
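A tidyverse sketch, assuming dplyr and tidyr are installed and the question's example data is in a data frame called data. To keep each step obvious, this variant counts first with dplyr::count() and then widens with pivot_wider(); it is a closely related alternative rather than a pivot_wider-only call (which would instead aggregate through pivot_wider's values_fn argument).

```r
library(dplyr)
library(tidyr)

# Example data from the question (assumed name: data)
data <- data.frame(id = c(1, 1, 1, 1, 2), week = c(1, 2, 3, 1, 3))

wide <- data %>%
  count(id, week) %>%                  # one row per (id, week) pair, count in n
  pivot_wider(names_from = week,       # weeks become columns
              values_from = n,         # counts become the cell values
              values_fill = 0)         # combinations that never occur become 0
print(wide)
```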
Or using tabyl from janitor.
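A sketch of the janitor route, assuming the janitor package is installed and the question's example data is in a data frame called data. Passing two variables to tabyl() yields a two-way frequency table returned as a data frame, with id in the first column and one column per week.

```r
library(janitor)

# Example data from the question (assumed name: data)
data <- data.frame(id = c(1, 1, 1, 1, 2), week = c(1, 2, 3, 1, 3))

# Two-way tabulation: rows are ids, columns are weeks, cells are counts
wide <- tabyl(data, id, week)
print(wide)
```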