Faster ways to calculate frequencies and cast from long to wide

Posted on 2024-12-17 06:21:38

I am trying to obtain counts of each combination of levels of two variables, "week" and "id". I'd like the result to have "id" as rows, and "week" as columns, and the counts as the values.

Example of what I've tried so far (tried a bunch of other things, including adding a dummy variable = 1 and then fun.aggregate = sum over that):

library(plyr)
ddply(data, .(id), dcast, id ~ week, value_var = "id", 
        fun.aggregate = length, fill = 0, .parallel = TRUE)

However, I must be doing something wrong because this function is not finishing. Is there a better way to do this?

Input:

id      week
1       1
1       2
1       3
1       1
2       3

Output:

  1  2  3
1 2  1  1
2 0  0  1

抹茶夏天i‖ 2024-12-24 06:21:38

You could just use the table command:

table(data$id,data$week)

    1 2 3
  1 2 1 1
  2 0 0 1

If "id" and "week" are the only columns in your data frame, you can simply use:

table(data)
#    week
# id  1 2 3
#   1 2 1 1
#   2 0 0 1
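
If a plain data frame is needed rather than a table object, one possible follow-up is the sketch below. It rebuilds the question's example input (the construction of `data` here is an assumption based on the posted input) and uses base R's as.data.frame.matrix to keep the wide layout.

```r
# Example data matching the input shown in the question
data <- data.frame(id = c(1, 1, 1, 1, 2), week = c(1, 2, 3, 1, 3))

# Cross-tabulate: ids become rows, weeks become columns
counts <- table(data$id, data$week)

# Convert the contingency table to a wide data frame;
# row names hold the ids, column names hold the weeks
wide <- as.data.frame.matrix(counts)
wide
```

Calling as.data.frame(counts) instead would give the long form, with one row per id/week pair, so the matrix method is the one that preserves the id-by-week shape.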

说不完的你爱 2024-12-24 06:21:38

You don't need ddply for this. The dcast from reshape2 is sufficient:

dat <- data.frame(
    id = c(rep(1, 4), 2),
    week = c(1:3, 1, 3)
)

library(reshape2)
dcast(dat, id~week, fun.aggregate=length)

  id 1 2 3
1  1 2 1 1
2  2 0 0 1

Edit: For a base R solution (other than table, as posted by Joshua Ulrich), try xtabs:

xtabs(~id+week, data=dat)

   week
id  1 2 3
  1 2 1 1
  2 0 0 1
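
The question mentions trying a dummy variable equal to 1 and summing over it. As a small sketch of that idea in base R, xtabs sums whatever is on the left-hand side of the formula, so a column of ones gives the same counts. The column name `n` is a hypothetical choice for illustration.

```r
dat <- data.frame(
    id = c(rep(1, 4), 2),
    week = c(1:3, 1, 3)
)

# Hypothetical dummy column: every row contributes a weight of 1
dat$n <- 1

# Summing the dummy within each id/week cell
summed <- xtabs(n ~ id + week, data = dat)

# Counting rows directly, with no left-hand side
counted <- xtabs(~ id + week, data = dat)

# Both tables hold the same values
all(summed == counted)
```

The same left-hand-side form is useful when rows carry real weights or pre-aggregated counts rather than a constant 1.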

左岸枫 2024-12-24 06:21:38

The reason ddply is taking so long is that the splitting by group is not run in parallel (only the computations on the 'splits' are), so with a large number of groups it will be slow, and .parallel = TRUE will not help.
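
As a rough base R illustration of the per-group pattern (not a reproduction of plyr's internals), the sketch below splits a simulated data set by id, tabulates each piece, and compares the result with a single cross-tabulation over the whole data. The data frame `big` and its sizes are assumptions made up for this example.

```r
# Hypothetical data: many ids, only three weeks
set.seed(1)
big <- data.frame(
    id = sample(1:2000, 1e4, replace = TRUE),
    week = factor(sample(1:3, 1e4, replace = TRUE))
)

# Per-group approach: split by id, then tabulate each piece separately
pieces <- split(big$week, big$id)
per_group <- do.call(rbind, lapply(pieces, table))

# Single-pass approach: one cross-tabulation over all rows
one_pass <- table(big$id, big$week)

# Both approaches produce the same counts
all(per_group == one_pass)
```

The per-group version creates one small object per id before any counting happens, which is the part that grows with the number of groups.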

An approach using data.table::dcast (data.table version >= 1.9.2) should be extremely efficient in time and memory. In this case, we can rely on default argument values and simply use:

library(data.table) 
dcast(setDT(data), id ~ week)
# Using 'week' as value column. Use 'value.var' to override
# Aggregate function missing, defaulting to 'length'
#    id 1 2 3
# 1:  1 2 1 1
# 2:  2 0 0 1

Or setting the arguments explicitly:

dcast(setDT(data), id ~ week, value.var = "week", fun.aggregate = length)
#    id 1 2 3
# 1:  1 2 1 1
# 2:  2 0 0 1

For pre-data.table 1.9.2 alternatives, see edits.

音盲 2024-12-24 06:21:38

A tidyverse option could be:

library(dplyr)
library(tidyr)

df %>%
  count(id, week) %>%
  pivot_wider(names_from = week, values_from = n, values_fill = list(n = 0))
  #spread(week, n, fill = 0) #In older version of tidyr

#     id   `1`   `2`   `3`
#   <dbl> <dbl> <dbl> <dbl>
#1     1     2     1     1
#2     2     0     0     1

Using only pivot_wider -

tidyr::pivot_wider(df, names_from = week, 
                   values_from = week, values_fn = length, values_fill = 0)

Or using tabyl from janitor:

janitor::tabyl(df, id, week)
# id 1 2 3
#  1 2 1 1
#  2 0 0 1

data

df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L), week = c(1L, 2L, 3L, 
1L, 3L)), class = "data.frame", row.names = c(NA, -5L))
