计算每组的行数并将结果添加到原始数据框

发布于 2024-12-05 04:41:04 字数 738 浏览 1 评论 0原文

假设我有一个 data.frame 对象:

df <- data.frame(name=c('black','black','black','red','red'),
                 type=c('chair','chair','sofa','sofa','plate'),
                 num=c(4,5,12,4,3))

现在我想计算 nametype 的每个组合的行数(观察值) 。这可以像这样完成:

table(df[ , c("name","type")])

或者也可以使用plyr(尽管我不确定如何)。

但是,如何将结果合并到原始数据框中呢?结果将如下所示:

df
#    name  type num count
# 1 black chair   4     2
# 2 black chair   5     2
# 3 black  sofa  12     1
# 4   red  sofa   4     1
# 5   red plate   3     1

其中 count 现在存储聚合结果。

使用 plyr 的解决方案学习起来也很有趣,尽管我想看看如何使用基础 R 来完成此任务。

Say I have a data.frame object:

df <- data.frame(name=c('black','black','black','red','red'),
                 type=c('chair','chair','sofa','sofa','plate'),
                 num=c(4,5,12,4,3))

Now I want to count the number of rows (observations) of for each combination of name and type. This can be done like so:

table(df[ , c("name","type")])

or possibly also with plyr, (though I am not sure how).

However, how do I get the results incorporated into the original data frame? So that the results will look like this:

df
#    name  type num count
# 1 black chair   4     2
# 2 black chair   5     2
# 3 black  sofa  12     1
# 4   red  sofa   4     1
# 5   red plate   3     1

where count now stores the results from the aggregation.

A solution with plyr could be interesting to learn as well, though I would like to see how this is done with base R.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(12

我一直都在从未离去 2024-12-12 04:41:04

使用 data.table

library(data.table)
dt = as.data.table(df)

# or coerce to data.table by reference:
# setDT(df)

dt[ , count := .N, by = .(name, type)]

对于 data.table 1.8.2 之前的替代方案,请参阅编辑历史记录。


使用dplyr

library(dplyr)
df %>%
  group_by(name, type) %>%
  mutate(count = n())

或者简单地:

add_count(df, name, type)

使用plyr

plyr::ddply(df, .(name, type), transform, count = length(num))

Using data.table:

library(data.table)
dt = as.data.table(df)

# or coerce to data.table by reference:
# setDT(df)

dt[ , count := .N, by = .(name, type)]

For pre-data.table 1.8.2 alternative, see edit history.


Using dplyr:

library(dplyr)
df %>%
  group_by(name, type) %>%
  mutate(count = n())

Or simply:

add_count(df, name, type)

Using plyr:

plyr::ddply(df, .(name, type), transform, count = length(num))
寒尘 2024-12-12 04:41:04

您可以使用ave

df$count <- ave(df$num, df[,c("name","type")], FUN=length)

You can use ave:

df$count <- ave(df$num, df[,c("name","type")], FUN=length)
長街聽風 2024-12-12 04:41:04

你可以这样做:

> ddply(df,.(name,type),transform,count = NROW(piece))
   name  type num count
1 black chair   4     2
2 black chair   5     2
3 black  sofa  12     1
4   red plate   3     1
5   red  sofa   4     1

或者也许更直观地,

> ddply(df,.(name,type),transform,count = length(num))
   name  type num count
1 black chair   4     2
2 black chair   5     2
3 black  sofa  12     1
4   red plate   3     1
5   red  sofa   4     1

You can do this:

> ddply(df,.(name,type),transform,count = NROW(piece))
   name  type num count
1 black chair   4     2
2 black chair   5     2
3 black  sofa  12     1
4   red plate   3     1
5   red  sofa   4     1

or perhaps more intuitively,

> ddply(df,.(name,type),transform,count = length(num))
   name  type num count
1 black chair   4     2
2 black chair   5     2
3 black  sofa  12     1
4   red plate   3     1
5   red  sofa   4     1
乱了心跳 2024-12-12 04:41:04

这应该可以完成你的工作:

df_agg <- aggregate(num~name+type,df,FUN=NROW)
names(df_agg)[3] <- "count"
df <- merge(df,df_agg,by=c('name','type'),all.x=TRUE)

This should do your work :

df_agg <- aggregate(num~name+type,df,FUN=NROW)
names(df_agg)[3] <- "count"
df <- merge(df,df_agg,by=c('name','type'),all.x=TRUE)
友欢 2024-12-12 04:41:04

基本的R函数aggregate将通过一行获取计数,但是将这些计数添加回原始的data.frame似乎需要一点处理。

df <- data.frame(name=c('black','black','black','red','red'),
                 type=c('chair','chair','sofa','sofa','plate'),
                 num=c(4,5,12,4,3))
df
#    name  type num
# 1 black chair   4
# 2 black chair   5
# 3 black  sofa  12
# 4   red  sofa   4
# 5   red plate   3

rows.per.group  <- aggregate(rep(1, length(paste0(df$name, df$type))),
                             by=list(df$name, df$type), sum)
rows.per.group
#   Group.1 Group.2 x
# 1   black   chair 2
# 2     red   plate 1
# 3   black    sofa 1
# 4     red    sofa 1

my.summary <- do.call(data.frame, rows.per.group)
colnames(my.summary) <- c(colnames(df)[1:2], 'rows.per.group')
my.data <- merge(df, my.summary, by = c(colnames(df)[1:2]))
my.data
#    name  type num rows.per.group
# 1 black chair   4              2
# 2 black chair   5              2
# 3 black  sofa  12              1
# 4   red plate   3              1
# 5   red  sofa   4              1

The base R function aggregate will obtain the counts with a one-liner, but adding those counts back to the original data.frame seems to take a bit of processing.

df <- data.frame(name=c('black','black','black','red','red'),
                 type=c('chair','chair','sofa','sofa','plate'),
                 num=c(4,5,12,4,3))
df
#    name  type num
# 1 black chair   4
# 2 black chair   5
# 3 black  sofa  12
# 4   red  sofa   4
# 5   red plate   3

rows.per.group  <- aggregate(rep(1, length(paste0(df$name, df$type))),
                             by=list(df$name, df$type), sum)
rows.per.group
#   Group.1 Group.2 x
# 1   black   chair 2
# 2     red   plate 1
# 3   black    sofa 1
# 4     red    sofa 1

my.summary <- do.call(data.frame, rows.per.group)
colnames(my.summary) <- c(colnames(df)[1:2], 'rows.per.group')
my.data <- merge(df, my.summary, by = c(colnames(df)[1:2]))
my.data
#    name  type num rows.per.group
# 1 black chair   4              2
# 2 black chair   5              2
# 3 black  sofa  12              1
# 4   red plate   3              1
# 5   red  sofa   4              1
烟燃烟灭 2024-12-12 04:41:04

使用sqldf包:

library(sqldf)

sqldf("select a.*, b.cnt
       from df a,
           (select name, type, count(1) as cnt
            from df
            group by name, type) b
      where a.name = b.name and
            a.type = b.type")

#    name  type num cnt
# 1 black chair   4   2
# 2 black chair   5   2
# 3 black  sofa  12   1
# 4   red  sofa   4   1
# 5   red plate   3   1

Using sqldf package:

library(sqldf)

sqldf("select a.*, b.cnt
       from df a,
           (select name, type, count(1) as cnt
            from df
            group by name, type) b
      where a.name = b.name and
            a.type = b.type")

#    name  type num cnt
# 1 black chair   4   2
# 2 black chair   5   2
# 3 black  sofa  12   1
# 4   red  sofa   4   1
# 5   red plate   3   1
差↓一点笑了 2024-12-12 04:41:04

另一种选择是使用 dplyr< 中的 add_tally /代码>。以下是一个可重现的示例:

df <- data.frame(name=c('black','black','black','red','red'),
                 type=c('chair','chair','sofa','sofa','plate'),
                 num=c(4,5,12,4,3))
library(dplyr)
df %>%
  group_by(name, type) %>%
  add_tally(name = "count")
#> # A tibble: 5 × 4
#> # Groups:   name, type [4]
#>   name  type    num count
#>   <chr> <chr> <dbl> <int>
#> 1 black chair     4     2
#> 2 black chair     5     2
#> 3 black sofa     12     1
#> 4 red   sofa      4     1
#> 5 red   plate     3     1

于 2022 年 9 月 11 日使用 reprex v2.0.2 创建

Another option using add_tally from dplyr. Here is a reproducible example:

df <- data.frame(name=c('black','black','black','red','red'),
                 type=c('chair','chair','sofa','sofa','plate'),
                 num=c(4,5,12,4,3))
library(dplyr)
df %>%
  group_by(name, type) %>%
  add_tally(name = "count")
#> # A tibble: 5 × 4
#> # Groups:   name, type [4]
#>   name  type    num count
#>   <chr> <chr> <dbl> <int>
#> 1 black chair     4     2
#> 2 black chair     5     2
#> 3 black sofa     12     1
#> 4 red   sofa      4     1
#> 5 red   plate     3     1

Created on 2022-09-11 with reprex v2.0.2

墨洒年华 2024-12-12 04:41:04

两行替代方法是生成一个 0 变量,然后用 split<-splitlengths 填充它,如下所示

# generate vector of 0s
df$count <-0L

# fill it in
split(df$count, df[c("name", "type")]) <- lengths(split(df$num, df[c("name", "type")]))

:返回所需的结果

df
   name  type num count
1 black chair   4     2
2 black chair   5     2
3 black  sofa  12     1
4   red  sofa   4     1
5   red plate   3     1

本质上,RHS 计算每个名称-类型组合的长度,返回长度为 6 的命名向量,其中“red.chair”和“black.plate”均为 0。它通过 split <- 被馈送到 LHS,它获取向量并适当地添加给定点中的值。这本质上就是 ave 的作用,您可以看到 ave 的倒数第二行是

split(x, g) <- lapply(split(x, g), FUN)

然而,lengthslengths 的优化版本代码>sapply(列表,长度)。

A two line alternative is to generate a variable of 0s and then fill it in with split<-, split, and lengths like this:

# generate vector of 0s
df$count <-0L

# fill it in
split(df$count, df[c("name", "type")]) <- lengths(split(df$num, df[c("name", "type")]))

This returns the desired result

df
   name  type num count
1 black chair   4     2
2 black chair   5     2
3 black  sofa  12     1
4   red  sofa   4     1
5   red plate   3     1

Essentially, the RHS calculates the lengths of each name-type combination, returning a named vector of length 6 with 0s for "red.chair" and "black.plate." This is fed to the LHS with split <- which takes the vector and appropriately adds the values in their given spots. This is essentially what ave does, as you can see that the second to final line of ave is

split(x, g) <- lapply(split(x, g), FUN)

However, lengths is an optimized version of sapply(list, length).

乖不如嘢 2024-12-12 04:41:04

您距离将行计数合并到基础数据集中仅一步之遥。

使用 broom 包中的 tidy() 函数,将频率表转换为数据框并使用 df 进行内连接:

df <- data.frame(name=c('black','black','black','red','red'),
                         type=c('chair','chair','sofa','sofa','plate'),
                         num=c(4,5,12,4,3))
library(broom)
df <- merge(df, tidy(table(df[ , c("name","type")])), by=c("name","type"))
df
   name  type num Freq
1 black chair   4    2
2 black chair   5    2
3 black  sofa  12    1
4   red plate   3    1
5   red  sofa   4    1

You were just one step away from incorporating the row count into the base dataset.

Using the tidy() function from the broom package, convert the frequency table into a data frame and inner join with df:

df <- data.frame(name=c('black','black','black','red','red'),
                         type=c('chair','chair','sofa','sofa','plate'),
                         num=c(4,5,12,4,3))
library(broom)
df <- merge(df, tidy(table(df[ , c("name","type")])), by=c("name","type"))
df
   name  type num Freq
1 black chair   4    2
2 black chair   5    2
3 black  sofa  12    1
4   red plate   3    1
5   red  sofa   4    1
无所谓啦 2024-12-12 04:41:04

基本 R 中的一行简单代码:

df$count = table(interaction(df[, (c("name", "type"))]))[interaction(df[, (c("name", "type"))])]

为了清晰/高效,两行相同:

fact = interaction(df[, (c("name", "type"))])
df$count = table(fact)[fact]

One simple line in base R:

df$count = table(interaction(df[, (c("name", "type"))]))[interaction(df[, (c("name", "type"))])]

Same in two lines, for clarity/efficiency:

fact = interaction(df[, (c("name", "type"))])
df$count = table(fact)[fact]
沉睡月亮 2024-12-12 04:41:04

collapse 中,使用fcountfcount 明显比任何其他选项更快。

library(collapse)
df |> 
  fcount(name, type, add = TRUE, name = "count")

#    name  type num count
# 1 black chair   4     2
# 2 black chair   5     2
# 3 black  sofa  12     1
# 4   red  sofa   4     1
# 5   red plate   3     1

In collapse, with fcount. fcount is noticeably faster than any other options.

library(collapse)
df |> 
  fcount(name, type, add = TRUE, name = "count")

#    name  type num count
# 1 black chair   4     2
# 2 black chair   5     2
# 3 black  sofa  12     1
# 4   red  sofa   4     1
# 5   red plate   3     1
乱世争霸 2024-12-12 04:41:04

另一种更概括的方法:

df$count <- unsplit(lapply(split(df, df[c("name","type")]), nrow), df[c("name","type")])

Another way that generalizes more:

df$count <- unsplit(lapply(split(df, df[c("name","type")]), nrow), df[c("name","type")])
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文