Group by multiple columns and sum other multiple columns

Posted on 2024-12-17 00:22:34

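I have a data frame with about 200 columns. I would like to group the table by the first 10 columns, which are factors, and sum the remaining columns.

I have a list of all the column names I want to group by and a list of all the columns I want to aggregate.

The output should be the same data frame with the same number of columns, just grouped together.

Is there a solution using data.table, plyr, or any other package?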

﹉夏雨初晴づ 2024-12-24 00:22:34

The data.table way is:

DT[, lapply(.SD, sum), by = list(col1, col2, col3, ...)]

or

DT[, lapply(.SD, sum), by = colnames(DT)[1:10]]

where .SD is the (S)ubset of (D)ata excluding the group columns. (Aside: if you need to refer to the group columns generically, they are in .BY.)
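
Since the question mentions having the grouping names and the value names as two separate lists, here is a minimal sketch of the same idea driven by character vectors. The toy table and the names group_cols and sum_cols are assumptions made up for illustration; .SDcols is used to restrict .SD to the columns being summed.

```r
library(data.table)

# Toy data: two grouping columns (factors) and two value columns.
DT <- data.table(
  g1 = factor(c("a", "a", "b", "b", "b")),
  g2 = factor(c("x", "x", "x", "y", "y")),
  v1 = 1:5,
  v2 = c(10, 20, 30, 40, 50)
)

# Hypothetical name vectors standing in for the asker's two lists.
group_cols <- c("g1", "g2")
sum_cols   <- setdiff(names(DT), group_cols)

# by= accepts a character vector of column names;
# .SDcols limits .SD to the columns we want to sum.
res <- DT[, lapply(.SD, sum), by = group_cols, .SDcols = sum_cols]
print(res)
```

The result keeps the grouping columns first, followed by one summed column per entry in sum_cols, so the column count matches the input.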

戏蝶舞 2024-12-24 00:22:34

See below for a more modern answer using dplyr::across.

The dplyr way would be:

library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr releases
df %>%
  group_by(col1, col2, col3) %>%
  summarise_each(funs(sum))

You can further specify which columns to summarise, or exclude from summarise_each, by using the special functions mentioned in the help file ?dplyr::select.

暗地喜欢 2024-12-24 00:22:34

In base R this would be:

aggregate(as.matrix(df[, 11:200]), as.list(df[, 1:10]), FUN = sum)

EDIT:
The aggregate function has come a long way since I wrote this. None of the casting above is necessary.

aggregate(df[, 11:200], df[, 1:10], FUN = sum)

There are a variety of ways to write this. Assuming the first 10 columns are named a1 through a10, I like the following, even though it is verbose.

aggregate(. ~ a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10, data = df, FUN = sum)

(You could use paste to construct the formula string and convert it with formula.)
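
The paste approach mentioned above could look roughly like this. It is only a sketch on a made-up two-factor data frame; group_cols is a hypothetical name vector, and as.formula is used here to turn the pasted string into a formula.

```r
# Toy data frame: two grouping factors and two numeric columns.
df <- data.frame(
  a1 = factor(c("p", "p", "q", "q")),
  a2 = factor(c("u", "u", "u", "v")),
  x  = 1:4,
  y  = c(0.5, 1.5, 2.5, 3.5)
)

# Hypothetical vector of grouping column names.
group_cols <- c("a1", "a2")

# Build the string ". ~ a1 + a2" from the names, then convert it to a formula.
f <- as.formula(paste(". ~", paste(group_cols, collapse = " + ")))

# The dot on the left-hand side stands for every column not named on the right.
res <- aggregate(f, data = df, FUN = sum)
print(res)
```

One thing to keep in mind: the formula interface of aggregate defaults to na.action = na.omit, so rows containing NA are dropped before summing unless that argument is changed.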

失与倦＂ 2024-12-24 00:22:34

This seems like a task for ddply (I use the 'baseball' dataset, which is included with plyr):

library(plyr)
# columns to group by, and columns to sum within each group
groupColumns <- c("year", "team")
dataColumns <- c("hr", "rbi", "sb")
res <- ddply(baseball, groupColumns, function(x) colSums(x[dataColumns]))
head(res)

This gives, for each combination of groupColumns, the sum of the columns specified in dataColumns.

若言繁花未落 2024-12-24 00:22:34

Using plyr::ddply:

library(plyr)
ddply(dtfr, .(name1, name2, namex), numcolwise(sum))

め可乐爱微笑 2024-12-24 00:22:34

Let's consider this example:

library(dplyr)

df <- data.frame(a = 'a', b = c('a', 'a', 'b', 'b', 'b'), c = 1:5, d = 11:15,
                 stringsAsFactors = TRUE)

Update: dplyr 1.1.0 onwards

You may use pick to select the grouping columns:

df %>%
  group_by(pick(where(is.factor))) %>%
  summarise(across(everything(), sum))

Or use the .by argument:

df %>% summarise(across(everything(), sum), .by = where(is.factor))

Before dplyr 1.1.0

The _all, _at and _if verbs are now superseded in favour of across. To group by all the factor columns and sum all the other columns, we can do:

df %>%
  group_by(across(where(is.factor))) %>%
  summarise(across(everything(), sum))

#  a     b         c     d
#  <fct> <fct> <int> <int>
#1 a     a         3    23
#2 a     b        12    42

To group by all factor columns and sum only the numeric columns:

df %>%
  group_by(across(where(is.factor))) %>%
  summarise(across(where(is.numeric), sum))

We can also do this by position, but have to be careful with the numbers: inside summarise, across does not count the grouping columns, so 1:2 there refers to c and d.

df %>% group_by(across(1:2)) %>% summarise(across(1:2, sum))
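
If the grouping and value columns are already available as name vectors, as the question describes, across can be combined with all_of. The following is a sketch under that assumption; group_cols and sum_cols are hypothetical names, and the data frame is the same toy example as above.

```r
library(dplyr)

df <- data.frame(a = 'a', b = c('a', 'a', 'b', 'b', 'b'), c = 1:5, d = 11:15,
                 stringsAsFactors = TRUE)

# Hypothetical name vectors standing in for the asker's two lists.
group_cols <- c("a", "b")
sum_cols   <- c("c", "d")

res <- df %>%
  group_by(across(all_of(group_cols))) %>%            # group by the named columns
  summarise(across(all_of(sum_cols), sum),            # sum the named value columns
            .groups = "drop")                         # return an ungrouped tibble
print(res)
```

all_of raises an error if a listed name is missing from the data, which can be helpful with 200 columns where a typo would otherwise go unnoticed.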

淡写薰衣草的香 2024-12-24 00:22:34

Another way to do this with dplyr that is generic (no list of columns needed) would be:

df %>% group_by_if(is.factor) %>% summarize_if(is.numeric, sum, na.rm = TRUE)

About the author

避讳
