How can I run tapply() on multiple columns of a data frame in R?

Posted on 2024-11-29 00:51:43

I have a data frame like the following:

a   b1  b2  b3  b4  b5  b6  b7  b8  b9
D   4   6   9   5   3   9   7   9   8
F   7   3   8   1   3   1   4   4   3
R   2   5   5   1   4   2   3   1   6
D   9   2   1   4   3   3   8   2   5
D   5   4   3   1   6   4   1   8   3
R   3   7   9   1   8   5   3   4   2
D   4   1   8   2   6   3   2   7   5
F   7   1   7   2   7   1   6   2   4
D   6   3   9   3   9   9   7   1   2

The function tapply(df[,2], INDEX = df$a, sum) works fine to produce a table that sums everything in df[,2] by df$a, but when I try tapply(df[,2:10], INDEX = df$a, sum) to get a similar table, except with a sum for each column (2, 3, 4,..., 10), I get an error message reading:

Error in tapply(df[, 2:10], INDEX = df$a, sum) : arguments must have same length

Additionally, I would like the row names of the table to be the column names of df[,2:10], such that row 1 is b1, row 2 is b2, and row 9 is b9.

为你拒绝所有暧昧 2024-12-06 00:51:43

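That's because tapply expects a vector as its first argument, whereas df[,2:10] is a data frame. A data frame is a list of columns, so its length is the number of columns rather than the number of rows, and it does not line up with df$a. Besides, sum would give you one grand total, not a sum per column. Use aggregate() instead, e.g.: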

aggregate(df[,2:10],by=list(df$a), sum)
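As a rough sketch of how the aggregate() call could be combined with the layout asked for in the question (rows b1 to b9, one column per group), the result can be transposed. The data frame below is copied from the question; the names `sums` and `result` are only illustrative.

```r
# Sample data copied from the question
df <- read.table(header = TRUE, text = "
a   b1  b2  b3  b4  b5  b6  b7  b8  b9
D   4   6   9   5   3   9   7   9   8
F   7   3   8   1   3   1   4   4   3
R   2   5   5   1   4   2   3   1   6
D   9   2   1   4   3   3   8   2   5
D   5   4   3   1   6   4   1   8   3
R   3   7   9   1   8   5   3   4   2
D   4   1   8   2   6   3   2   7   5
F   7   1   7   2   7   1   6   2   4
D   6   3   9   3   9   9   7   1   2
")

# One row per group, one column of sums per b-column
sums <- aggregate(df[, 2:10], by = list(a = df$a), FUN = sum)

# Drop the grouping column and transpose: rows become b1..b9,
# columns become the groups D, F, R
result <- t(sums[, -1])
colnames(result) <- sums$a

result["b1", ]  # D = 28, F = 14, R = 5
```

Naming the list element (`list(a = df$a)`) just gives the grouping column a readable name instead of the default `Group.1`.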

If you want a list returned, you could use by() for that. Make sure to specify colSums instead of sum, because by() passes each group to the function as a data frame (the original split by df$a):

by(df[,2:10],df$a,FUN=colSums)
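If a single table is preferred over the list-like object that by() returns, one option is to bind the per-group vectors together. This is only a sketch; to keep it short it uses just the first three value columns of the question's data, and `per_group` and `result` are illustrative names.

```r
# First three value columns of the question's data
df <- data.frame(
  a  = c("D", "F", "R", "D", "D", "R", "D", "F", "D"),
  b1 = c(4, 7, 2, 9, 5, 3, 4, 7, 6),
  b2 = c(6, 3, 5, 2, 4, 7, 1, 1, 3),
  b3 = c(9, 8, 5, 1, 3, 9, 8, 7, 9)
)

# One named vector of column sums per group
per_group <- by(df[, -1], df$a, FUN = colSums)

# Bind the vectors side by side: rows are b1..b3, columns are the groups
result <- do.call(cbind, per_group)
```

Using rbind instead of cbind would give the groups as rows and the b-columns as columns.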

极度宠爱 2024-12-06 00:51:43

Another possibility is to combine apply and tapply.

apply(df[,-1], 2, function(x) tapply(x, df$a, sum))

This produces a matrix of per-group sums, with one row per group and one column per b-column:

    b1  ...   b9
D   sD1 ...  sD9
F   sF1 ...  sF9
R   sR1 ...  sR9

You can then use as.data.frame() to get a data frame as output.
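Since the question asks for b1 to b9 as the row names, the matrix can be transposed with t() before converting it. A minimal sketch, again using only the first three value columns of the question's data, with `m` and `result` as illustrative names:

```r
# First three value columns of the question's data
df <- data.frame(
  a  = c("D", "F", "R", "D", "D", "R", "D", "F", "D"),
  b1 = c(4, 7, 2, 9, 5, 3, 4, 7, 6),
  b2 = c(6, 3, 5, 2, 4, 7, 1, 1, 3),
  b3 = c(9, 8, 5, 1, 3, 9, 8, 7, 9)
)

# Groups as rows, b-columns as columns
m <- apply(df[, -1], 2, function(x) tapply(x, df$a, sum))

# Transpose so the b-columns become rows, then convert to a data frame
result <- as.data.frame(t(m))

result["b2", ]  # D = 16, F = 4, R = 12
```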

究竟谁懂我的在乎 2024-12-06 00:51:43

Here is a way to apply data.table to this problem.

library(data.table)
DT <- data.table(df)
DT[, lapply(.SD, sum), by=a]

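And here is a dplyr approach. The original version used summarise_all(funs(sum)), but funs() is deprecated in recent dplyr releases, so this uses across() instead (dplyr 1.0.0 or later):

library(dplyr)
df %>% group_by(a) %>% summarise(across(everything(), sum))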

About the author: 萌︼了一个春
