Averaging column values across all rows of a data frame

Posted on 2024-10-23 12:41:27

I've got a data frame that I read from a file like this:

name, points, wins, losses, margin
joe, 1, 1, 0, 1
bill, 2, 3, 0, 4
joe, 5, 2, 5, -2
cindy, 10, 2, 3, -2.5

etc.

I want to average the column values across all rows that share a name. Is there an easy way to do this in R?

For example, I want the average column values for all the "joe" rows, which would give something like:

joe, 3, 1.5, 2.5, -.5
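
To make the sample reproducible, here is a minimal sketch of reading it into a data frame. It assumes the file is comma-separated with a space after each comma, as shown above; the variable name csv_text is only illustrative, and with a real file you would pass its path to read.csv() instead (the question does not give one).

```r
# Sample rows from the question, inlined as text so the snippet is self-contained
csv_text <- "name, points, wins, losses, margin
joe, 1, 1, 0, 1
bill, 2, 3, 0, 4
joe, 5, 2, 5, -2
cindy, 10, 2, 3, -2.5"

# strip.white = TRUE drops the spaces that follow each comma in character fields
df <- read.csv(text = csv_text, strip.white = TRUE)

str(df)
```

The answers below all start from a data frame shaped like this df.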

Comments (5)

莫言歌 2024-10-30 12:41:27

After loading your data:

df <- structure(list(name = structure(c(3L, 1L, 3L, 2L), .Label = c("bill", "cindy", "joe"), class = "factor"), points = c(1L, 2L, 5L, 10L), wins = c(1L, 3L, 2L, 2L), losses = c(0L, 0L, 5L, 3L), margin = c(1, 4, -2, -2.5)), .Names = c("name", "points", "wins", "losses", "margin"), class = "data.frame", row.names = c(NA, -4L))

Just use the aggregate function:

> aggregate(. ~ name, data = df, mean)
   name points wins losses margin
1  bill      2  3.0    0.0    4.0
2 cindy     10  2.0    3.0   -2.5
3   joe      3  1.5    2.5   -0.5
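
If only the averages for a single name are needed, as in the question's "joe" example, a minimal base-R sketch is to subset the rows and call colMeans(). The data frame here is rebuilt with data.frame() for self-containment, and joe_means is just an illustrative name.

```r
df <- data.frame(
  name   = c("joe", "bill", "joe", "cindy"),
  points = c(1, 2, 5, 10),
  wins   = c(1, 3, 2, 2),
  losses = c(0, 0, 5, 3),
  margin = c(1, 4, -2, -2.5)
)

# Keep only joe's rows, drop the non-numeric name column, then average each column
joe_means <- colMeans(df[df$name == "joe", -1])
joe_means
```

The result should agree with the joe row of the aggregate output above.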

时光暖心i 2024-10-30 12:41:27

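The obligatory plyr and reshape solutions:

library(plyr)
# colMeans() averages each remaining column; mean() on a data frame returns NA in current R
ddply(df, "name", function(x) colMeans(x[-1]))


library(reshape)
# melt to long format, then cast back, averaging each variable per name
cast(melt(df, id.vars = "name"), name ~ ..., mean)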

跨年 2024-10-30 12:41:27

And a data.table solution, which offers concise syntax and memory efficiency:

library(data.table)
DT <- data.table(df)
DT[,lapply(.SD, mean), by = name]

稍尽春風 2024-10-30 12:41:27

Here is yet another way, shown on a different example.

Suppose we have a matrix xt with duplicated row names:

a b c d
A 1 2 3 4
A 5 6 7 8
A 9 10 11 12
A 13 14 15 16
B 17 18 19 20
B 21 22 23 24
B 25 26 27 28
B 29 30 31 32
C 33 34 35 36
C 37 38 39 40
C 41 42 43 44
C 45 46 47 48

The mean for each group of duplicated row names can be computed in a few steps:
1. Compute the means with the aggregate function, grouping by row name.
2. aggregate writes the group names into a new first column, so assign that column back as the row names...
3. ...and then drop it by selecting columns 2 through the number of columns of xa.

xa <- aggregate(xt, by = list(rownames(xt)), FUN = mean)
rownames(xa) <- xa[, 1]
xa <- xa[, 2:ncol(xa)]

After that we get:

a b c d
A 7 8 9 10
B 23 24 25 26
C 39 40 41 42
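
As a cross-check, here is a minimal sketch that builds the example matrix and gets the same group means by a different base-R route. Note that it swaps in rowsum() for aggregate(), so it is not the method described above; the names sums and counts are illustrative.

```r
# Build the 12 x 4 example matrix with repeated row names A, B, C
xt <- matrix(1:48, ncol = 4, byrow = TRUE,
             dimnames = list(rep(c("A", "B", "C"), each = 4), letters[1:4]))

sums   <- rowsum(xt, group = rownames(xt))  # per-group column sums, groups sorted
counts <- as.vector(table(rownames(xt)))    # rows per group, in the same sorted order

# Dividing a matrix by a vector of length nrow divides each row by its own count
xa <- sums / counts
xa
```

This keeps the result as a matrix with the group labels as row names, so no column has to be moved or dropped afterwards.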

╭⌒浅淡时光〆 2024-10-30 12:41:27

You can use functions from the tidyverse to group your data by name and then summarise all remaining columns with a given function (e.g. mean):

library(dplyr)  # provides tibble(), %>%, group_by() and summarise()

df <- tibble(name=c("joe","bill","joe","cindy"),
             points=c(1,2,5,10), wins=c(1,3,2,2),
             losses=c(0,0,5,3),
             margin=c(1,4,-2, -2.5))

# across(everything(), mean) is the current spelling of the older summarise_all(mean)
df %>% group_by(name) %>% summarise(across(everything(), mean))