Add a column of group means to the original data

Posted on 2024-12-13 00:07:58

I want to add a column of group means to an R data.frame, where the groups are defined by a factor column. Like this:

df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)
df2 <- aggregate(data = df1, Y ~ X, FUN = mean)
df3 <- merge(x = df1, y = df2, by = "X", suffixes = c(".Old",".New"))
df3
#   X Y.Old Y.New
# 1 A     1     2
# 2 A     2     2
# 3 A     3     2
# 4 B     4     5
# 5 B     5     5
# 6 B     6     5

To do this I have to create two unnecessary data.frames (df2 and df3). I'd like to know a way to append a column of per-group means to my original data.frame without creating any extra data.frames. Thanks for your time and help.


Comments (4)

疑心病 2024-12-20 00:07:58

Two alternative ways of doing this:

1) with the dplyr package:

library(dplyr)
df1 <- df1 %>% 
  group_by(X) %>% 
  mutate(Y.new = mean(Y))

2) with the data.table package:

library(data.table)
setDT(df1)[, Y.new := mean(Y), by = X]

Both give the following result (shown here as printed by data.table):

> df1
   X Y Y.new
1: A 1     2
2: A 2     2
3: A 3     2
4: B 4     5
5: B 5     5
6: B 6     5
亽野灬性zι浪 2024-12-20 00:07:58

This is what the ave function is for.

df1$Y.New <- ave(df1$Y, df1$X)
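
As a rough illustration of why this works (a minimal sketch, not part of the original answer): ave returns a vector of the same length as its input, with each element replaced by the summary of its group, so it can be assigned straight back as a column. Its FUN argument defaults to mean. The Y.Max column below is a made-up name, added only to show that another summary function can be passed.

```r
# Rebuild the example data from the question
df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)

# ave() returns one value per row, so no aggregate/merge step is needed
df1$Y.New <- ave(df1$Y, df1$X)             # FUN defaults to mean

# Hypothetical extra column: FUN must be passed by name because it follows "..."
df1$Y.Max <- ave(df1$Y, df1$X, FUN = max)

df1
```

No intermediate data.frame is created here; only new columns are added to df1.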
清风疏影 2024-12-20 00:07:58

ddply and transform to the rescue (although I'm sure you'll get at least 4 different ways to do this):

library(plyr)
ddply(df1,.(X),transform,Y.New = mean(Y))
  X Y Y.New
1 A 1     2
2 A 2     2
3 A 3     2
4 B 4     5
5 B 5     5
6 B 6     5
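
For readers without plyr, the same split-apply-combine idea can be sketched in base R. This is only an approximation of what the ddply call does, not its actual implementation; the names pieces and res are made up for illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)

# Split: one data.frame per level of X; Apply: add the group mean to each piece
pieces <- lapply(split(df1, df1$X), function(g) {
  g$Y.New <- mean(g$Y)
  g
})

# Combine: stack the pieces again and drop the "A.1"-style row names
res <- do.call(rbind, pieces)
rownames(res) <- NULL
res
```

Unlike ave, this route does build temporary pieces, which is what the question wanted to avoid, so it is shown only to make the mechanics visible.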
久隐师 2024-12-20 00:07:58

Joran answered beautifully. This is not an answer to your question but an extension of the conversation. If you're looking for a table of means showing how two categorical variables relate to a dependent variable, here's a function from Hadley Wickham's reshape package for that:

cast(CO2, Type ~ Treatment, value="uptake", fun.aggregate=mean, margins=TRUE)

Here's the head of the CO2 data, followed by the table of means:

> head(CO2)
  Plant   Type  Treatment conc uptake
1   Qn1 Quebec nonchilled   95   16.0
2   Qn1 Quebec nonchilled  175   30.4
3   Qn1 Quebec nonchilled  250   34.8
4   Qn1 Quebec nonchilled  350   37.2
5   Qn1 Quebec nonchilled  500   35.3
6   Qn1 Quebec nonchilled  675   39.2

> library(reshape)

> cast(CO2, Type ~ Treatment, mean, margins=TRUE)  
         Type nonchilled  chilled    (all)
1      Quebec   35.33333 31.75238 33.54286
2 Mississippi   25.95238 15.81429 20.88333
3       (all)   30.64286 23.78333 27.21310
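
If the reshape package is not available, a similar table of cell means can be sketched with base R's tapply. This is an alternative technique, not what cast does internally, and it does not add the "(all)" margins by itself; the names cell_means and type_means are made up for illustration.

```r
# CO2 ships with base R (datasets package), so no extra packages are needed

# Mean uptake for every Type x Treatment combination, returned as a matrix
cell_means <- tapply(CO2$uptake,
                     list(Type = CO2$Type, Treatment = CO2$Treatment),
                     mean)
cell_means

# Row margins: mean uptake per Type, ignoring Treatment
type_means <- tapply(CO2$uptake, CO2$Type, mean)
type_means
```

The cell values should match the nonchilled and chilled columns of the cast output above, and type_means corresponds to its "(all)" column.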