r dataframe

Adding a column of means by group to the original data

Posted 2024-12-13 00:07:58

I want to add a column of group means to an R data.frame, grouped by a factor column. Like this:

df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)
df2 <- aggregate(data = df1, Y ~ X, FUN = mean)
df3 <- merge(x = df1, y = df2, by = "X", suffixes = c(".Old",".New"))
df3
#   X Y.Old Y.New
# 1 A     1     2
# 2 A     2     2
# 3 A     3     2
# 4 B     4     5
# 5 B     5     5
# 6 B     6     5

To do this I have to create two unnecessary data.frames. I'd like to know a way to append a column of means by a factor column to my original data.frame without creating any extra data.frames. Thanks for your time and help.

疑心病 2024-12-20 00:07:58

Two alternative ways of doing this:

1) with the dplyr package:

library(dplyr)
df1 <- df1 %>% 
  group_by(X) %>% 
  mutate(Y.new = mean(Y)) %>% 
  ungroup()  # drop the grouping so later operations are not applied per group

2) with the data.table package:

library(data.table)
setDT(df1)[, Y.new := mean(Y), by = X]

Both give the same values (the output below is as printed by data.table):

> df1
   X Y Y.new
1: A 1     2
2: A 2     2
3: A 3     2
4: B 4     5
5: B 5     5
6: B 6     5

亽野灬性zι浪 2024-12-20 00:07:58

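This is what the ave function is for. It returns a vector of the same length as its input, with each value replaced by the result of FUN (mean by default) for its group.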

df1$Y.New <- ave(df1$Y, df1$X)
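
A small sketch of how ave behaves when the data has gaps, assuming a variant of df1 with one missing value (the NA is made up for illustration and is not part of the question). Every unnamed argument after the first is treated as a grouping variable, so a custom summary function has to be passed by name as FUN:

```r
# Variant of df1 with one missing value (an assumption for illustration)
df1 <- data.frame(X = rep(LETTERS[1:2], each = 3),
                  Y = c(1, NA, 3, 4, 5, 6))

# FUN must be named; an unnamed function would be taken as a grouping variable
df1$Y.New <- ave(df1$Y, df1$X, FUN = function(v) mean(v, na.rm = TRUE))

print(df1)
```

With the default FUN, the whole group containing the NA would get NA as its mean, so dropping missing values is a choice to make deliberately.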

清风疏影 2024-12-20 00:07:58

ddply and transform to the rescue (although I'm sure you'll get at least 4 different ways to do this):

library(plyr)
ddply(df1, .(X), transform, Y.New = mean(Y))
#   X Y Y.New
# 1 A 1     2
# 2 A 2     2
# 3 A 3     2
# 4 B 4     5
# 5 B 5     5
# 6 B 6     5
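
For readers without plyr, here is a rough base R sketch of the split-apply-combine pattern that the ddply call expresses. It is not plyr's implementation; the names pieces and result are placeholders chosen for this example:

```r
df1 <- data.frame(X = rep(LETTERS[1:2], each = 3), Y = 1:6)

# Split: one data frame per level of X
pieces <- split(df1, df1$X)

# Apply: add the group mean to each piece (the scalar is recycled over its rows)
pieces <- lapply(pieces, function(piece) transform(piece, Y.New = mean(Y)))

# Combine: stack the pieces back together and reset the row names
result <- do.call(rbind, pieces)
rownames(result) <- NULL

print(result)
```

As with ddply, the rows come back ordered by group rather than in their original order, which makes no difference here because df1 is already sorted by X.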

久隐师 2024-12-20 00:07:58

Joran answered beautifully. This is not an answer to your question but an extension of the conversation. If you're looking for a table of means of a dependent variable across two categorical variables, here's the function for that from Hadley Wickham's reshape package:

cast(CO2, Type ~ Treatment, value="uptake", fun.aggregate=mean, margins=TRUE)

Here are the first rows of the CO2 data, followed by the means table:

> head(CO2)
  Plant   Type  Treatment conc uptake
1   Qn1 Quebec nonchilled   95   16.0
2   Qn1 Quebec nonchilled  175   30.4
3   Qn1 Quebec nonchilled  250   34.8
4   Qn1 Quebec nonchilled  350   37.2
5   Qn1 Quebec nonchilled  500   35.3
6   Qn1 Quebec nonchilled  675   39.2

> library(reshape)

> cast(CO2, Type ~ Treatment, mean, margins=TRUE)  
         Type nonchilled  chilled    (all)
1      Quebec   35.33333 31.75238 33.54286
2 Mississippi   25.95238 15.81429 20.88333
3       (all)   30.64286 23.78333 27.21310
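
The same table can be approximated in base R with tapply. This is only a sketch: the (all) margins are computed from the raw data and attached by hand, and the object names (cell_means, means_table and so on) are placeholders chosen for this example:

```r
# Cell means: rows are Type, columns are Treatment
cell_means <- with(CO2, tapply(uptake, list(Type, Treatment), mean))

# Marginal means, computed from the raw data rather than from the cell means
type_means      <- with(CO2, tapply(uptake, Type, mean))
treatment_means <- with(CO2, tapply(uptake, Treatment, mean))
grand_mean      <- mean(CO2$uptake)

# Attach the margins as an extra column and an extra row
means_table <- cbind(cell_means, "(all)" = as.vector(type_means))
means_table <- rbind(means_table,
                     "(all)" = c(as.vector(treatment_means), grand_mean))

print(round(means_table, 5))
```

Taking the margins from the raw data keeps them correct even when the groups have unequal sizes, where averaging the cell means would not.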

About the author: 小苏打饼
