Calculate group mean, sum, or other summary statistics, and assign the column to the original data

Posted on 2024-11-08 11:49:21 · 543 words · 9 views · 0 comments

I want to calculate mean (or any other summary statistics of length one, e.g. min, max, length, sum) of a numeric variable ("value") within each level of a grouping variable ("group").

The summary statistic should be assigned to a new variable which has the same length as the original data. That is, each row of the original data should have a value corresponding to the current group value - the data set should not be collapsed to one row per group. For example, consider group mean:

Before

id  group  value
1   a      10
2   a      20
3   b      100
4   b      200

After

id  group  value  grp.mean.values
1   a      10     15
2   a      20     15
3   b      100    150
4   b      200    150

Comments (4)

爱人如己 2024-11-15 11:49:21

You may do this in dplyr using mutate:

library(dplyr)
df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value))

...or use data.table to assign the new column by reference (:=):

library(data.table)
setDT(df)[ , grp.mean.values := mean(value), by = group]
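
One detail worth spelling out for the dplyr version: mutate() returns a new data frame rather than modifying df, and the result stays grouped until ungroup() is called. A minimal sketch, assuming the example data from the question (the name df_with_mean is only illustrative):

```r
library(dplyr)

# Example data matching the question's "Before" table
df <- data.frame(
  id = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# mutate() does not change df in place, so assign the result
# (df_with_mean is a hypothetical name chosen for this sketch)
df_with_mean <- df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value)) %>%
  ungroup()  # drop the grouping so later steps act on all rows

print(df_with_mean)
```

The data.table version differs here: := adds the column by reference, so no assignment is needed.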

寒尘 2024-11-15 11:49:21

Have a look at the ave function. Something like

df$grp.mean.values <- ave(df$value, df$group)

If you want to use ave to calculate something else per group, you need to specify FUN = your-desired-function, e.g. FUN = min:

df$grp.min <- ave(df$value, df$group, FUN = min)
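
FUN has to be passed by name because it comes after ... in ave's argument list; an unnamed function would be treated as another grouping variable. A small sketch of a few variations, assuming made-up data with a missing value (the column names grp.mean.narm and grp.n are illustrative only):

```r
# Made-up data for illustration; group "b" contains an NA
df <- data.frame(
  id = 1:6,
  group = c("a", "a", "b", "b", "b", "c"),
  value = c(10, 20, 100, NA, 200, 5)
)

# The default FUN = mean propagates NA to every row of that group
df$grp.mean.values <- ave(df$value, df$group)

# An anonymous function lets you pass extra arguments such as na.rm
df$grp.mean.narm <- ave(df$value, df$group,
                        FUN = function(v) mean(v, na.rm = TRUE))

# length() puts the group size on every row
df$grp.n <- ave(df$value, df$group, FUN = length)

print(df)
```

ave keeps the original row order and length, which is what the question asks for.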

初雪 2024-11-15 11:49:21

One option is to use plyr. ddply expects a data.frame (the first d) and returns a data.frame (the second d). Other XXply functions work in a similar way; i.e. ldply expects a list and returns a data.frame, dlply does the opposite...and so on and so forth. The second argument is the grouping variable(s). The third argument is the function we want to compute for each group.

require(plyr)
ddply(dat, "group", transform, grp.mean.values = mean(value))

  id group value grp.mean.values
1  1     a    10              15
2  2     a    20              15
3  3     b   100             150
4  4     b   200             150

蒗幽 2024-11-15 11:49:21

Here is another option using base functions aggregate and merge:

merge(x, aggregate(value ~ group, data = x, mean), 
     by = "group")

  group id value.x value.y
1     a  1      10      15
2     a  2      20      15
3     b  3     100     150
4     b  4     200     150

You can get "better" column names with suffixes:

merge(x, aggregate(value ~ group, data = x, mean), 
     by = "group", suffixes = c("", ".mean"))


  group id value value.mean
1     a  1    10         15
2     a  2    20         15
3     b  3   100        150
4     b  4   200        150