r dplyr使用group_by总结均值和stdev

发布于 2025-01-30 09:59:45 字数 1588 浏览 3 评论 0原文

我有一个看起来像这样的数据框：

df <- data.frame("Experiment" = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
                 "Replicate" = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
                 "Type" = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma","beta","gamma","beta","alpha","alpha","gamma","beta"),
                 "Frequency" = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100))

我正在尝试计算频率的均值用于实验和类型的组合，而我首先通过运行此行尝试：

df %>% group_by(Experiment, Type) %>% summarise(mean = mean(Frequency), sd = sd(Frequency)

如果我运行此行，我会得到一个看起来像下面的tibble：

Experiment   Type   mean   sd
Exp1         alpha  10     5
Exp1         beta   102.   3.54
Epx1         gamma  1000   NA

但是我希望R认为所有type> （alpha， beta，gamma）应为实验>实验和replicate）存在>频率 类型的值，r将使用0而不是不包括该值。

换句话说，我想要的需要按照以下方式计算：

Experiment   Type   mean              sd
Exp1         alpha  mean(10,15,5)     sd(10,15,5)
Exp1         beta   mean(100,0,105)   sd(100,0,105)
Exp1         gamma  mean(1000,0,0)    sd(1000,0,0)

例如，对于exp1 beta，总结函数我在上面使用了上述计算<代码>平均值（100,105）和SD（100,105）因为exp1 复制B在我的df中不存在。但是我希望r计算平均值（100,0,105）和sd（100,0,105）而不是。有人能给我一些关于如何做到这一点的想法吗？

原文

I have a dataframe that looks like this:

df <- data.frame("Experiment" = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
                 "Replicate" = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
                 "Type" = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma","beta","gamma","beta","alpha","alpha","gamma","beta"),
                 "Frequency" = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100))

I'm trying to calculate mean and stdev of Frequency for combination of Experiment and Type, and I first tried it by running this line:

df %>% group_by(Experiment, Type) %>% summarise(mean = mean(Frequency), sd = sd(Frequency)

If I run this, I get a tibble that looks like below:

Experiment   Type   mean   sd
Exp1         alpha  10     5
Exp1         beta   102.   3.54
Epx1         gamma  1000   NA

But I'd like R to think that all Type (alpha, beta, gamma) should exist for every combination of Experiment and Replicate, so that if there is no Frequency value for Type, R will use 0 instead of not including that value.

In other words, what I want needs to be calculated like below:

Experiment   Type   mean              sd
Exp1         alpha  mean(10,15,5)     sd(10,15,5)
Exp1         beta   mean(100,0,105)   sd(100,0,105)
Exp1         gamma  mean(1000,0,0)    sd(1000,0,0)

For example, for Exp1 beta, the summarise function I used above calculates mean(100,105) and sd(100,105) because Exp1 Replicate B doesn't exist in my df. But I want R to calculate mean(100,0,105) and sd(100,0,105) instead. Would anyone be able to give me some ideas on how to do this?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

亣腦蒛氧 2025-02-06 09:59:45

您需要首先完成您的数据框架以用0填充缺失的数据，然后将“已完成”数据帧输送到您的功能。

library(tidyverse)

df %>% 
  complete(Experiment, Type, Replicate, fill = list(Frequency = 0)) %>% 
  group_by(Experiment, Type) %>% 
  summarise(mean = mean(Frequency), sd = sd(Frequency), .groups = "drop")

# A tibble: 9 × 4
  Experiment Type    mean     sd
  <chr>      <chr>  <dbl>  <dbl>
1 Exp1       alpha  10      5   
2 Exp1       beta   68.3   59.2 
3 Exp1       gamma 333.   577.  
4 Exp2       alpha   3.33   5.77
5 Exp2       beta   66.7   58.0 
6 Exp2       gamma 677.   586.  
7 Exp3       alpha   8.33   7.64
8 Exp3       beta   33.3   57.7 
9 Exp3       gamma 330    572.

You need to first complete your dataframe to fill in missing data with 0, then pipe the "completed" dataframe to your functions.

library(tidyverse)

df %>% 
  complete(Experiment, Type, Replicate, fill = list(Frequency = 0)) %>% 
  group_by(Experiment, Type) %>% 
  summarise(mean = mean(Frequency), sd = sd(Frequency), .groups = "drop")

# A tibble: 9 × 4
  Experiment Type    mean     sd
  <chr>      <chr>  <dbl>  <dbl>
1 Exp1       alpha  10      5   
2 Exp1       beta   68.3   59.2 
3 Exp1       gamma 333.   577.  
4 Exp2       alpha   3.33   5.77
5 Exp2       beta   66.7   58.0 
6 Exp2       gamma 677.   586.  
7 Exp3       alpha   8.33   7.64
8 Exp3       beta   33.3   57.7 
9 Exp3       gamma 330    572.

回复收藏 0 原文

始终不够 2025-02-06 09:59:45

您需要在group_by功能中包含replicate，然后将输出转换为更宽的tibble。数字列可以通过替换Na值进行突变。然后，串联平均值和SD列将产生所需的输出。

df %>% group_by(Experiment, Type, Replicate) %>% 
  summarise(mean = mean(Frequency), sd = sd(Frequency)) %>% 
  pivot_wider(names_from = Replicate, values_from =  c(mean, sd)) %>% 
mutate(across(where(is.double),~ replace_na(.,0))) %>% 
  mutate(mean = paste0("mean(", mean_A, ",", mean_B, ",", mean_C, ")"),
         sd = paste0("sd(", sd_A, ",", sd_B, ",", sd_C, ")")) %>% 
  select(Experiment, Type, mean, sd)

输出是

# A tibble: 9 x 4
# Groups:   Experiment, Type [9]
  Experiment Type  mean              sd       
  <chr>      <chr> <chr>             <chr>    
1 Exp1       alpha mean(10,15,5)     sd(0,0,0)
2 Exp1       beta  mean(100,0,105)   sd(0,0,0)
3 Exp1       gamma mean(1000,0,0)    sd(0,0,0)
4 Exp2       alpha mean(10,0,0)      sd(0,0,0)
5 Exp2       beta  mean(0,95,105)    sd(0,0,0)
6 Exp2       gamma mean(1010,1020,0) sd(0,0,0)
7 Exp3       alpha mean(15,10,0)     sd(0,0,0)
8 Exp3       beta  mean(0,0,100)     sd(0,0,0)
9 Exp3       gamma mean(0,990,0)     sd(0,0,0)

You need to include Replicate in the group_by function and conver the output into a wider tibble. The number columns can be mutated by replacing NA values. Then, concatenating the mean and sd columns would give the desired output.

df %>% group_by(Experiment, Type, Replicate) %>% 
  summarise(mean = mean(Frequency), sd = sd(Frequency)) %>% 
  pivot_wider(names_from = Replicate, values_from =  c(mean, sd)) %>% 
mutate(across(where(is.double),~ replace_na(.,0))) %>% 
  mutate(mean = paste0("mean(", mean_A, ",", mean_B, ",", mean_C, ")"),
         sd = paste0("sd(", sd_A, ",", sd_B, ",", sd_C, ")")) %>% 
  select(Experiment, Type, mean, sd)

The output is

# A tibble: 9 x 4
# Groups:   Experiment, Type [9]
  Experiment Type  mean              sd       
  <chr>      <chr> <chr>             <chr>    
1 Exp1       alpha mean(10,15,5)     sd(0,0,0)
2 Exp1       beta  mean(100,0,105)   sd(0,0,0)
3 Exp1       gamma mean(1000,0,0)    sd(0,0,0)
4 Exp2       alpha mean(10,0,0)      sd(0,0,0)
5 Exp2       beta  mean(0,95,105)    sd(0,0,0)
6 Exp2       gamma mean(1010,1020,0) sd(0,0,0)
7 Exp3       alpha mean(15,10,0)     sd(0,0,0)
8 Exp3       beta  mean(0,0,100)     sd(0,0,0)
9 Exp3       gamma mean(0,990,0)     sd(0,0,0)

回复收藏 0 原文

~没有更多了~