R dplyr: summarise mean and stdev using group_by
I have a dataframe that looks like this:
df <- data.frame("Experiment" = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
"Replicate" = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
"Type" = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma","beta","gamma","beta","alpha","alpha","gamma","beta"),
"Frequency" = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100))
I'm trying to calculate the mean and standard deviation of Frequency for each combination of Experiment and Type, and I first tried running this:
library(dplyr)
df %>% group_by(Experiment, Type) %>% summarise(mean = mean(Frequency), sd = sd(Frequency))
If I run this, I get a tibble that looks like the one below:
Experiment  Type   mean   sd
Exp1        alpha  10     5
Exp1        beta   102.   3.54
Exp1        gamma  1000   NA
But I'd like R to treat every Type (alpha, beta, gamma) as existing for every combination of Experiment and Replicate, so that if there is no Frequency value for a Type, R will use 0 instead of leaving that value out.
In other words, I want the values to be calculated like this:
Experiment  Type   mean             sd
Exp1        alpha  mean(10,15,5)    sd(10,15,5)
Exp1        beta   mean(100,0,105)  sd(100,0,105)
Exp1        gamma  mean(1000,0,0)   sd(1000,0,0)
For example, for Exp1 beta, the summarise call I used above calculates mean(100,105) and sd(100,105), because there is no beta row for Exp1 Replicate B in my df. But I want R to calculate mean(100,0,105) and sd(100,0,105) instead. Would anyone be able to give me some ideas on how to do this?
Comments (2)
You need to first complete your dataframe to fill in the missing data with 0, then pipe the "completed" dataframe to your functions.
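A minimal sketch of this idea, assuming "complete" refers to tidyr's complete(). The use of nesting() and the name `result` are my own choices for illustration, not something stated above. nesting(Experiment, Replicate) restricts the expansion to Experiment/Replicate pairs that actually occur in the data; each of those pairs is then crossed with every Type, and the missing Frequency values are filled with 0 before summarising.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Experiment = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
  Replicate  = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
  Type       = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma",
                 "beta","gamma","beta","alpha","alpha","gamma","beta"),
  Frequency  = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100)
)

result <- df %>%
  # Add a row for every Type within each observed Experiment/Replicate pair;
  # rows that did not exist before get Frequency = 0.
  complete(nesting(Experiment, Replicate), Type, fill = list(Frequency = 0)) %>%
  group_by(Experiment, Type) %>%
  summarise(mean = mean(Frequency), sd = sd(Frequency), .groups = "drop")

result
```

If a Replicate can be missing from an Experiment entirely and should still count as 0, complete(Experiment, Replicate, Type, fill = list(Frequency = 0)) crosses all three columns instead. With the example data both variants give the same result, because every Experiment has Replicates A, B and C.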
You need to include Replicate in the group_by function and convert the output into a wider tibble. The numeric columns can then be mutated to replace the NA values. Then, concatenating the mean and sd columns would give the desired output.
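One possible reading of these steps, sketched under assumptions: the answer's own code is not shown here, so the choice of sum() as the per-replicate summary, the hard-coded replicate columns A, B and C, and the names `wide` and `result2` are all hypothetical. The sketch groups by Replicate as well, spreads the replicates into columns with pivot_wider(), replaces the resulting NA values with 0, and then computes the mean and sd across the replicate columns row by row.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Experiment = c(rep("Exp1", 6), rep("Exp2", 5), rep("Exp3", 4)),
  Replicate  = c("A","A","A","B","C","C","A","A","B","B","C","A","B","B","C"),
  Type       = c("alpha","beta","gamma","alpha","alpha","beta","alpha","gamma",
                 "beta","gamma","beta","alpha","alpha","gamma","beta"),
  Frequency  = c(10,100,1000,15,5,105,10,1010,95,1020,105,15,10,990,100)
)

wide <- df %>%
  group_by(Experiment, Type, Replicate) %>%
  # Assumption: one value per Experiment/Type/Replicate, so sum() leaves it unchanged.
  summarise(Frequency = sum(Frequency), .groups = "drop") %>%
  # One column per Replicate (A, B, C); absent combinations become NA.
  pivot_wider(names_from = Replicate, values_from = Frequency) %>%
  # Replace the NA values in the replicate columns with 0.
  mutate(across(c(A, B, C), ~ replace_na(.x, 0)))

result2 <- wide %>%
  rowwise() %>%
  # Compute the statistics across the three replicate columns of each row.
  mutate(mean = mean(c_across(c(A, B, C))),
         sd   = sd(c_across(c(A, B, C)))) %>%
  ungroup()

result2
```

Unlike the complete() route, this keeps the per-replicate values visible next to the statistics, at the cost of naming the replicate columns explicitly.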