Replace value with group mean if n = 1
I am trying to take grouped means for a dataset that has a lot of missing data, and in which some groups have only 1 or 0 samples from which to derive a mean. I want a mean for each species within each ocean. However, for species with only one value (or none) per ocean, I would like to use a "global" mean, i.e. the mean for that species across all oceans (rather than taking a "mean" of a single value).
My data looks like this:
species <- c("turtle", "turtle", "turtle", "turtle",
             "turtle", "turtle", "turtle", "turtle",
             "shark", "shark", "shark", "shark",
             "shark", "shark", "shark", "shark",
             "bird")
gear <- c("t", "p", "t", "p",
          "t", "p", "t", "p",
          "t", "p", "t", "p",
          "t", "p", "t", "p",
          "t")
ocean <- c("north", "south", "east", "west",
           "north", "south", "east", "west",
           "north", "south", "east", "west",
           "north", "south", "east", "west",
           "north")
rate <- c(0.1, 0.2, 0.3, 0.4,
          0.2, 0.2, 0.3, 0.4,
          0.1, 0.2, 0.3, 0.4,
          0.2, 0.2, 0.3, 0.4,
          0.1)
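library(dplyr)

# Build the data frame directly: cbind() would coerce rate to character,
# and the location vector defined above is named ocean, not region.
df <- data.frame(species, gear, region = ocean, rate)

# Per-group mean, sd, n, and an approximate 95% interval
db <- df %>%
  group_by(species, gear, region) %>%
  summarize(mean = mean(rate),
            sd = sd(rate),
            n = n()) %>%
  mutate(se = sd / sqrt(n),
         upper_rate = mean + 1.96 * se,
         lower_rate = mean - 1.96 * se)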
What I would like to do is populate a data frame with grouped means for each combination of species, ocean, and gear, but for groups with only one rate (e.g. birds), I want to assign the species' "global" mean to all oceans. (E.g. the bird mean in the south, east, and west oceans would be 0.10.)
I am looking for the output to look like this:
I am trying to do this in a clean and reproducible way. I think it's really simple but can't seem to figure it out! Any help would be greatly appreciated!!
Comments (1)
Output:
This is based on my understanding that you would like to end up with a table with ALL combinations of the columns species, gear and region. For those combination(s) which did not exist in the original set, or which had only one row in the original data set, we assign the mean and sd grouped by species. As you can see in the output above, we still have NA for bird's sd. This is because we need at least 2 data points to calculate sd, but bird only had one row (data point) in the original data. For future examples it may be better to use a more simplified data set. Anyway, hope this helps.
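The code that went with this answer was not preserved here, so the following is only a minimal sketch of how the approach described above might look, not the answerer's original code. It assumes dplyr and tidyr are installed, rebuilds the question's data with a region column, and uses tidyr::complete() to create every species/gear/region combination. The column names use_global and n_used, and the choice to compute se from the species-wide n when falling back, are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Same values as in the question, with the ocean vector stored as `region`
df <- data.frame(
  species = c(rep("turtle", 8), rep("shark", 8), "bird"),
  gear    = c(rep(c("t", "p"), 8), "t"),
  region  = c(rep(c("north", "south", "east", "west"), 4), "north"),
  rate    = c(rep(c(0.1, 0.2, 0.3, 0.4, 0.2, 0.2, 0.3, 0.4), 2), 0.1)
)

# "Global" statistics per species, across all gears and regions
species_stats <- df %>%
  group_by(species) %>%
  summarise(global_mean = mean(rate),
            global_sd   = sd(rate),
            global_n    = n(),
            .groups = "drop")

# Statistics for each observed species/gear/region group
group_stats <- df %>%
  group_by(species, gear, region) %>%
  summarise(mean = mean(rate),
            sd   = sd(rate),
            n    = n(),
            .groups = "drop")

db <- group_stats %>%
  # Add rows for combinations that never occur; they get n = 0
  complete(species, gear, region, fill = list(n = 0L)) %>%
  left_join(species_stats, by = "species") %>%
  mutate(
    # Fall back to the species-wide values when a group has 0 or 1 rows
    use_global = n < 2,
    mean   = if_else(use_global, global_mean, mean),
    sd     = if_else(use_global, global_sd, sd),
    n_used = if_else(use_global, global_n, n),  # assumption: se uses this n
    se     = sd / sqrt(n_used),
    upper_rate = mean + 1.96 * se,
    lower_rate = mean - 1.96 * se
  ) %>%
  select(-starts_with("global_"))

print(db, n = Inf)
```

With this data, every bird row takes the species-wide mean, while its sd (and therefore se and the interval bounds) stays NA, because sd() needs at least two observations and bird has only one.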