A clean way to select a variable based on another variable's value in R
I'm working with a dataframe with the following structure:
ID origin value1 value2
1 A 100 50
1 A 200 100
2 B 10 2
2 B 150 30
Each row can have a different origin, and I need to run some calculations by ID, but the value variable I use depends on the origin variable: if origin == 'A' I should use value1, and if it's 'B' I should use value2. My code, without taking this last condition into account, looks like this:
library(dplyr)

df2 <- df %>%
  group_by(ID) %>%
  mutate(mean_value = mean(value1, na.rm = TRUE),
         sd_value = sd(value1, na.rm = TRUE),
         median_value = median(value1, na.rm = TRUE),
         # coefficient of variation, built from the two columns created above
         cv_value = sd_value / mean_value,
         p25_value = quantile(value1, 0.25, na.rm = TRUE),
         p75_value = quantile(value1, 0.75, na.rm = TRUE))
I know I could add an if_else statement to each line, but I think my code would lose some readability (my actual data has multiple origins, which makes this more cumbersome). So I was thinking of creating a custom function, maybe using map, or maybe something that groups by origin, but I can't find a good way to implement either option. Any ideas? My desired dataframe would look like this (I'll add only the first mutate column for simplicity):
ID origin value1 value2 mean_value
1 A 100 50 150
1 A 200 100 150
2 B 10 2 16
2 B 150 30 16
So the first mean value is (100 + 200) / 2 (from value1) and the second is (30 + 2) / 2 (from value2).
Thanks!
Comments (1)
We could create a temporary column first and then compute the mean afterwards. This way, ifelse/case_when needs to be used only once.
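The answer's own code did not survive on this page, so what follows is only a minimal sketch of the temporary-column idea, not the answerer's original snippet. It assumes dplyr is installed, rebuilds the example data from the question, and uses a hypothetical helper column named value that is dropped at the end. With more origins, one extra case_when branch per origin should be enough; rows whose origin matches no branch would get NA.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  ID     = c(1, 1, 2, 2),
  origin = c("A", "A", "B", "B"),
  value1 = c(100, 200, 10, 150),
  value2 = c(50, 100, 2, 30)
)

df2 <- df %>%
  # Pick the relevant value once per row ("value" is a hypothetical
  # temporary column name, not something from the original post)
  mutate(value = case_when(
    origin == "A" ~ value1,
    origin == "B" ~ value2
  )) %>%
  group_by(ID) %>%
  # All summaries now refer to the single temporary column
  mutate(
    mean_value   = mean(value, na.rm = TRUE),
    sd_value     = sd(value, na.rm = TRUE),
    median_value = median(value, na.rm = TRUE),
    cv_value     = sd_value / mean_value,
    p25_value    = quantile(value, 0.25, na.rm = TRUE),
    p75_value    = quantile(value, 0.75, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  select(-value)  # drop the temporary column

print(df2)
```

On the example data, mean_value should come out as 150 for ID 1 (from value1) and 16 for ID 2 (from value2), matching the desired output in the question. Keeping the origin logic in one place means the summary lines stay identical no matter how many origins are added.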