A clean way to choose which variable to use based on another variable's value in R

Posted 2025-01-17 19:02:05

I'm working with a dataframe with the following structure:

ID     origin    value1    value2
1        A         100       50
1        A         200       100
2        B         10        2
2        B         150       30

Rows can have different origins, and I need to compute some statistics by ID, but which value column I should use depends on the origin variable. If origin == 'A' I should use value1, and if it's 'B' I should use value2. My code, which does not yet take this condition into account, looks like this:

df2 <- df %>% 
  group_by(ID) %>% 
  mutate(mean_value = mean(value1, na.rm = TRUE),
         sd_value = sd(value1, na.rm = TRUE),
         median_value = median(value1, na.rm = TRUE),
         cv_value = sd_value / mean_value,
         p25_value = quantile(value1, 0.25, na.rm = TRUE),
         p75_value = quantile(value1, 0.75, na.rm = TRUE)) 

I know I could add an if_else statement to each line, but I think my code would lose some readability (in my actual data there are multiple origins, which makes this more cumbersome). So I was thinking of creating a custom function, maybe using map, or maybe something that groups by origin, but I haven't found a good way to implement either option. Any ideas? My desired dataframe would look like this (I'll add only the first mutate column for simplicity):

ID     origin    value1    value2 mean_value 
1        A         100       50      150
1        A         200       100     150
2        B         10        2       16
2        B         150       30      16

So the first mean value is (100 + 200) / 2 (from value1) and the second is (30 + 2) / 2 (from value2).

Thanks!

Comments (1)

≈。彩虹 2025-01-24 19:02:05

We could create a temporary column first and then compute the mean afterwards. That way, ifelse/case_when is needed only once:

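library(dplyr)

df %>%
   # pick the relevant value once per row, depending on origin
   mutate(valuenew = case_when(origin == 'A' ~ value1,
                               TRUE ~ value2)) %>%
   group_by(ID) %>%
   # .keep = "unused" drops valuenew, since it was used to build mean_value
   mutate(mean_value = mean(valuenew, na.rm = TRUE), .keep = "unused") %>%
   ungroup()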

-output

# A tibble: 4 × 5
     ID origin value1 value2 mean_value
  <int> <chr>   <int>  <int>      <dbl>
1     1 A         100     50        150
2     1 A         200    100        150
3     2 B          10      2         16
4     2 B         150     30         16
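
The same temporary column can probably feed every statistic from the question, not only the mean. The following is a minimal sketch under that assumption: it reuses the answer's valuenew column and the question's column names, passes names = FALSE to quantile() so the result is a plain number, and drops valuenew explicitly with select() at the end rather than relying on .keep.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L),
  origin = c("A", "A", "B", "B"),
  value1 = c(100L, 200L, 10L, 150L),
  value2 = c(50L, 100L, 2L, 30L),
  stringsAsFactors = FALSE
)

df2 <- df %>%
  # choose the relevant value once per row, based on origin
  mutate(valuenew = case_when(origin == "A" ~ value1,
                              TRUE ~ value2)) %>%
  group_by(ID) %>%
  # every statistic reads the same helper column
  mutate(mean_value = mean(valuenew, na.rm = TRUE),
         sd_value = sd(valuenew, na.rm = TRUE),
         median_value = median(valuenew, na.rm = TRUE),
         cv_value = sd_value / mean_value,
         p25_value = quantile(valuenew, 0.25, na.rm = TRUE, names = FALSE),
         p75_value = quantile(valuenew, 0.75, na.rm = TRUE, names = FALSE)) %>%
  ungroup() %>%
  # remove the helper column once the statistics are computed
  select(-valuenew)
```

With more origins, only the case_when() call grows; the statistics themselves stay untouched.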

data

df <- structure(list(ID = c(1L, 1L, 2L, 2L), origin = c("A", "A", "B", 
"B"), value1 = c(100L, 200L, 10L, 150L), value2 = c(50L, 100L, 
2L, 30L)), class = "data.frame", row.names = c(NA, -4L))
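
Since the question mentions many origins, one possible alternative to a long case_when() is a lookup from origin to column name. This is only a sketch, not part of the original answer: value_col, vals, and picked are hypothetical names, and the mapping of A to value1 and B to value2 is taken from the question. An origin missing from the lookup would yield NA here.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L),
  origin = c("A", "A", "B", "B"),
  value1 = c(100L, 200L, 10L, 150L),
  value2 = c(50L, 100L, 2L, 30L),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: which column holds the relevant value for each origin
value_col <- c(A = "value1", B = "value2")

# Matrix of the candidate value columns only, so the numeric type is preserved
vals <- as.matrix(df[unique(value_col)])

# For each row, pick the column named by its origin via (row, column) index pairs
picked <- vals[cbind(seq_len(nrow(df)),
                     match(value_col[df$origin], colnames(vals)))]

df2 <- df %>%
  mutate(valuenew = picked) %>%
  group_by(ID) %>%
  mutate(mean_value = mean(valuenew, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-valuenew)
```

Adding another origin then means adding one entry to value_col instead of another case_when() branch.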