Clean way to choose which variable to use based on another variable's value in R

Posted on 2025-01-17 19:02:05

I'm working with a dataframe with the following structure:

ID     origin    value1    value2
1        A         100       50
1        A         200       100
2        B         10        2
2        B         150       30

So each row can have different origins and I need to make some calculations by ID, but the value variable I'm using depends on the origin variable. So if origin == 'A' I should use value1 and if it's B I should use value2. My code without taking this last condition into account looks like this:

df2 <- df %>% 
  group_by(ID) %>% 
  mutate(mean_value = mean(value1, na.rm = TRUE),
         sd_value = sd(value1, na.rm = TRUE),
         median_value = median(value1, na.rm = TRUE),
         cv_value = sd_value / mean_value,
         p25_value = quantile(value1, 0.25, na.rm = TRUE),
         p75_value = quantile(value1, 0.75, na.rm = TRUE))

I know I could add an if_else statement to each line, but I think my code would lose some readability (in my actual data there are multiple origins, which makes this a bit more cumbersome). So I was thinking of creating a custom function, maybe using map, or maybe something using group_by on origin, but I haven't found a good way to implement these options. Any ideas? My desired dataframe would look like this (I'll add only the first mutate column for simplicity):

ID     origin    value1    value2 mean_value 
1        A         100       50      150
1        A         200       100     150
2        B         10        2       16
2        B         150       30      16

So the first mean value is (100 + 200) / 2 (from value1) and the second is (30 + 2) / 2 (from value2).

Thanks!

≈。彩虹 2025-01-24 19:02:05

We could create a temporary column first and then compute the mean afterwards. That way, ifelse/case_when is needed only once:

library(dplyr)
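df %>%
   # pick the relevant value column once, based on origin
   mutate(valuenew = case_when(origin == 'A' ~ value1, 
                               TRUE ~ value2)) %>% 
   group_by(ID) %>%
   # .keep = "unused" drops valuenew once it has been used
   mutate(mean_value = mean(valuenew, na.rm = TRUE), .keep = "unused") %>%
   ungroup()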

-output

# A tibble: 4 × 5
     ID origin value1 value2 mean_value
  <int> <chr>   <int>  <int>      <dbl>
1     1 A         100     50        150
2     1 A         200    100        150
3     2 B          10      2         16
4     2 B         150     30         16
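
The same temporary column can feed every statistic from the question, not just the mean. The following is a minimal sketch under a few assumptions: the sample data is rebuilt inline so the block runs on its own, `case_when` lists each origin explicitly (unmatched origins become `NA`), and the helper column `valuenew` is dropped with `select()` at the end rather than via `.keep`.

```r
library(dplyr)

# Sample data from the question, rebuilt inline so this runs on its own
df <- data.frame(
  ID     = c(1L, 1L, 2L, 2L),
  origin = c("A", "A", "B", "B"),
  value1 = c(100L, 200L, 10L, 150L),
  value2 = c(50L, 100L, 2L, 30L)
)

df2 <- df %>%
  # choose the value column once per row, based on origin
  mutate(valuenew = case_when(origin == "A" ~ value1,
                              origin == "B" ~ value2)) %>%
  group_by(ID) %>%
  # every statistic now reads from the single helper column
  mutate(mean_value   = mean(valuenew, na.rm = TRUE),
         sd_value     = sd(valuenew, na.rm = TRUE),
         median_value = median(valuenew, na.rm = TRUE),
         cv_value     = sd_value / mean_value,
         p25_value    = quantile(valuenew, 0.25, na.rm = TRUE),
         p75_value    = quantile(valuenew, 0.75, na.rm = TRUE)) %>%
  ungroup() %>%
  # drop the helper column once the statistics are computed
  select(-valuenew)
```

Note that `cv_value` refers to `sd_value` and `mean_value`, which works because `mutate()` evaluates its arguments in order.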

data

df <- structure(list(ID = c(1L, 1L, 2L, 2L), origin = c("A", "A", "B", 
"B"), value1 = c(100L, 200L, 10L, 150L), value2 = c(50L, 100L, 
2L, 30L)), class = "data.frame", row.names = c(NA, -4L))
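
With many origins, the `case_when` call itself can get long. One possible alternative, sketched here as an assumption rather than as part of the answer above, is a named lookup vector that maps each origin to the column it should use, combined with matrix indexing to pick one cell per row. The names `origin_to_col`, `value_matrix`, and `col_index` are made up for illustration.

```r
library(dplyr)

# Sample data from the question, rebuilt inline so this runs on its own
df <- data.frame(
  ID     = c(1L, 1L, 2L, 2L),
  origin = c("A", "A", "B", "B"),
  value1 = c(100L, 200L, 10L, 150L),
  value2 = c(50L, 100L, 2L, 30L)
)

# Hypothetical lookup: which value column each origin should use.
# Extend this vector when more origins appear.
origin_to_col <- c(A = "value1", B = "value2")

# Matrix holding only the candidate value columns
value_cols   <- unique(unname(origin_to_col))
value_matrix <- as.matrix(df[value_cols])

# For each row, find the position of the column named by that row's origin,
# then pull the (row, column) cell; unknown origins yield NA
col_index   <- match(origin_to_col[as.character(df$origin)], value_cols)
df$valuenew <- value_matrix[cbind(seq_len(nrow(df)), col_index)]

result <- df %>%
  group_by(ID) %>%
  mutate(mean_value = mean(valuenew, na.rm = TRUE)) %>%
  ungroup()
```

This keeps the origin-to-column mapping in one place, at the cost of assuming that all candidate columns share a type so they can sit in a single matrix.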
