r conditional-statements multiple-columns

Is there an R function for conditional values across different columns?

Posted 2025-01-21 16:43:23


Suppose you have a dataframe that looks something like this:

library(tibble)

df <- tibble(PatientID = c(1,2,3,4,5),
             Treat1 = c("R", "O", "C", "O", "C"),
             Treat2 = c("O", "R", "R", NA, "O"),
             Treat3 = c("C", NA, "O", NA, "R"),
             Treat4 = c("H", NA, "H", NA, "H"),
             Treat5 = c("H", NA, NA, NA, "H"))

Treat1:Treat5 are the different treatments that a patient has had. I'm looking to create a new variable "Chemo", coded 1 for yes and 0 for no, based on whether a patient has had treatment "C".

I've been using if_else(), but I have 10 different treatment variables in my actual dataset and would like to create such a column for each treatment, so I wonder whether I can do this without writing such long if statements. Is there an easier way?


瀟灑尐姊 2025-01-28 16:43:23

Use if_any() over the columns selected by starts_with("Treat") and test each one with %in%. if_any() returns TRUE for a row when any of the selected columns is "C" in that row, and FALSE otherwise; %in% treats NA as a non-match, so missing treatments never produce NA. The logical result is then converted to 0/1 with the unary + (or as.integer()).

library(dplyr)
df <- df %>% 
   mutate(Chemo = +(if_any(starts_with("Treat"), ~ .x %in% "C")))

Output

df
# A tibble: 5 × 7
  PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
      <dbl> <chr>  <chr>  <chr>  <chr>  <chr>  <int>
1         1 R      O      C      H      H          1
2         2 O      R      <NA>   <NA>   <NA>       0
3         3 C      R      O      H      <NA>       1
4         4 O      <NA>   <NA>   <NA>   <NA>       0
5         5 C      O      R      H      H          1

Or using base R with rowSums

df$Chemo <- +(rowSums(df[startsWith(names(df), "Treat")] == "C", 
      na.rm = TRUE) > 0)
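
The question also asks for one such column per treatment. The rowSums approach above can be repeated for each code in a loop; the following is only a sketch under a few assumptions: the column names had_C, had_R, had_O and had_H are hypothetical (the question names only "Chemo" for "C"), the list of codes is read off the sample data, and a plain data.frame is used so that no packages are needed.

```r
# Sample data from the question, as a base data.frame
df <- data.frame(PatientID = c(1, 2, 3, 4, 5),
                 Treat1 = c("R", "O", "C", "O", "C"),
                 Treat2 = c("O", "R", "R", NA, "O"),
                 Treat3 = c("C", NA, "O", NA, "R"),
                 Treat4 = c("H", NA, "H", NA, "H"),
                 Treat5 = c("H", NA, NA, NA, "H"),
                 stringsAsFactors = FALSE)

# Fix the treatment columns by name before adding any new columns
treat_cols <- names(df)[startsWith(names(df), "Treat")]

# Treatment codes to flag (assumed from the sample data)
codes <- c("C", "R", "O", "H")

for (code in codes) {
  # TRUE where a row has the code in at least one treatment column
  has_code <- rowSums(df[treat_cols] == code, na.rm = TRUE) > 0
  # Hypothetical column name: had_<code>, stored as 0/1 integer
  df[[paste0("had_", code)]] <- as.integer(has_code)
}

print(df)
```

Here had_C plays the role of the Chemo column; renaming the new columns to something more descriptive is left to whoever knows what each code stands for.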


ゝ杯具 2025-01-28 16:43:23

Another option uses str_detect() and any() to determine whether "C" occurs in any of the Treat columns for each row; na.rm = TRUE ignores the NA that str_detect() returns for missing values. The + converts the logical to an integer.

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(Chemo = +any(str_detect(c_across(starts_with("Treat")), "C"), na.rm = TRUE)) %>%
  ungroup()

Output

  PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
      <dbl> <chr>  <chr>  <chr>  <chr>  <chr>  <int>
1         1 R      O      C      H      H          1
2         2 O      R      NA     NA     NA         0
3         3 C      R      O      H      NA         1
4         4 O      NA     NA     NA     NA         0
5         5 C      O      R      H      H          1
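
One caveat may be worth checking on real data: str_detect() matches a pattern rather than a whole value, so a code that merely contains "C" would also count. A small sketch, where the value "CR" is a hypothetical code that does not appear in the question's data:

```r
library(stringr)

# "CR" is a made-up treatment code used only for illustration
x <- c("C", "CR", NA)

pattern_match <- str_detect(x, "C")    # TRUE for any string containing "C"; NA stays NA
exact_match   <- x %in% "C"            # TRUE only for the exact value "C"; NA becomes FALSE
anchored      <- str_detect(x, "^C$")  # anchoring the regex makes the match exact

print(pattern_match)
print(exact_match)
print(anchored)
```

If all treatment codes are single letters, as in the sample data, the two approaches agree.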


凉栀 2025-01-28 16:43:23

An alternative dplyr way:

library(dplyr)

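df %>% 
  # Flag "C" in each Treat column as 1/0 in temporary new_* columns
  mutate(across(starts_with("Treat"), ~ case_when(. == "C" ~ 1,
                                                  TRUE ~ 0), .names = 'new_{col}')) %>%
  # A row could contain "C" more than once, so compare the row sum with 0
  mutate(Chemo = as.numeric(rowSums(select(., starts_with("new"))) > 0)) %>% 
  select(-starts_with("new"))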

  PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
      <dbl> <chr>  <chr>  <chr>  <chr>  <chr>  <dbl>
1         1 R      O      C      H      H          1
2         2 O      R      NA     NA     NA         0
3         3 C      R      O      H      NA         1
4         4 O      NA     NA     NA     NA         0
5         5 C      O      R      H      H          1
