Is there an R function for conditional values across different columns?

Posted on 2025-01-21 16:43:23

Suppose you have a dataframe that looks something like this:

df <- tibble(PatientID = c(1,2,3,4,5),
         Treat1 = c("R", "O", "C", "O", "C"),
         Treat2 = c("O", "R", "R", NA, "O"),
         Treat3 = c("C", NA, "O", NA, "R"),
         Treat4 = c("H", NA, "H", NA, "H"),
         Treat5 = c("H", NA, NA, NA, "H"))

Treat1:Treat5 are the different treatments that a patient has had. I'm looking to create a new variable "Chemo" with 1 for yes and 0 for no, based on whether a patient has had treatment "C".

I've been using if_else(), but as I have 10 different treatment variables in my actual dataset, and I would like to create such a column per treatment, I wonder if I can do it without writing such long if statements. Is there an easier way to do this?


Comments (3)

瀟灑尐姊 2025-01-28 16:43:23

Use if_any to check the columns selected by starts_with("Treat"). The lambda .x %in% "C" creates a logical vector for each column (unlike ==, %in% returns FALSE rather than NA for missing values), and if_any returns TRUE for a row when any of the selected columns holds "C" in that row. The logical result is then converted to 0/1 with unary + (or as.integer).

library(dplyr)
df <- df %>% 
   mutate(Chemo = +(if_any(starts_with("Treat"), ~ .x %in% "C")))

-output

df
# A tibble: 5 × 7
  PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
      <dbl> <chr>  <chr>  <chr>  <chr>  <chr>  <int>
1         1 R      O      C      H      H          1
2         2 O      R      <NA>   <NA>   <NA>       0
3         3 C      R      O      H      <NA>       1
4         4 O      <NA>   <NA>   <NA>   <NA>       0
5         5 C      O      R      H      H          1

Or using base R with rowSums

df$Chemo <- +(rowSums(df[startsWith(names(df), "Treat")] == "C", 
      na.rm = TRUE) > 0)
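
The question also asks for one such column per treatment. A minimal base R sketch of that, repeating the rowSums idea above for each code, could look like the following. The lookup vector `codes` and every column name except Chemo are assumptions made for illustration; the original post only defines "C". The data is rebuilt as a plain data.frame so the sketch needs no packages.

```r
# Example data from the question, as a base data.frame (no packages needed).
df <- data.frame(
  PatientID = c(1, 2, 3, 4, 5),
  Treat1 = c("R", "O", "C", "O", "C"),
  Treat2 = c("O", "R", "R", NA, "O"),
  Treat3 = c("C", NA, "O", NA, "R"),
  Treat4 = c("H", NA, "H", NA, "H"),
  Treat5 = c("H", NA, NA, NA, "H"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: treatment code -> name of the new indicator column.
# Only C = Chemo comes from the question; the other names are invented.
codes <- c(C = "Chemo", R = "Radio", O = "Other", H = "Hormone")

# Select the treatment columns by name, so the selection stays valid
# while new columns are appended inside the loop.
treat_cols <- names(df)[startsWith(names(df), "Treat")]

for (code in names(codes)) {
  # TRUE for rows where at least one treatment cell equals the code;
  # NA cells are ignored because of na.rm = TRUE.
  hits <- rowSums(df[treat_cols] == code, na.rm = TRUE) > 0
  df[[codes[[code]]]] <- as.integer(hits)
}

print(df)
```

With 10 treatment columns the loop is unchanged, because the columns are picked up by their name prefix rather than listed by hand.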
ゝ杯具 2025-01-28 16:43:23

Another option using str_detect and any to determine if C occurs in any of the Treat columns for each row. The + converts the logical to an integer.

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(Chemo = +any(str_detect(c_across(starts_with("Treat")), "C"), na.rm = TRUE)) %>%
  ungroup()

Output

  PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
      <dbl> <chr>  <chr>  <chr>  <chr>  <chr>  <int>
1         1 R      O      C      H      H          1
2         2 O      R      NA     NA     NA         0
3         3 C      R      O      H      NA         1
4         4 O      NA     NA     NA     NA         0
5         5 C      O      R      H      H          1
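
One caveat with pattern matching: a pattern such as "C" matches any string that contains a C, not only the exact code. That makes no difference for the single-letter codes in the question, but it would for longer codes. The sketch below illustrates this with base R's grepl standing in for str_detect (the two differ on NA: grepl returns FALSE, str_detect returns NA); the code "CH" is a made-up example, not part of the original data.

```r
# "CH" is a hypothetical multi-letter treatment code, used only for illustration.
treatments <- c("C", "CH", "R", NA)

# Substring match: "CH" also counts as a hit.
contains_c <- grepl("C", treatments)

# Anchored pattern: only the exact string "C" matches.
anchored_c <- grepl("^C$", treatments)

# Exact comparison without regular expressions; NA gives FALSE.
exact_c <- treatments %in% "C"

print(contains_c)
print(anchored_c)
print(exact_c)
```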
凉栀 2025-01-28 16:43:23

An alternative dplyr way:

library(dplyr)

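df %>% 
  mutate(across(starts_with("Treat"), ~case_when(. == "C" ~ 1,
                                                 TRUE ~ 0), .names = 'new_{col}')) %>%
  # rowSums counts the "C" entries per row; "> 0" keeps Chemo a 0/1 flag
  # even if a patient received "C" more than once
  mutate(Chemo = as.numeric(rowSums(select(., starts_with("new"))) > 0)) %>% 
  select(-starts_with("new"))

Output
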
  PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
      <dbl> <chr>  <chr>  <chr>  <chr>  <chr>  <dbl>
1         1 R      O      C      H      H          1
2         2 O      R      NA     NA     NA         0
3         3 C      R      O      H      NA         1
4         4 O      NA     NA     NA     NA         0
5         5 C      O      R      H      H          1