Index match on 2 variables in R

Posted 2025-02-07 02:32:39

I'm working on a dataset in which a subset of rows needs to be updated. Record IDs and patient IDs are used as keys, and hospitalized is a binary variable (yes/no). I need to update dat1 with the hospitalized data in dat2. I tried if_else, but that didn't work because the two datasets are of different sizes. Any help would be greatly appreciated.

pacman::p_load(tidyverse)

set.seed(123)

record_id <- rep(1:10, each = 2)
patient_id <- rep(1:2, 10)
hospitalized <- sample(0:1, 20, replace = TRUE)
dat1 <- as_tibble(cbind(record_id, patient_id, hospitalized))

record_id2 <- 1:10
patient_id2 <- sample(1:2, 10, replace = TRUE)
hospitalized2 <- sample(0:1, 10, replace = TRUE)
dat2 <- as_tibble(cbind(record_id2, patient_id2, hospitalized2))

dat1$hospitalized <- if_else(match(dat1$record_id, dat2$record_id2) &
                               match(dat1$patient_id, dat2$patient_id2), dat2$hospitalized2, dat1$hospitalized)
#> Error in `if_else()`:
#> ! `true` must be length 20 (length of `condition`) or one, not 10.

Created on 2022-06-13 by the reprex package (v2.0.1)

Session info

sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19042)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.8     purrr_0.3.4    
#> [5] readr_2.1.2     tidyr_1.2.0     tibble_3.1.6    ggplot2_3.3.5  
#> [9] tidyverse_1.3.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.2 xfun_0.30        haven_2.4.3      colorspace_2.0-3
#>  [5] vctrs_0.3.8      generics_0.1.2   htmltools_0.5.2  yaml_2.3.5      
#>  [9] utf8_1.2.2       rlang_1.0.2      pillar_1.7.0     glue_1.6.2      
#> [13] withr_2.5.0      DBI_1.1.2        dbplyr_2.1.1     readxl_1.3.1    
#> [17] modelr_0.1.8     lifecycle_1.0.1  cellranger_1.1.0 munsell_0.5.0   
#> [21] gtable_0.3.0     rvest_1.0.2      evaluate_0.15    knitr_1.38      
#> [25] tzdb_0.2.0       fastmap_1.1.0    fansi_1.0.2      highr_0.9       
#> [29] Rcpp_1.0.8.3     broom_0.7.12     backports_1.4.1  scales_1.1.1    
#> [33] jsonlite_1.8.0   fs_1.5.2         hms_1.1.1        digest_0.6.29   
#> [37] stringi_1.7.6    grid_4.1.3       cli_3.2.0        tools_4.1.3     
#> [41] magrittr_2.0.2   pacman_0.5.1     crayon_1.5.1     pkgconfig_2.0.3 
#> [45] ellipsis_0.3.2   xml2_1.3.3       reprex_2.0.1     lubridate_1.8.0 
#> [49] assertthat_0.2.1 rmarkdown_2.13   httr_1.4.2       rstudioapi_0.13 
#> [53] R6_2.5.1         compiler_4.1.3

能怎样 2025-02-14 02:32:39

Instead of using match, do a left_join on both the 'record_id' and 'patient_id' columns, then use coalesce to update 'hospitalized' from 'hospitalized2' wherever a matching row exists:

library(dplyr)

# Join on both keys, then prefer the dat2 value wherever a match was found
dat1 <- dat1 %>%
  left_join(dat2, by = c("record_id" = "record_id2",
                         "patient_id" = "patient_id2")) %>%
  mutate(hospitalized = coalesce(hospitalized2, hospitalized),
         .keep = "unused")
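To make the behavior of the join-then-coalesce pattern easier to check by hand, here is a minimal sketch on hypothetical toy data. The tables `x` and `y` and their values are made up for illustration and are not part of the question's data; only the column names follow dat1 and dat2.

```r
library(dplyr)

# Hypothetical toy data: three keyed rows, two of which have an update in y.
x <- tibble(record_id    = c(1L, 1L, 2L),
            patient_id   = c(1L, 2L, 1L),
            hospitalized = c(0L, 0L, 1L))
y <- tibble(record_id2    = c(1L, 2L),
            patient_id2   = c(2L, 1L),
            hospitalized2 = c(1L, 0L))

updated <- x %>%
  # Rows of x without a partner in y get NA in hospitalized2.
  left_join(y, by = c("record_id" = "record_id2",
                      "patient_id" = "patient_id2")) %>%
  # coalesce() takes hospitalized2 when present, else the old value.
  mutate(hospitalized = coalesce(hospitalized2, hospitalized),
         .keep = "unused")

updated$hospitalized
#> [1] 0 1 0
```

The row with keys (1, 1) has no match in `y`, so it keeps its original value, while the other two rows take the values from `y`.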

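if_else and case_when require the replacement values to have the same length as the condition (or length 1), and base ifelse would silently recycle a shorter vector, so none of them fits here: dat2$hospitalized2 has only 10 values, while dat1 has 20 rows. In addition, applying & to the match output is not a valid test, because match returns positions and & treats every non-zero value as TRUE. It also checks each id column separately instead of requiring both ids to match on the same row of dat2.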

> match(dat1$record_id, dat2$record_id2) 
 [1]  1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10 10
> match(dat1$patient_id, dat2$patient_id2)
 [1] 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
> match(dat1$record_id, dat2$record_id2) & match(dat1$patient_id, dat2$patient_id2)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
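If a match-based approach is still preferred, one possible sketch is to combine both ids into a single key per row, so that match compares row-wise pairs rather than each column on its own. This is an alternative to the join above, not the answer's method; the toy tables `x` and `y` and the "_" separator are assumptions chosen for illustration.

```r
# Hypothetical toy data with the same column names as dat1/dat2.
x <- data.frame(record_id    = c(1L, 1L, 2L),
                patient_id   = c(1L, 2L, 1L),
                hospitalized = c(0L, 0L, 1L))
y <- data.frame(record_id2    = c(1L, 2L),
                patient_id2   = c(2L, 1L),
                hospitalized2 = c(1L, 0L))

# One composite key per row, so both ids must agree on the same row.
key_x <- paste(x$record_id, x$patient_id, sep = "_")
key_y <- paste(y$record_id2, y$patient_id2, sep = "_")

# Position in y for each row of x; NA where x has no partner in y.
idx   <- match(key_x, key_y)
found <- !is.na(idx)

# Overwrite only the matched rows.
x$hospitalized[found] <- y$hospitalized2[idx[found]]

x$hospitalized
#> [1] 0 1 0
```

Unlike coalesce, this sketch would also copy an NA from `hospitalized2` into a matched row, so it assumes the update table has no missing values.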
