Index match on 2 variables in R
I'm working on a dataset that needs a subset to be updated. Record IDs and patient IDs are used as keys, and hospitalized is a binary variable (yes/no). I need to update dat1 with the hospitalized data in dat2. I tried if_else, but that didn't work because the two datasets are of different sizes. Any help would be greatly appreciated.
pacman::p_load(tidyverse)
set.seed(123)
record_id <- rep(1:10, each = 2)
patient_id <- rep(1:2, 10)
hospitalized <- sample(0:1, 20, replace = TRUE)
dat1 <- as_tibble(cbind(record_id, patient_id, hospitalized))
record_id2 <- 1:10
patient_id2 <- sample(1:2, 10, replace = TRUE)
hospitalized2 <- sample(0:1, 10, replace = TRUE)
dat2 <- as_tibble(cbind(record_id2, patient_id2, hospitalized2))
dat1$hospitalized <- if_else(match(dat1$record_id, dat2$record_id2) &
match(dat1$patient_id, dat2$patient_id2), dat2$hospitalized2, dat1$hospitalized)
#> Error in `if_else()`:
#> ! `true` must be length 20 (length of `condition`) or one, not 10.
Created on 2022-06-13 by the reprex package (v2.0.1)
Session info
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19042)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4
#> [5] readr_2.1.2 tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5
#> [9] tidyverse_1.3.1
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.1.2 xfun_0.30 haven_2.4.3 colorspace_2.0-3
#> [5] vctrs_0.3.8 generics_0.1.2 htmltools_0.5.2 yaml_2.3.5
#> [9] utf8_1.2.2 rlang_1.0.2 pillar_1.7.0 glue_1.6.2
#> [13] withr_2.5.0 DBI_1.1.2 dbplyr_2.1.1 readxl_1.3.1
#> [17] modelr_0.1.8 lifecycle_1.0.1 cellranger_1.1.0 munsell_0.5.0
#> [21] gtable_0.3.0 rvest_1.0.2 evaluate_0.15 knitr_1.38
#> [25] tzdb_0.2.0 fastmap_1.1.0 fansi_1.0.2 highr_0.9
#> [29] Rcpp_1.0.8.3 broom_0.7.12 backports_1.4.1 scales_1.1.1
#> [33] jsonlite_1.8.0 fs_1.5.2 hms_1.1.1 digest_0.6.29
#> [37] stringi_1.7.6 grid_4.1.3 cli_3.2.0 tools_4.1.3
#> [41] magrittr_2.0.2 pacman_0.5.1 crayon_1.5.1 pkgconfig_2.0.3
#> [45] ellipsis_0.3.2 xml2_1.3.3 reprex_2.0.1 lubridate_1.8.0
#> [49] assertthat_0.2.1 rmarkdown_2.13 httr_1.4.2 rstudioapi_0.13
#> [53] R6_2.5.1 compiler_4.1.3
Comments (1)
Instead of using match, do a left_join by the 'record_id' columns and the 'patient_id' columns, and then coalesce to update 'hospitalized' from 'hospitalized2'.
ifelse/if_else/case_when require all arguments to be the same length, but dat2$hospitalized2 is shorter than dat1$hospitalized. In addition, applying & to the match output is not a correct option: match returns positions (or NA), and & treats every non-zero value as TRUE, so it does not test whether both keys match in the same row.
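The join-then-coalesce approach described above could look roughly like the sketch below. It rebuilds the question's data with tibble() rather than cbind(), and the result name updated is an illustrative choice, not something from the original post. It assumes dplyr is installed.

```r
library(dplyr)

# Same example data as in the question, built column by column
set.seed(123)
dat1 <- tibble(
  record_id    = rep(1:10, each = 2),
  patient_id   = rep(1:2, 10),
  hospitalized = sample(0:1, 20, replace = TRUE)
)
dat2 <- tibble(
  record_id2    = 1:10,
  patient_id2   = sample(1:2, 10, replace = TRUE),
  hospitalized2 = sample(0:1, 10, replace = TRUE)
)

# `updated` is a hypothetical name used only for this sketch
updated <- dat1 %>%
  # Match on both keys at once; rows of dat1 without a partner get NA
  left_join(dat2, by = c("record_id" = "record_id2",
                         "patient_id" = "patient_id2")) %>%
  # Take the dat2 value where one exists, otherwise keep the dat1 value
  mutate(hospitalized = coalesce(hospitalized2, hospitalized)) %>%
  # Drop the helper column so the result has dat1's original columns
  select(-hospitalized2)
```

Because each record_id2 appears only once in dat2, every row of dat1 matches at most one row of dat2, so the join keeps dat1 at 20 rows. If dat2 could contain duplicate key pairs, the join would multiply rows and dat2 would need deduplicating first.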