Combine row values by multiple IDs in R

Posted on 2025-02-14 01:22:47

I have a data frame df, a sample of which is shown below. I need to combine the rows by study_id, lab_study_dt, and lab_study_time, so that the NA and non-NA values across the lab and detection-limit columns end up on a single row for each combination of those three key variables.

I've tried group_by and summarise_all to do this but didn't get the outcome I'm looking for.

df %>%
    group_by(study_id,lab_study_dt,lab_study_time) %>%
    summarise_all(funs(toString(na.omit(.))))

study_id  lab_study_dt  lab_study_time  lab_polys  lab_lymphs  lab_mono  lab_eos  lab_basos  lab_bands  lab_wbc_count  lab_rbc_count  protein_limit_of_detection  lab_protein  gluc_limit_of_detection  lab_glucose
Jane      8/13/2011     0:12            NA         NA          NA        NA       NA         NA         1              NA             NA                          NA           NA                       NA
Jane      8/13/2011     0:12            NA         NA          NA        NA       NA         NA         NA             NA             NA                          NA           NA                       NA
Jane      3/4/2013      15:27           NA         60          NA        NA       NA         NA         NA             NA             NA                          NA           NA                       NA
Jane      3/4/2013      15:27           NA         NA          NA        NA       NA         NA         NA             10000          NA                          NA           NA                       NA
Jane      3/4/2013      15:27           8          NA          NA        NA       NA         NA         NA             NA             NA                          NA           NA                       NA
Jane      3/4/2013      15:27           NA         NA          NA        NA       NA         1          NA             NA             NA                          NA           NA                       NA
Jane      3/4/2013      15:27           NA         NA          NA        NA       NA         NA         149            NA             NA                          NA           NA                       NA
Jane      3/4/2013      15:27           NA         NA          31        NA       NA         NA         NA             NA             1                           56           NA                       NA
George    4/20/2021     21:18           NA         60          NA        NA       NA         NA         NA             NA             NA                          NA           NA                       NA
George    4/20/2021     21:18           NA         NA          NA        NA       NA         NA         NA             10000          NA                          NA           NA                       NA
George    4/23/2021     15:27           8          NA          NA        NA       NA         NA         NA             NA             NA                          NA           2                        3
George    4/23/2021     12:27           NA         65          NA        NA       NA         NA         NA             NA             1                           >10          NA                       NA
George    4/23/2021     12:27           NA         NA          NA        NA       NA         1          149            NA             NA                          NA           NA                       NA
George    4/23/2021     12:27           NA         NA          31        NA       NA         NA         NA             56             NA                          NA           NA                       NA
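
For anyone who wants to experiment, the sample can be rebuilt as a data frame with base R. This is only a sketch: the values are the ones shown in the sample, but the column types are whatever read.table() infers, which is an assumption about the real data.

```r
# Sample rows from the question, re-entered as whitespace-separated text.
# Column types are inferred by read.table(); that is an assumption here.
df <- read.table(
  header = TRUE, na.strings = "NA", stringsAsFactors = FALSE,
  text = "
study_id lab_study_dt lab_study_time lab_polys lab_lymphs lab_mono lab_eos lab_basos lab_bands lab_wbc_count lab_rbc_count protein_limit_of_detection lab_protein gluc_limit_of_detection lab_glucose
Jane 8/13/2011 0:12 NA NA NA NA NA NA 1 NA NA NA NA NA
Jane 8/13/2011 0:12 NA NA NA NA NA NA NA NA NA NA NA NA
Jane 3/4/2013 15:27 NA 60 NA NA NA NA NA NA NA NA NA NA
Jane 3/4/2013 15:27 NA NA NA NA NA NA NA 10000 NA NA NA NA
Jane 3/4/2013 15:27 8 NA NA NA NA NA NA NA NA NA NA NA
Jane 3/4/2013 15:27 NA NA NA NA NA 1 NA NA NA NA NA NA
Jane 3/4/2013 15:27 NA NA NA NA NA NA 149 NA NA NA NA NA
Jane 3/4/2013 15:27 NA NA 31 NA NA NA NA NA 1 56 NA NA
George 4/20/2021 21:18 NA 60 NA NA NA NA NA NA NA NA NA NA
George 4/20/2021 21:18 NA NA NA NA NA NA NA 10000 NA NA NA NA
George 4/23/2021 15:27 8 NA NA NA NA NA NA NA NA NA 2 3
George 4/23/2021 12:27 NA 65 NA NA NA NA NA NA 1 >10 NA NA
George 4/23/2021 12:27 NA NA NA NA NA 1 149 NA NA NA NA NA
George 4/23/2021 12:27 NA NA 31 NA NA NA NA 56 NA NA NA NA
"
)

# Inspect the inferred column types
str(df)
```

Because of the ">10" entry, lab_protein is read as character, while the all-NA columns lab_eos and lab_basos are read as logical.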

The final data frame should have one row per unique combination of study_id, study date, and study time, with all associated values on that row. So for example, the entry Jane - 3/4/2013 - 15:27 would look like this:

study_id  lab_study_dt  lab_study_time  lab_polys  lab_lymphs  lab_mono  lab_eos  lab_basos  lab_bands  lab_wbc_count  lab_rbc_count  protein_limit_of_detection  lab_protein  gluc_limit_of_detection  lab_glucose
Jane      3/4/2013      15:27           8          60          31        NA       NA         1          149            10000          1                           56           NA                       NA

Thank you in advance

Comments (1)

GRAY°灰色天空 2025-02-21 01:22:47

We could group by id/dt/time, then use tidyr::fill on all columns (i.e. everything()) to take any non-NA values and copy them first down through any NAs and then up through any NAs ("downup", my arbitrary choice). Finally, we keep just the first row (slice(1)) within each group and then remove the grouping.
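
To make the fill step concrete, here is a minimal sketch on a single group. The names one_group and filled are made up for illustration, and the values are a few of Jane's 3/4/2013 entries from the question.

```r
library(dplyr)
library(tidyr)

# One id/date/time group with its values scattered over several rows
# (one_group is a hypothetical name, not part of the original data)
one_group <- tibble(
  study_id   = "Jane",
  lab_polys  = c(NA, 8L, NA),
  lab_lymphs = c(60L, NA, NA)
)

# "downup": fill NAs downwards first, then fill what is left upwards
filled <- one_group %>%
  fill(lab_polys, lab_lymphs, .direction = "downup")

filled
```

After the fill, every row in this group carries lab_polys = 8 and lab_lymphs = 60, so keeping only the first row loses nothing. If a column had two different non-NA values within one group, the first row would end up with the first of them.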

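library(tidyverse)

df %>%
  # one group per id/date/time combination
  group_by(study_id, lab_study_dt, lab_study_time) %>%
  # within each group, copy non-NA values down, then up, into the NA cells
  fill(everything(), .direction = "downup") %>%
  # row 1 of each group now holds the first non-NA value of every column
  slice(1) %>%
  ungroup()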

Is this the output you'd expect?

# A tibble: 5 × 15
  study_id lab_study_dt lab_study_time lab_polys lab_lymphs lab_mono lab_eos lab_basos lab_bands lab_wbc_count lab_rbc_count protein_limit_of_detection lab_protein gluc_limit_of_detecti… lab_glucose
  <chr>    <chr>        <chr>              <int>      <int>    <int> <lgl>   <lgl>         <int>         <int>         <int>                      <int> <chr>                        <int>       <int>
1 George   4/20/2021    21:18                 NA         60       NA NA      NA               NA            NA         10000                         NA NA                              NA          NA
2 George   4/23/2021    12:27                 NA         65       31 NA      NA                1           149            56                          1 >10                             NA          NA
3 George   4/23/2021    15:27                  8         NA       NA NA      NA               NA            NA            NA                         NA NA                               2           3
4 Jane     3/4/2013     15:27                  8         60       31 NA      NA                1           149         10000                          1 56                              NA          NA
5 Jane     8/13/2011    0:12                  NA         NA       NA NA      NA               NA             1            NA                         NA NA                              NA          NA
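
As a possible alternative (not the approach used above), the rows could be collapsed directly with summarise() and across(), taking the first non-NA value per column and group. This is a sketch under assumptions: df_small is a made-up subset of the question's rows and columns, and first_non_na is a hypothetical helper. Unlike the toString(na.omit(.)) attempt in the question, it keeps the original column types and returns NA rather than an empty string when a group has no value. It also avoids funs(), which was deprecated in dplyr 0.8.0.

```r
library(dplyr)

# Small stand-in for df: a subset of the question's rows and columns
df_small <- tibble(
  study_id       = c("Jane", "Jane", "Jane", "George", "George"),
  lab_study_dt   = c("3/4/2013", "3/4/2013", "3/4/2013", "4/20/2021", "4/20/2021"),
  lab_study_time = c("15:27", "15:27", "15:27", "21:18", "21:18"),
  lab_polys      = c(NA, 8L, NA, NA, NA),
  lab_lymphs     = c(60L, NA, NA, 60L, NA),
  lab_rbc_count  = c(NA, NA, 10000L, NA, 10000L)
)

# Hypothetical helper: first non-NA element, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

combined <- df_small %>%
  group_by(study_id, lab_study_dt, lab_study_time) %>%
  # across(everything()) skips the grouping columns automatically
  summarise(across(everything(), first_non_na), .groups = "drop")

combined
```

This gives one row per id/date/time combination. As with fill() plus slice(1), only the first non-NA value survives if a group holds several different values in the same column.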