在 R 中创建规则来计算每个患者每天的咨询次数

发布于 2025-01-14 14:07:09 字数 10105 浏览 3 评论 0原文

我已经使用实际数据集中的关键场景创建了以下数据集:

df <- data.frame (organisation_id  = c("1","1","2","2","2","2","2","2","3","3","3","3","3","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4"),
                  patient_id = c("1230","1230","1222","1222","1244","1244","987","987","2223","2223","2247","2247","2247","1234","1234","1234","1234","1234","1234","1234","1234","1239","1239","1239","3322","3322","3322","5434","5434","4488","4488","4488","1250","1250"),
                  date = c("08-02-2018","08-02-2018","12-01-2018","12-01-2018","12-01-2018","22-02-2018","12-01-2018","22-02-2018","01-03-2019","01-03-2019","01-03-2019","01-03-2019","01-03-2019","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","13-07-2020","13-07-2020","13-07-2020","16-06-2021","16-06-2021","16-06-2021","14-05-2019","14-05-2019","17-03-2020","17-03-2020","17-03-2020","03-02-2019","03-02-2019"),
                  consultation_mode = c("Telephone","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Home visit","Home visit","Face-to-Face","Face-to-Face","Face-to-Face","Telephone","Telephone","Telephone","Telephone","Face-to-Face","Face-to-Face","Face-to-Face","Face-to-Face","Home visit","Home visit","Home visit","Face-to-Face","Telephone","Face-to-Face","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face"),
                  professional_id = c("24","11","123","110","123","110","123","333","444","444","444","444","444","1133","12","25","26","12","34","35","38","44","44","5556","443","443","445","29","29","555","5556","12","1133","113663"),
                  professional_role = c("Doctor","Support","Doctor","Support","Doctor","Support","Doctor","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Nurse","Nurse","Support","Doctor","Doctor","Nurse","Nurse","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Support"),
                  professional_name = c("Dr John Taylor","Mary Wright","Dr Patricia Jones","James Davies","Dr Patricia Jones","James Davies","Dr Patricia Jones","Peter Hall","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Mary Wright","Anthony Patel","Jennifer Walker","Jennifer Walker","Anthony Patel","Dr Carol Bell","Dr Carol Bell","Deborah Dixon","Kevin R Collins","Kevin Collins","Dr Robert Brown","Dr Mary Wilson","Dr Mary Wilson","Dr John Snow","Dr John Taylor","Dr John Taylor","Dr James Smith","Dr Robert Brown","Anthony Patel","Mary Wright","Mary TEST Wright")
)

df$organisation_id <- as.factor(df$organisation_id)
df$patient_id <- as.factor(df$patient_id)
df$date <- as.Date(df$date, "%d-%m-%Y")
df$consultation_mode <- as.factor(df$consultation_mode)
df$professional_id <- as.factor(df$professional_id)
df$professional_role <- as.factor(df$professional_role)

我想创建两个额外的列(include?Nr_consultations_per_Pt_day),如下所示

:每个 organization_idpatent_iddateconsultation_mode 检查:

1- 如果有只有 1 个行,对于该professional_roleinclude? = 1 且 Nr_consultations_per_Pt_day = 1。

2- 如果多于 1 行,则对于每个不同的 professional_idprofessional_nameinclude? = 1,其中 <代码>consultation_role =“医生”或“护士”。

注意:如果“医生”或“护士”有 2 个以上具有不同 professional_idprofessional_name 相同的条目,则第一行将显示 include? = 1 以及以下行 include? = 0。例如 Jennifer Walker 的 25 / 26 个 ID。同样,如果“Doctor”或“Nurse”有 2 个以上具有相同 professional_idprofessional_name 不同的条目,则第一行将显示 include? = 1 以及以下行包括? = 0。例如 Kevin R Collins / Kevin Collins 的 44 ID。

2.1- 如果有 0 个“医生”或“护士”(全部为“支持”),则第一行获取 include? = 1,随后的行 include? = 0,该professional_roleNr_consultations_per_Pt_day = 1。

中间数据集:

组织_id病人_id日期咨询_模式专业_id专业_角色专业_名称包括?
1123008-02-2018电话24医生John Taylor 博士1
1123008-02-2018面对面11支持Mary Wright1
2122212-01-2018电话123医生Patricia Jones 博士1
2122212-01 -2018年电话110支持James Davies0
2124412-01-2018面对面123医生Dr Patricia Jones1
2124422-02-2018面对面110支持James Davies1
298712-01-2018电话123医生帕特里夏·琼斯博士1
298722-02-2018电话333护士Peter Hall1
3222301-03-2019家访444医生Mary Wilson博士 1
3222301-03-2019家访444医生Dr Mary Wilson0
3224701-03-2019面部 章第444博士医生 玛丽·威尔逊1
3224701-03-2019面对面444医生玛丽·威尔逊博士0
3224701-03-2019面对面444医生玛丽·威尔逊博士0
4123412-07-2020电话1133支持玛丽赖特0
4123412-07-2020电话12支持Anthony Patel0
4123412-07-2020电话25护士Jennifer Walker1
4123412-07-2020电话26护士Jennifer Walker0
4123412-07-2020面对面12支持安东尼Patel0
4123412-07-2020面对面34医生Carol Bell 博士1
4123412-07-2020面对面35医生Carol Bell 博士0
4123412-07-2020面对面38护士黛博拉迪克森1
4123913-07-2020家访44护士Kevin R Collins1
4123913-07-2020家访44护士Kevin Collins0
4123913-07-2020家访5556医生Dr Robert Brown1
4332216-06-2021面容 章第443博士玛丽 Wilson1
4332216-06-2021电话443医生Mary Wilson 博士1
4332216-06-2021面对面445医生John Snow博士 1
4543414-05-2019电话29医生John Taylor 博士1
454342019年5月14日面对面29医生Dr John Taylor1
4448817-03-2020面对面555医生Dr James Smith1
4448817-03-2020电话5556医生Dr Robert Brown1
4448817-03-2020电话12支持安东尼·帕特尔0
4125003-02-2019面对面1133支持Mary Wright1
4125003-02-2019面对面113663支持Mary TEST Wright0

最终数据集: 一个 organization_idpatent_iddate 以及每个类别 consultation_modeprofessional_role 的示例>。

organization_id患者_id日期consulting_modeprofessional_roleNr_consultations_per_Pt_day
1123008-02-2018面对面医生0
1123008-02-2018面对面护士0
1123008-02-2018面对面支持1
1123008-02-2018电话医生1
1123008-02-2018电话护士0
1123008-02-2018电话支持0
1123008-02-2018家访医生0
1123008-02-2018家访护士0
1123008-02-2018首页 访问支持0

关于如何在 R 中以有效的方式做到这一点有什么想法吗?

I have created the following dataset with key scenarios that I have in my actual dataset:

df <- data.frame (organisation_id  = c("1","1","2","2","2","2","2","2","3","3","3","3","3","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4"),
                  patient_id = c("1230","1230","1222","1222","1244","1244","987","987","2223","2223","2247","2247","2247","1234","1234","1234","1234","1234","1234","1234","1234","1239","1239","1239","3322","3322","3322","5434","5434","4488","4488","4488","1250","1250"),
                  date = c("08-02-2018","08-02-2018","12-01-2018","12-01-2018","12-01-2018","22-02-2018","12-01-2018","22-02-2018","01-03-2019","01-03-2019","01-03-2019","01-03-2019","01-03-2019","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","13-07-2020","13-07-2020","13-07-2020","16-06-2021","16-06-2021","16-06-2021","14-05-2019","14-05-2019","17-03-2020","17-03-2020","17-03-2020","03-02-2019","03-02-2019"),
                  consultation_mode = c("Telephone","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Home visit","Home visit","Face-to-Face","Face-to-Face","Face-to-Face","Telephone","Telephone","Telephone","Telephone","Face-to-Face","Face-to-Face","Face-to-Face","Face-to-Face","Home visit","Home visit","Home visit","Face-to-Face","Telephone","Face-to-Face","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face"),
                  professional_id = c("24","11","123","110","123","110","123","333","444","444","444","444","444","1133","12","25","26","12","34","35","38","44","44","5556","443","443","445","29","29","555","5556","12","1133","113663"),
                  professional_role = c("Doctor","Support","Doctor","Support","Doctor","Support","Doctor","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Nurse","Nurse","Support","Doctor","Doctor","Nurse","Nurse","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Support"),
                  professional_name = c("Dr John Taylor","Mary Wright","Dr Patricia Jones","James Davies","Dr Patricia Jones","James Davies","Dr Patricia Jones","Peter Hall","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Mary Wright","Anthony Patel","Jennifer Walker","Jennifer Walker","Anthony Patel","Dr Carol Bell","Dr Carol Bell","Deborah Dixon","Kevin R Collins","Kevin Collins","Dr Robert Brown","Dr Mary Wilson","Dr Mary Wilson","Dr John Snow","Dr John Taylor","Dr John Taylor","Dr James Smith","Dr Robert Brown","Anthony Patel","Mary Wright","Mary TEST Wright")
)

df$organisation_id <- as.factor(df$organisation_id)
df$patient_id <- as.factor(df$patient_id)
df$date <- as.Date(df$date, "%d-%m-%Y")
df$consultation_mode <- as.factor(df$consultation_mode)
df$professional_id <- as.factor(df$professional_id)
df$professional_role <- as.factor(df$professional_role)

I want to create two extra columns (include? and Nr_consultations_per_Pt_day) as per the below:

For each organisation_id, patient_id, date and consultation_mode check:

1- If there is only 1 row, include? = 1 and Nr_consultations_per_Pt_day = 1 for that professional_role.

2- If there is more than 1 row, include? = 1 for each different professional_id and professional_name with consultation_role = 'Doctor' or 'Nurse'.

Note: if there are 2+ entries for ‘Doctor’ or ‘Nurse’ with different professional_id but same professional_name, the first row gets include? = 1 and the following rows include? = 0. E.g. 25 / 26 IDs for Jennifer Walker. Similarly, if there are 2+ entries for ‘Doctor’ or ‘Nurse’ with same professional_id but different professional_name, the first row gets include? = 1 and the following rows include? = 0. E.g. 44 ID for Kevin R Collins / Kevin Collins.

2.1- If there is 0 'Doctor' or 'Nurse' (all ‘Support’), then the first row gets include? = 1 and the following rows include? = 0, with Nr_consultations_per_Pt_day = 1 for that professional_role.

Intermediate dataset:

organisation_idpatient_iddateconsultation_modeprofessional_idprofessional_roleprofessional_nameinclude?
1123008-02-2018Telephone24DoctorDr John Taylor1
1123008-02-2018Face-to-Face11SupportMary Wright1
2122212-01-2018Telephone123DoctorDr Patricia Jones1
2122212-01-2018Telephone110SupportJames Davies0
2124412-01-2018Face-to-Face123DoctorDr Patricia Jones1
2124422-02-2018Face-to-Face110SupportJames Davies1
298712-01-2018Telephone123DoctorDr Patricia Jones1
298722-02-2018Telephone333NursePeter Hall1
3222301-03-2019Home visit444DoctorDr Mary Wilson1
3222301-03-2019Home visit444DoctorDr Mary Wilson0
3224701-03-2019Face-to-Face444DoctorDr Mary Wilson1
3224701-03-2019Face-to-Face444DoctorDr Mary Wilson0
3224701-03-2019Face-to-Face444DoctorDr Mary Wilson0
4123412-07-2020Telephone1133SupportMary Wright0
4123412-07-2020Telephone12SupportAnthony Patel0
4123412-07-2020Telephone25NurseJennifer Walker1
4123412-07-2020Telephone26NurseJennifer Walker0
4123412-07-2020Face-to-Face12SupportAnthony Patel0
4123412-07-2020Face-to-Face34DoctorDr Carol Bell1
4123412-07-2020Face-to-Face35DoctorDr Carol Bell0
4123412-07-2020Face-to-Face38NurseDeborah Dixon1
4123913-07-2020Home visit44NurseKevin R Collins1
4123913-07-2020Home visit44NurseKevin Collins0
4123913-07-2020Home visit5556DoctorDr Robert Brown1
4332216-06-2021Face-to-Face443DoctorDr Mary Wilson1
4332216-06-2021Telephone443DoctorDr Mary Wilson1
4332216-06-2021Face-to-Face445DoctorDr John Snow1
4543414-05-2019Telephone29DoctorDr John Taylor1
4543414-05-2019Face-to-Face29DoctorDr John Taylor1
4448817-03-2020Face-to-Face555DoctorDr James Smith1
4448817-03-2020Telephone5556DoctorDr Robert Brown1
4448817-03-2020Telephone12SupportAnthony Patel0
4125003-02-2019Face-to-Face1133SupportMary Wright1
4125003-02-2019Face-to-Face113663SupportMary TEST Wright0

Final dataset:
Example for one organisation_id,patient_id,date and for each category of consultation_mode and professional_role.

organisation_idpatient_iddateconsultation_modeprofessional_roleNr_consultations_per_Pt_day
1123008-02-2018Face-to-FaceDoctor0
1123008-02-2018Face-to-FaceNurse0
1123008-02-2018Face-to-FaceSupport1
1123008-02-2018TelephoneDoctor1
1123008-02-2018TelephoneNurse0
1123008-02-2018TelephoneSupport0
1123008-02-2018Home visitDoctor0
1123008-02-2018Home visitNurse0
1123008-02-2018Home visitSupport0

etc.

Any ideas on how to do this in R in an efficient way?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

讽刺将军 2025-01-21 14:07:09

如果我正确理解您的描述,对于每一行,我们希望评估以下条件来决定是否 include? = 1

  1. organization_id-patent_id-date-consultation_mode 的行组大小
  2. 为 1organization_id-patent_id-date-consultation_mode 的行组大小大于 1 并且该行对应于:
    1. AND 医生是同名医生中的第一个
    2. 护士 AND 是具有相同 ID/姓名的护士中的第一个
    3. 支持 AND 是支持中的第一个 AND 属于没有医生或护士的organization_id-patent_id-date-consultation_mode 群组的一部分

此逻辑将创建“中间”表。为了创建“最终”表,我们会遍历 Advisory_mode 和 professional_role 的每个类别,如果存在包含 include? 的相应条目,则设置 Nr_consultations_per_Pt_day = 1 = 1 。

基于上述期望,我将这样做:

library(tidyverse)

# For each row, add the size of its 
# organisation_id-patient_id-date-consultation_mode group
df2 <- df %>% group_by(organisation_id, patient_id, date, consultation_mode) %>% 
    mutate(group_size = n()) %>% ungroup()

# For each row, indicate whether it's the first entry of 
# organisation_id-patient_id-date-consultation_mode-professional_role group 
# of people with the SAME NAME but possiblly different ID
df3 <- df2 %>% group_by(organisation_id, patient_id, date, consultation_mode, 
        professional_role, professional_name) %>% 
    mutate(first_by_name = row_number()==1) %>% 
    ungroup()

# For each row, indicate whether it's the first entry of 
# organisation_id-patient_id-date-consultation_mode-professional_role group 
# of people with the SAME ID but possiblly different name
df4 <- df3 %>% group_by(organisation_id, patient_id, date, consultation_mode, 
        professional_role, professional_id) %>% 
    mutate(first_by_id = row_number()==1) %>% 
    ungroup()

# For each row, indicate whether there's no doctor/nurse in its 
# organisation_id-patient_id-date-consultation_mode
# and indicate the first entry in such support-only group
df5 <- df4 %>% group_by(organisation_id, patient_id, date, consultation_mode) %>% 
    mutate(support_only_group = length(intersect(professional_role, c("Doctor", "Nurse"))) == 0) %>% 
    mutate(first_in_support_only = row_number()==1 & support_only_group) %>% 
    ungroup()

# Apply rules to determine the inclusion status of each row
df6 <- df5 %>% mutate(`include?` = if_else(
        group_size == 1 | 
        (professional_role %in% c("Doctor","Nurse") & (first_by_name & first_by_id)) |
        first_in_support_only, 1, 0)) 
df6

在此处输入图像描述

转换为最终表格:

# Convert into the final table
df7 <- df6 %>% 
    select(-c(group_size, first_by_name, first_by_id, support_only_group, first_in_support_only)) %>% 
    group_by(organisation_id, patient_id, date) %>% 
    expand(consultation_mode, professional_role) %>%
    left_join(df6) %>%
    mutate(Nr_consultations_per_Pt_day = replace_na(`include?`,0)) %>%
    select(-c(professional_id, professional_name, `include?`)) %>%
    group_by(organisation_id, patient_id, date, consultation_mode, professional_role) %>%
    summarise(Nr_consultations_per_Pt_day = sum(Nr_consultations_per_Pt_day))

df7 %>% filter(patient_id %in% c(2223, 1250, 1230))

在此处输入图像描述

If I understand your description correctly, for each row we want to evaluate the following conditions to decide whether include? = 1:

  1. The row's group size for organisation_id-patient_id-date-consultation_mode is 1
  2. The row's group size for organisation_id-patient_id-date-consultation_mode is greater than 1 AND the row corresponds to a:
    1. Doctor AND is the first among doctors with the same id/name
    2. Nurse AND is the first among nurses with the same id/name
    3. Support AND is the first among support AND is part of a organisation_id-patient_id-date-consultation_mode group that has no doctor or nurse

This logic will create the "intermediate" table. To create the "final" table, we go through each category of consultation_mode and professional_role and set Nr_consultations_per_Pt_day = 1 if there's a corresponding entry with include? = 1.

Based on the above expectation, here's how I'd do it:

library(tidyverse)

# For each row, add the size of its 
# organisation_id-patient_id-date-consultation_mode group
df2 <- df %>% group_by(organisation_id, patient_id, date, consultation_mode) %>% 
    mutate(group_size = n()) %>% ungroup()

# For each row, indicate whether it's the first entry of 
# organisation_id-patient_id-date-consultation_mode-professional_role group 
# of people with the SAME NAME but possiblly different ID
df3 <- df2 %>% group_by(organisation_id, patient_id, date, consultation_mode, 
        professional_role, professional_name) %>% 
    mutate(first_by_name = row_number()==1) %>% 
    ungroup()

# For each row, indicate whether it's the first entry of 
# organisation_id-patient_id-date-consultation_mode-professional_role group 
# of people with the SAME ID but possiblly different name
df4 <- df3 %>% group_by(organisation_id, patient_id, date, consultation_mode, 
        professional_role, professional_id) %>% 
    mutate(first_by_id = row_number()==1) %>% 
    ungroup()

# For each row, indicate whether there's no doctor/nurse in its 
# organisation_id-patient_id-date-consultation_mode
# and indicate the first entry in such support-only group
df5 <- df4 %>% group_by(organisation_id, patient_id, date, consultation_mode) %>% 
    mutate(support_only_group = length(intersect(professional_role, c("Doctor", "Nurse"))) == 0) %>% 
    mutate(first_in_support_only = row_number()==1 & support_only_group) %>% 
    ungroup()

# Apply rules to determine the inclusion status of each row
df6 <- df5 %>% mutate(`include?` = if_else(
        group_size == 1 | 
        (professional_role %in% c("Doctor","Nurse") & (first_by_name & first_by_id)) |
        first_in_support_only, 1, 0)) 
df6

enter image description here

Convert into the final table:

# Convert into the final table
df7 <- df6 %>% 
    select(-c(group_size, first_by_name, first_by_id, support_only_group, first_in_support_only)) %>% 
    group_by(organisation_id, patient_id, date) %>% 
    expand(consultation_mode, professional_role) %>%
    left_join(df6) %>%
    mutate(Nr_consultations_per_Pt_day = replace_na(`include?`,0)) %>%
    select(-c(professional_id, professional_name, `include?`)) %>%
    group_by(organisation_id, patient_id, date, consultation_mode, professional_role) %>%
    summarise(Nr_consultations_per_Pt_day = sum(Nr_consultations_per_Pt_day))

df7 %>% filter(patient_id %in% c(2223, 1250, 1230))

enter image description here

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文