在 R 中创建规则来计算每个患者每天的咨询次数

发布于 2025-01-14 14:07:09 字数 10105 浏览 3 评论 0原文

我已经使用实际数据集中的关键场景创建了以下数据集：

df <- data.frame (organisation_id  = c("1","1","2","2","2","2","2","2","3","3","3","3","3","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4"),
                  patient_id = c("1230","1230","1222","1222","1244","1244","987","987","2223","2223","2247","2247","2247","1234","1234","1234","1234","1234","1234","1234","1234","1239","1239","1239","3322","3322","3322","5434","5434","4488","4488","4488","1250","1250"),
                  date = c("08-02-2018","08-02-2018","12-01-2018","12-01-2018","12-01-2018","22-02-2018","12-01-2018","22-02-2018","01-03-2019","01-03-2019","01-03-2019","01-03-2019","01-03-2019","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","13-07-2020","13-07-2020","13-07-2020","16-06-2021","16-06-2021","16-06-2021","14-05-2019","14-05-2019","17-03-2020","17-03-2020","17-03-2020","03-02-2019","03-02-2019"),
                  consultation_mode = c("Telephone","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Home visit","Home visit","Face-to-Face","Face-to-Face","Face-to-Face","Telephone","Telephone","Telephone","Telephone","Face-to-Face","Face-to-Face","Face-to-Face","Face-to-Face","Home visit","Home visit","Home visit","Face-to-Face","Telephone","Face-to-Face","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face"),
                  professional_id = c("24","11","123","110","123","110","123","333","444","444","444","444","444","1133","12","25","26","12","34","35","38","44","44","5556","443","443","445","29","29","555","5556","12","1133","113663"),
                  professional_role = c("Doctor","Support","Doctor","Support","Doctor","Support","Doctor","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Nurse","Nurse","Support","Doctor","Doctor","Nurse","Nurse","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Support"),
                  professional_name = c("Dr John Taylor","Mary Wright","Dr Patricia Jones","James Davies","Dr Patricia Jones","James Davies","Dr Patricia Jones","Peter Hall","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Mary Wright","Anthony Patel","Jennifer Walker","Jennifer Walker","Anthony Patel","Dr Carol Bell","Dr Carol Bell","Deborah Dixon","Kevin R Collins","Kevin Collins","Dr Robert Brown","Dr Mary Wilson","Dr Mary Wilson","Dr John Snow","Dr John Taylor","Dr John Taylor","Dr James Smith","Dr Robert Brown","Anthony Patel","Mary Wright","Mary TEST Wright")
)

df$organisation_id <- as.factor(df$organisation_id)
df$patient_id <- as.factor(df$patient_id)
df$date <- as.Date(df$date, "%d-%m-%Y")
df$consultation_mode <- as.factor(df$consultation_mode)
df$professional_id <- as.factor(df$professional_id)
df$professional_role <- as.factor(df$professional_role)

我想创建两个额外的列（include? 和 Nr_consultations_per_Pt_day），如下所示

：每个 organization_id、patent_id、date 和 consultation_mode 检查：

1- 如果有只有 1 个行，对于该professional_role，include? = 1 且 Nr_consultations_per_Pt_day = 1。

2- 如果多于 1 行，则对于每个不同的 professional_id 和 professional_name，include? = 1，其中 <代码>consultation_role =“医生”或“护士”。

注意：如果“医生”或“护士”有 2 个以上具有不同 professional_id 但 professional_name 相同的条目，则第一行将显示 include? = 1 以及以下行 include? = 0。例如 Jennifer Walker 的 25 / 26 个 ID。同样，如果“Doctor”或“Nurse”有 2 个以上具有相同 professional_id 但 professional_name 不同的条目，则第一行将显示 include? = 1 以及以下行包括？ = 0。例如 Kevin R Collins / Kevin Collins 的 44 ID。

2.1- 如果有 0 个“医生”或“护士”（全部为“支持”），则第一行获取 include? = 1，随后的行 include? = 0，该professional_role的Nr_consultations_per_Pt_day = 1。

中间数据集：

组织_id	病人_id	日期	咨询_模式	专业_id专业	_角色	专业_名称	包括？
1	1230	08-02-2018	电话	24	医生	John Taylor 博士	1
1	1230	08-02-2018	面对面	11	支持	Mary Wright	1
2	1222	12-01-2018	电话	123	医生	Patricia Jones 博士	1
2	1222	12-01 -2018年	电话	110	支持	James Davies	0
2	1244	12-01-2018	面对面	123	医生	Dr Patricia Jones	1
2	1244	22-02-2018	面对面	110	支持	James Davies	1
2	987	12-01-2018	电话	123	医生	帕特里夏·琼斯博士	1
2	987	22-02-2018	电话	333	护士	Peter Hall	1
3	2223	01-03-2019	家访	444	医生	Mary Wilson	博士 1
3	2223	01-03-2019	家访	444	医生	Dr Mary Wilson	0
3	2247	01-03-2019	面部章	第444	博士	医生玛丽·威尔逊	1
3	2247	01-03-2019	面对面	444	医生	玛丽·威尔逊博士	0
3	2247	01-03-2019	面对面	444	医生	玛丽·威尔逊博士	0
4	1234	12-07-2020	电话	1133	支持	玛丽赖特	0
4	1234	12-07-2020	电话	12	支持	Anthony Patel	0
4	1234	12-07-2020	电话	25	护士	Jennifer Walker	1
4	1234	12-07-2020	电话	26	护士	Jennifer Walker	0
4	1234	12-07-2020	面对面	12	支持	安东尼Patel	0
4	1234	12-07-2020	面对面	34	医生	Carol Bell 博士	1
4	1234	12-07-2020	面对面	35	医生	Carol Bell 博士	0
4	1234	12-07-2020	面对面	38	护士	黛博拉迪克森	1
4	1239	13-07-2020	家访	44	护士	Kevin R Collins	1
4	1239	13-07-2020	家访	44	护士	Kevin Collins	0
4	1239	13-07-2020	家访	5556	医生	Dr Robert Brown	1
4	3322	16-06-2021	面容章	第443	博士	玛丽 Wilson	1
4	3322	16-06-2021	电话	443	医生	Mary Wilson 博士	1
4	3322	16-06-2021	面对面	445	医生	John Snow	博士 1
4	5434	14-05-2019	电话	29	医生	John Taylor 博士	1
4	5434	2019年5月14日	面对面	29	医生	Dr John Taylor	1
4	4488	17-03-2020	面对面	555	医生	Dr James Smith	1
4	4488	17-03-2020	电话	5556	医生	Dr Robert Brown	1
4	4488	17-03-2020	电话	12	支持	安东尼·帕特尔	0
4	1250	03-02-2019	面对面	1133	支持	Mary Wright	1
4	1250	03-02-2019	面对面	113663	支持	Mary TEST Wright	0

最终数据集： 一个 organization_id、patent_id、date 以及每个类别 consultation_mode 和 professional_role 的示例>。

organization_id	患者_id	日期	consulting_mode	professional_role	Nr_consultations_per_Pt_day
1	1230	08-02-2018	面对面	医生	0
1	1230	08-02-2018	面对面	护士	0
1	1230	08-02-2018	面对面	支持	1
1	1230	08-02-2018	电话	医生	1
1	1230	08-02-2018	电话	护士	0
1	1230	08-02-2018	电话	支持	0
1	1230	08-02-2018	家访	医生	0
1	1230	08-02-2018	家访	护士	0
1	1230	08-02-2018	首页访问	支持	0

等

关于如何在 R 中以有效的方式做到这一点有什么想法吗？

原文

I have created the following dataset with key scenarios that I have in my actual dataset:

df <- data.frame (organisation_id  = c("1","1","2","2","2","2","2","2","3","3","3","3","3","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4"),
                  patient_id = c("1230","1230","1222","1222","1244","1244","987","987","2223","2223","2247","2247","2247","1234","1234","1234","1234","1234","1234","1234","1234","1239","1239","1239","3322","3322","3322","5434","5434","4488","4488","4488","1250","1250"),
                  date = c("08-02-2018","08-02-2018","12-01-2018","12-01-2018","12-01-2018","22-02-2018","12-01-2018","22-02-2018","01-03-2019","01-03-2019","01-03-2019","01-03-2019","01-03-2019","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","13-07-2020","13-07-2020","13-07-2020","16-06-2021","16-06-2021","16-06-2021","14-05-2019","14-05-2019","17-03-2020","17-03-2020","17-03-2020","03-02-2019","03-02-2019"),
                  consultation_mode = c("Telephone","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Home visit","Home visit","Face-to-Face","Face-to-Face","Face-to-Face","Telephone","Telephone","Telephone","Telephone","Face-to-Face","Face-to-Face","Face-to-Face","Face-to-Face","Home visit","Home visit","Home visit","Face-to-Face","Telephone","Face-to-Face","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face"),
                  professional_id = c("24","11","123","110","123","110","123","333","444","444","444","444","444","1133","12","25","26","12","34","35","38","44","44","5556","443","443","445","29","29","555","5556","12","1133","113663"),
                  professional_role = c("Doctor","Support","Doctor","Support","Doctor","Support","Doctor","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Nurse","Nurse","Support","Doctor","Doctor","Nurse","Nurse","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Support"),
                  professional_name = c("Dr John Taylor","Mary Wright","Dr Patricia Jones","James Davies","Dr Patricia Jones","James Davies","Dr Patricia Jones","Peter Hall","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Mary Wright","Anthony Patel","Jennifer Walker","Jennifer Walker","Anthony Patel","Dr Carol Bell","Dr Carol Bell","Deborah Dixon","Kevin R Collins","Kevin Collins","Dr Robert Brown","Dr Mary Wilson","Dr Mary Wilson","Dr John Snow","Dr John Taylor","Dr John Taylor","Dr James Smith","Dr Robert Brown","Anthony Patel","Mary Wright","Mary TEST Wright")
)

df$organisation_id <- as.factor(df$organisation_id)
df$patient_id <- as.factor(df$patient_id)
df$date <- as.Date(df$date, "%d-%m-%Y")
df$consultation_mode <- as.factor(df$consultation_mode)
df$professional_id <- as.factor(df$professional_id)
df$professional_role <- as.factor(df$professional_role)

I want to create two extra columns (include? and Nr_consultations_per_Pt_day) as per the below:

For each organisation_id, patient_id, date and consultation_mode check:

1- If there is only 1 row, include? = 1 and Nr_consultations_per_Pt_day = 1 for that professional_role.

2- If there is more than 1 row, include? = 1 for each different professional_id and professional_name with consultation_role = 'Doctor' or 'Nurse'.

Note: if there are 2+ entries for ‘Doctor’ or ‘Nurse’ with different professional_id but same professional_name, the first row gets include? = 1 and the following rows include? = 0. E.g. 25 / 26 IDs for Jennifer Walker. Similarly, if there are 2+ entries for ‘Doctor’ or ‘Nurse’ with same professional_id but different professional_name, the first row gets include? = 1 and the following rows include? = 0. E.g. 44 ID for Kevin R Collins / Kevin Collins.

2.1- If there is 0 'Doctor' or 'Nurse' (all ‘Support’), then the first row gets include? = 1 and the following rows include? = 0, with Nr_consultations_per_Pt_day = 1 for that professional_role.

Intermediate dataset:

organisation_id	patient_id	date	consultation_mode	professional_id	professional_role	professional_name	include?
1	1230	08-02-2018	Telephone	24	Doctor	Dr John Taylor	1
1	1230	08-02-2018	Face-to-Face	11	Support	Mary Wright	1
2	1222	12-01-2018	Telephone	123	Doctor	Dr Patricia Jones	1
2	1222	12-01-2018	Telephone	110	Support	James Davies	0
2	1244	12-01-2018	Face-to-Face	123	Doctor	Dr Patricia Jones	1
2	1244	22-02-2018	Face-to-Face	110	Support	James Davies	1
2	987	12-01-2018	Telephone	123	Doctor	Dr Patricia Jones	1
2	987	22-02-2018	Telephone	333	Nurse	Peter Hall	1
3	2223	01-03-2019	Home visit	444	Doctor	Dr Mary Wilson	1
3	2223	01-03-2019	Home visit	444	Doctor	Dr Mary Wilson	0
3	2247	01-03-2019	Face-to-Face	444	Doctor	Dr Mary Wilson	1
3	2247	01-03-2019	Face-to-Face	444	Doctor	Dr Mary Wilson	0
3	2247	01-03-2019	Face-to-Face	444	Doctor	Dr Mary Wilson	0
4	1234	12-07-2020	Telephone	1133	Support	Mary Wright	0
4	1234	12-07-2020	Telephone	12	Support	Anthony Patel	0
4	1234	12-07-2020	Telephone	25	Nurse	Jennifer Walker	1
4	1234	12-07-2020	Telephone	26	Nurse	Jennifer Walker	0
4	1234	12-07-2020	Face-to-Face	12	Support	Anthony Patel	0
4	1234	12-07-2020	Face-to-Face	34	Doctor	Dr Carol Bell	1
4	1234	12-07-2020	Face-to-Face	35	Doctor	Dr Carol Bell	0
4	1234	12-07-2020	Face-to-Face	38	Nurse	Deborah Dixon	1
4	1239	13-07-2020	Home visit	44	Nurse	Kevin R Collins	1
4	1239	13-07-2020	Home visit	44	Nurse	Kevin Collins	0
4	1239	13-07-2020	Home visit	5556	Doctor	Dr Robert Brown	1
4	3322	16-06-2021	Face-to-Face	443	Doctor	Dr Mary Wilson	1
4	3322	16-06-2021	Telephone	443	Doctor	Dr Mary Wilson	1
4	3322	16-06-2021	Face-to-Face	445	Doctor	Dr John Snow	1
4	5434	14-05-2019	Telephone	29	Doctor	Dr John Taylor	1
4	5434	14-05-2019	Face-to-Face	29	Doctor	Dr John Taylor	1
4	4488	17-03-2020	Face-to-Face	555	Doctor	Dr James Smith	1
4	4488	17-03-2020	Telephone	5556	Doctor	Dr Robert Brown	1
4	4488	17-03-2020	Telephone	12	Support	Anthony Patel	0
4	1250	03-02-2019	Face-to-Face	1133	Support	Mary Wright	1
4	1250	03-02-2019	Face-to-Face	113663	Support	Mary TEST Wright	0

Final dataset:
Example for one organisation_id,patient_id,date and for each category of consultation_mode and professional_role.

organisation_id	patient_id	date	consultation_mode	professional_role	Nr_consultations_per_Pt_day
1	1230	08-02-2018	Face-to-Face	Doctor	0
1	1230	08-02-2018	Face-to-Face	Nurse	0
1	1230	08-02-2018	Face-to-Face	Support	1
1	1230	08-02-2018	Telephone	Doctor	1
1	1230	08-02-2018	Telephone	Nurse	0
1	1230	08-02-2018	Telephone	Support	0
1	1230	08-02-2018	Home visit	Doctor	0
1	1230	08-02-2018	Home visit	Nurse	0
1	1230	08-02-2018	Home visit	Support	0

etc.

Any ideas on how to do this in R in an efficient way?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

讽刺将军 2025-01-21 14:07:09

如果我正确理解您的描述，对于每一行，我们希望评估以下条件来决定是否 include? = 1：

organization_id-patent_id-date-consultation_mode 的行组大小
为 1organization_id-patent_id-date-consultation_mode 的行组大小大于 1 并且该行对应于：
1. AND 医生是同名医生中的第一个
2. 护士 AND 是具有相同 ID/姓名的护士中的第一个
3. 支持 AND 是支持中的第一个 AND 属于没有医生或护士的organization_id-patent_id-date-consultation_mode 群组的一部分

此逻辑将创建“中间”表。为了创建“最终”表，我们会遍历 Advisory_mode 和 professional_role 的每个类别，如果存在包含 include? 的相应条目，则设置 Nr_consultations_per_Pt_day = 1 = 1 。

基于上述期望，我将这样做：

library(tidyverse)

# For each row, add the size of its 
# organisation_id-patient_id-date-consultation_mode group
df2 <- df %>% group_by(organisation_id, patient_id, date, consultation_mode) %>% 
    mutate(group_size = n()) %>% ungroup()

# For each row, indicate whether it's the first entry of 
# organisation_id-patient_id-date-consultation_mode-professional_role group 
# of people with the SAME NAME but possiblly different ID
df3 <- df2 %>% group_by(organisation_id, patient_id, date, consultation_mode, 
        professional_role, professional_name) %>% 
    mutate(first_by_name = row_number()==1) %>% 
    ungroup()

# For each row, indicate whether it's the first entry of 
# organisation_id-patient_id-date-consultation_mode-professional_role group 
# of people with the SAME ID but possiblly different name
df4 <- df3 %>% group_by(organisation_id, patient_id, date, consultation_mode, 
        professional_role, professional_id) %>% 
    mutate(first_by_id = row_number()==1) %>% 
    ungroup()

# For each row, indicate whether there's no doctor/nurse in its 
# organisation_id-patient_id-date-consultation_mode
# and indicate the first entry in such support-only group
df5 <- df4 %>% group_by(organisation_id, patient_id, date, consultation_mode) %>% 
    mutate(support_only_group = length(intersect(professional_role, c("Doctor", "Nurse"))) == 0) %>% 
    mutate(first_in_support_only = row_number()==1 & support_only_group) %>% 
    ungroup()

# Apply rules to determine the inclusion status of each row
df6 <- df5 %>% mutate(`include?` = if_else(
        group_size == 1 | 
        (professional_role %in% c("Doctor","Nurse") & (first_by_name & first_by_id)) |
        first_in_support_only, 1, 0)) 
df6

转换为最终表格：

# Convert into the final table
df7 <- df6 %>% 
    select(-c(group_size, first_by_name, first_by_id, support_only_group, first_in_support_only)) %>% 
    group_by(organisation_id, patient_id, date) %>% 
    expand(consultation_mode, professional_role) %>%
    left_join(df6) %>%
    mutate(Nr_consultations_per_Pt_day = replace_na(`include?`,0)) %>%
    select(-c(professional_id, professional_name, `include?`)) %>%
    group_by(organisation_id, patient_id, date, consultation_mode, professional_role) %>%
    summarise(Nr_consultations_per_Pt_day = sum(Nr_consultations_per_Pt_day))

df7 %>% filter(patient_id %in% c(2223, 1250, 1230))

If I understand your description correctly, for each row we want to evaluate the following conditions to decide whether include? = 1:

The row's group size for organisation_id-patient_id-date-consultation_mode is 1
The row's group size for organisation_id-patient_id-date-consultation_mode is greater than 1 AND the row corresponds to a:
1. Doctor AND is the first among doctors with the same id/name
2. Nurse AND is the first among nurses with the same id/name
3. Support AND is the first among support AND is part of a organisation_id-patient_id-date-consultation_mode group that has no doctor or nurse

This logic will create the "intermediate" table. To create the "final" table, we go through each category of consultation_mode and professional_role and set Nr_consultations_per_Pt_day = 1 if there's a corresponding entry with include? = 1.

Based on the above expectation, here's how I'd do it:

library(tidyverse)

# For each row, add the size of its 
# organisation_id-patient_id-date-consultation_mode group
df2 <- df %>% group_by(organisation_id, patient_id, date, consultation_mode) %>% 
    mutate(group_size = n()) %>% ungroup()

# For each row, indicate whether it's the first entry of 
# organisation_id-patient_id-date-consultation_mode-professional_role group 
# of people with the SAME NAME but possiblly different ID
df3 <- df2 %>% group_by(organisation_id, patient_id, date, consultation_mode, 
        professional_role, professional_name) %>% 
    mutate(first_by_name = row_number()==1) %>% 
    ungroup()

# For each row, indicate whether it's the first entry of 
# organisation_id-patient_id-date-consultation_mode-professional_role group 
# of people with the SAME ID but possiblly different name
df4 <- df3 %>% group_by(organisation_id, patient_id, date, consultation_mode, 
        professional_role, professional_id) %>% 
    mutate(first_by_id = row_number()==1) %>% 
    ungroup()

# For each row, indicate whether there's no doctor/nurse in its 
# organisation_id-patient_id-date-consultation_mode
# and indicate the first entry in such support-only group
df5 <- df4 %>% group_by(organisation_id, patient_id, date, consultation_mode) %>% 
    mutate(support_only_group = length(intersect(professional_role, c("Doctor", "Nurse"))) == 0) %>% 
    mutate(first_in_support_only = row_number()==1 & support_only_group) %>% 
    ungroup()

# Apply rules to determine the inclusion status of each row
df6 <- df5 %>% mutate(`include?` = if_else(
        group_size == 1 | 
        (professional_role %in% c("Doctor","Nurse") & (first_by_name & first_by_id)) |
        first_in_support_only, 1, 0)) 
df6

Convert into the final table:

# Convert into the final table
df7 <- df6 %>% 
    select(-c(group_size, first_by_name, first_by_id, support_only_group, first_in_support_only)) %>% 
    group_by(organisation_id, patient_id, date) %>% 
    expand(consultation_mode, professional_role) %>%
    left_join(df6) %>%
    mutate(Nr_consultations_per_Pt_day = replace_na(`include?`,0)) %>%
    select(-c(professional_id, professional_name, `include?`)) %>%
    group_by(organisation_id, patient_id, date, consultation_mode, professional_role) %>%
    summarise(Nr_consultations_per_Pt_day = sum(Nr_consultations_per_Pt_day))

df7 %>% filter(patient_id %in% c(2223, 1250, 1230))

回复收藏 0 原文

~没有更多了~