Creating rules in R to count the number of consultations per patient per day
I have created the following dataset with key scenarios that I have in my actual dataset:
df <- data.frame (organisation_id = c("1","1","2","2","2","2","2","2","3","3","3","3","3","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4","4"),
patient_id = c("1230","1230","1222","1222","1244","1244","987","987","2223","2223","2247","2247","2247","1234","1234","1234","1234","1234","1234","1234","1234","1239","1239","1239","3322","3322","3322","5434","5434","4488","4488","4488","1250","1250"),
date = c("08-02-2018","08-02-2018","12-01-2018","12-01-2018","12-01-2018","22-02-2018","12-01-2018","22-02-2018","01-03-2019","01-03-2019","01-03-2019","01-03-2019","01-03-2019","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","12-07-2020","13-07-2020","13-07-2020","13-07-2020","16-06-2021","16-06-2021","16-06-2021","14-05-2019","14-05-2019","17-03-2020","17-03-2020","17-03-2020","03-02-2019","03-02-2019"),
consultation_mode = c("Telephone","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Home visit","Home visit","Face-to-Face","Face-to-Face","Face-to-Face","Telephone","Telephone","Telephone","Telephone","Face-to-Face","Face-to-Face","Face-to-Face","Face-to-Face","Home visit","Home visit","Home visit","Face-to-Face","Telephone","Face-to-Face","Telephone","Face-to-Face","Face-to-Face","Telephone","Telephone","Face-to-Face","Face-to-Face"),
professional_id = c("24","11","123","110","123","110","123","333","444","444","444","444","444","1133","12","25","26","12","34","35","38","44","44","5556","443","443","445","29","29","555","5556","12","1133","113663"),
professional_role = c("Doctor","Support","Doctor","Support","Doctor","Support","Doctor","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Nurse","Nurse","Support","Doctor","Doctor","Nurse","Nurse","Nurse","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Doctor","Support","Support","Support"),
professional_name = c("Dr John Taylor","Mary Wright","Dr Patricia Jones","James Davies","Dr Patricia Jones","James Davies","Dr Patricia Jones","Peter Hall","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Dr Mary Wilson","Mary Wright","Anthony Patel","Jennifer Walker","Jennifer Walker","Anthony Patel","Dr Carol Bell","Dr Carol Bell","Deborah Dixon","Kevin R Collins","Kevin Collins","Dr Robert Brown","Dr Mary Wilson","Dr Mary Wilson","Dr John Snow","Dr John Taylor","Dr John Taylor","Dr James Smith","Dr Robert Brown","Anthony Patel","Mary Wright","Mary TEST Wright")
)
df$organisation_id <- as.factor(df$organisation_id)
df$patient_id <- as.factor(df$patient_id)
df$date <- as.Date(df$date, "%d-%m-%Y")
df$consultation_mode <- as.factor(df$consultation_mode)
df$professional_id <- as.factor(df$professional_id)
df$professional_role <- as.factor(df$professional_role)
I want to create two extra columns (`include?` and `Nr_consultations_per_Pt_day`) as per the below:
For each `organisation_id`, `patient_id`, `date` and `consultation_mode`, check:
1- If there is only 1 row, `include?` = 1 and `Nr_consultations_per_Pt_day` = 1 for that `professional_role`.
2- If there is more than 1 row, `include?` = 1 for each different `professional_id` and `professional_name` with `professional_role` = 'Doctor' or 'Nurse'.
Note: if there are 2+ entries for 'Doctor' or 'Nurse' with different `professional_id` but the same `professional_name`, the first row gets `include?` = 1 and the following rows get `include?` = 0. E.g. IDs 25 / 26 for Jennifer Walker. Similarly, if there are 2+ entries for 'Doctor' or 'Nurse' with the same `professional_id` but different `professional_name`, the first row gets `include?` = 1 and the following rows get `include?` = 0. E.g. ID 44 for Kevin R Collins / Kevin Collins.
2.1- If there are 0 'Doctor' or 'Nurse' rows (all 'Support'), then the first row gets `include?` = 1 and the following rows get `include?` = 0, with `Nr_consultations_per_Pt_day` = 1 for that `professional_role`.
Intermediate dataset:
organisation_id | patient_id | date | consultation_mode | professional_id | professional_role | professional_name | include? |
---|---|---|---|---|---|---|---|
1 | 1230 | 08-02-2018 | Telephone | 24 | Doctor | Dr John Taylor | 1 |
1 | 1230 | 08-02-2018 | Face-to-Face | 11 | Support | Mary Wright | 1 |
2 | 1222 | 12-01-2018 | Telephone | 123 | Doctor | Dr Patricia Jones | 1 |
2 | 1222 | 12-01-2018 | Telephone | 110 | Support | James Davies | 0 |
2 | 1244 | 12-01-2018 | Face-to-Face | 123 | Doctor | Dr Patricia Jones | 1 |
2 | 1244 | 22-02-2018 | Face-to-Face | 110 | Support | James Davies | 1 |
2 | 987 | 12-01-2018 | Telephone | 123 | Doctor | Dr Patricia Jones | 1 |
2 | 987 | 22-02-2018 | Telephone | 333 | Nurse | Peter Hall | 1 |
3 | 2223 | 01-03-2019 | Home visit | 444 | Doctor | Dr Mary Wilson | 1 |
3 | 2223 | 01-03-2019 | Home visit | 444 | Doctor | Dr Mary Wilson | 0 |
3 | 2247 | 01-03-2019 | Face-to-Face | 444 | Doctor | Dr Mary Wilson | 1 |
3 | 2247 | 01-03-2019 | Face-to-Face | 444 | Doctor | Dr Mary Wilson | 0 |
3 | 2247 | 01-03-2019 | Face-to-Face | 444 | Doctor | Dr Mary Wilson | 0 |
4 | 1234 | 12-07-2020 | Telephone | 1133 | Support | Mary Wright | 0 |
4 | 1234 | 12-07-2020 | Telephone | 12 | Support | Anthony Patel | 0 |
4 | 1234 | 12-07-2020 | Telephone | 25 | Nurse | Jennifer Walker | 1 |
4 | 1234 | 12-07-2020 | Telephone | 26 | Nurse | Jennifer Walker | 0 |
4 | 1234 | 12-07-2020 | Face-to-Face | 12 | Support | Anthony Patel | 0 |
4 | 1234 | 12-07-2020 | Face-to-Face | 34 | Doctor | Dr Carol Bell | 1 |
4 | 1234 | 12-07-2020 | Face-to-Face | 35 | Doctor | Dr Carol Bell | 0 |
4 | 1234 | 12-07-2020 | Face-to-Face | 38 | Nurse | Deborah Dixon | 1 |
4 | 1239 | 13-07-2020 | Home visit | 44 | Nurse | Kevin R Collins | 1 |
4 | 1239 | 13-07-2020 | Home visit | 44 | Nurse | Kevin Collins | 0 |
4 | 1239 | 13-07-2020 | Home visit | 5556 | Doctor | Dr Robert Brown | 1 |
4 | 3322 | 16-06-2021 | Face-to-Face | 443 | Doctor | Dr Mary Wilson | 1 |
4 | 3322 | 16-06-2021 | Telephone | 443 | Doctor | Dr Mary Wilson | 1 |
4 | 3322 | 16-06-2021 | Face-to-Face | 445 | Doctor | Dr John Snow | 1 |
4 | 5434 | 14-05-2019 | Telephone | 29 | Doctor | Dr John Taylor | 1 |
4 | 5434 | 14-05-2019 | Face-to-Face | 29 | Doctor | Dr John Taylor | 1 |
4 | 4488 | 17-03-2020 | Face-to-Face | 555 | Doctor | Dr James Smith | 1 |
4 | 4488 | 17-03-2020 | Telephone | 5556 | Doctor | Dr Robert Brown | 1 |
4 | 4488 | 17-03-2020 | Telephone | 12 | Support | Anthony Patel | 0 |
4 | 1250 | 03-02-2019 | Face-to-Face | 1133 | Support | Mary Wright | 1 |
4 | 1250 | 03-02-2019 | Face-to-Face | 113663 | Support | Mary TEST Wright | 0 |
Final dataset:
Example for one `organisation_id`, `patient_id`, `date` and for each category of `consultation_mode` and `professional_role`.
organisation_id | patient_id | date | consultation_mode | professional_role | Nr_consultations_per_Pt_day |
---|---|---|---|---|---|
1 | 1230 | 08-02-2018 | Face-to-Face | Doctor | 0 |
1 | 1230 | 08-02-2018 | Face-to-Face | Nurse | 0 |
1 | 1230 | 08-02-2018 | Face-to-Face | Support | 1 |
1 | 1230 | 08-02-2018 | Telephone | Doctor | 1 |
1 | 1230 | 08-02-2018 | Telephone | Nurse | 0 |
1 | 1230 | 08-02-2018 | Telephone | Support | 0 |
1 | 1230 | 08-02-2018 | Home visit | Doctor | 0 |
1 | 1230 | 08-02-2018 | Home visit | Nurse | 0 |
1 | 1230 | 08-02-2018 | Home visit | Support | 0 |
etc.
Any ideas on how to do this in R in an efficient way?
Comments (1)
If I understand your description correctly, for each row we want to evaluate a set of conditions to decide whether `include? = 1`. This logic will create the "intermediate" table. To create the "final" table, we go through each category of `consultation_mode` and `professional_role` and set `Nr_consultations_per_Pt_day = 1` if there's a corresponding entry with `include? = 1`.
Based on the above expectation, here's how I'd do it:
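A minimal sketch of one way to build the intermediate table with dplyr; it is an illustration, not necessarily the code originally posted. It assumes the rules reduce to the following: within each `organisation_id` / `patient_id` / `date` / `consultation_mode` group, a 'Doctor' or 'Nurse' row is included unless an earlier 'Doctor' or 'Nurse' row in the same group shares its `professional_id` or its `professional_name`; if the group has no 'Doctor' or 'Nurse' at all, only the first row is included. The helper `first_occurrence()` and the small `demo` data frame (a subset of the question's rows) are names introduced here for illustration only.

```r
library(dplyr)

# Subset of the question's rows covering the main scenarios (illustrative)
demo <- data.frame(
  organisation_id   = rep(c("4", "1"), times = c(9, 2)),
  patient_id        = rep(c("1234", "1239", "1250", "1230"), times = c(4, 3, 2, 2)),
  date              = as.Date(rep(c("2020-07-12", "2020-07-13", "2019-02-03", "2018-02-08"),
                                  times = c(4, 3, 2, 2))),
  consultation_mode = c(rep("Telephone", 4), rep("Home visit", 3),
                        rep("Face-to-Face", 2), "Telephone", "Face-to-Face"),
  professional_id   = c("1133", "12", "25", "26", "44", "44", "5556",
                        "1133", "113663", "24", "11"),
  professional_role = c("Support", "Support", "Nurse", "Nurse", "Nurse", "Nurse", "Doctor",
                        "Support", "Support", "Doctor", "Support"),
  professional_name = c("Mary Wright", "Anthony Patel", "Jennifer Walker", "Jennifer Walker",
                        "Kevin R Collins", "Kevin Collins", "Dr Robert Brown",
                        "Mary Wright", "Mary TEST Wright", "Dr John Taylor", "Mary Wright"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for rows in `keep` whose value of `x` has not
# already appeared on an earlier row that is also in `keep`
first_occurrence <- function(x, keep) {
  keep & !duplicated(replace(as.character(x), !keep, NA))
}

intermediate <- demo %>%
  group_by(organisation_id, patient_id, date, consultation_mode) %>%
  mutate(
    clinical = professional_role %in% c("Doctor", "Nurse"),
    `include?` = as.integer(
      if (any(clinical)) {
        # Doctors/Nurses: drop repeats of either the id or the name
        first_occurrence(professional_id, clinical) &
          first_occurrence(professional_name, clinical)
      } else {
        # Support-only group: keep just the first row
        row_number() == 1
      }
    )
  ) %>%
  ungroup() %>%
  select(-clinical)

print(as.data.frame(intermediate))
```

The single-row case needs no special branch under these assumptions: a lone 'Doctor' or 'Nurse' row is its own first occurrence, and a lone 'Support' row is the first row of a Support-only group. The same pipeline should apply unchanged to the full `df` from the question.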
Convert into the final table:
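A possible sketch of this step with dplyr and tidyr, again an illustration rather than the code originally posted. It assumes an intermediate table that already carries the `include?` column (here a hand-built four-row example taken from the question's intermediate table) and that `consultation_mode` and `professional_role` are factors, so that `tidyr::complete()` can expand to every level, including categories that never occur for a given patient and day.

```r
library(dplyr)
library(tidyr)

# Hand-built slice of the intermediate table (illustrative)
intermediate <- data.frame(
  organisation_id   = c("1", "1", "2", "2"),
  patient_id        = c("1230", "1230", "1222", "1222"),
  date              = as.Date(c("2018-02-08", "2018-02-08", "2018-01-12", "2018-01-12")),
  consultation_mode = factor(c("Telephone", "Face-to-Face", "Telephone", "Telephone"),
                             levels = c("Face-to-Face", "Telephone", "Home visit")),
  professional_role = factor(c("Doctor", "Support", "Doctor", "Support"),
                             levels = c("Doctor", "Nurse", "Support")),
  `include?`        = c(1L, 1L, 1L, 0L),
  check.names = FALSE  # keep the non-syntactic column name `include?`
)

final <- intermediate %>%
  group_by(organisation_id, patient_id, date, consultation_mode, professional_role) %>%
  # 1 if any row of this mode/role combination was flagged, otherwise 0
  summarise(Nr_consultations_per_Pt_day = as.integer(any(`include?` == 1)),
            .groups = "drop") %>%
  # Add the mode/role combinations that did not occur, filled with 0
  complete(nesting(organisation_id, patient_id, date),
           consultation_mode, professional_role,
           fill = list(Nr_consultations_per_Pt_day = 0L))

print(as.data.frame(final))
```

This follows the 0/1 reading described above. If a count of included consultations per category is wanted instead, replacing `as.integer(any(...))` with `sum(`include?`)` would be the natural variation.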