Grouping drugs together when their dates overlap

Posted 2025-02-04 05:33:04

This is a bit tricky to explain so please bear with me and ask questions if I am not making sense.

Here is my data

mydata <- data.frame(id = c(1, 1, 1, 1, 1, 1, 1, 1),
                     drug = c("let", "per", "pac", "tra", "chem", "tem", "cap", "nem"),
                     type = c("type1", "type2", "type1", "type1", "type1", "type2", "type1", "type2"),
                     startdate = c("2016-05-12", "2016-05-30", "2016-05-31", "2016-05-31", "2018-01-18", "2018-04-01", "2020-11-05", "2020-11-04"),
                     enddate = c("2016-05-12", "2018-04-05", "2017-11-08", "2018-04-05", "2018-01-18", "2020-11-06", "2021-08-18", "2021-08-11"))

My goal is to group the drugs whose dates overlap with each other. However, even when the dates of two drugs overlap, if the drug type switches to type2 I want that to start a new row with its own start and end dates.

I was able to group the overlapping dates using the following code:

mydata <-  mydata %>%
  arrange(id, startdate,drug) %>% 
  group_by(id) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(startdate)) >
                              cummax(as.numeric(enddate)))[-n()])) %>%
  group_by(id, indx) %>%
  mutate(drugs = paste0(drug, collapse = ", "))%>%
  summarise(startDate = min(startdate), endDate = max(enddate), drugs=drugs) %>% distinct()

But as you can see, after drug "let" all other rows get grouped together. Instead, I want a new row for "tem" and "nem", as they are type 2 drugs.
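To see why everything after "let" ends up in one group, the index computation can be traced outside the pipeline. This is only a minimal base R sketch: it assumes the dates have been converted with as.Date() and sorted by start date, and the names start, end, running_end, new_group and indx are illustrative, not part of the original code.

```r
# Start and end dates from the question, sorted by start date
start <- as.Date(c("2016-05-12", "2016-05-30", "2016-05-31", "2016-05-31",
                   "2018-01-18", "2018-04-01", "2020-11-04", "2020-11-05"))
end   <- as.Date(c("2016-05-12", "2018-04-05", "2017-11-08", "2018-04-05",
                   "2018-01-18", "2020-11-06", "2021-08-11", "2021-08-18"))

# Latest end date seen so far (cummax() needs numbers, not Date objects)
running_end <- cummax(as.numeric(end))

# A new group starts only when a start date is later than every earlier end date
new_group <- c(FALSE, as.numeric(start)[-1] > running_end[-length(running_end)])

# The running count of group starts is the group index
indx <- cumsum(new_group)
print(indx)
```

Only the second row opens a new group, because every later start date falls before the running maximum end date. The drug type plays no part in this calculation, which is why "tem" and "nem" are not split off.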

This is the output I am hoping to get:

mydata1 <- data.frame(id = c(1, 1, 1, 1),
                      drugs = c("let", "per,pac,tra,chem", "tem", "cap, nem"),
                      startdate = c("2016-05-12", "2016-05-30", "2018-04-01", "2020-11-04"),
                      enddate = c("2016-05-12", "2018-01-18", "2020-11-06", "2021-08-11"))

Any help is appreciated!


苍白女子 2025-02-11 05:33:04

I split the data frame into separate data frames, one for each drug type, and then used your existing code on each. Then I put the two new data frames back together into one data frame.

I was also getting NA when the dates were converted with as.numeric(), because they are stored as character strings, so I converted the date columns to the Date class first.
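The NA issue can be reproduced in isolation. The following is a small sketch using two of the start dates from the question; the variable names start_chr, bad and good are illustrative only.

```r
start_chr <- c("2016-05-12", "2016-05-30")

# A date stored as text is not a number, so coercion yields NA (with a warning)
bad <- suppressWarnings(as.numeric(start_chr))

# Converting to Date first gives the number of days since 1970-01-01
good <- as.numeric(as.Date(start_chr))

print(bad)
print(diff(good))  # days between the two dates
```

Because comparisons against NA are themselves NA, the cumsum() in the original pipeline cannot produce a usable index until the columns are real dates.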


# load dplyr for the pipelines below and change the dates to the Date format
library(dplyr)
require(lubridate)

mydata$startdate <- as.Date(mydata$startdate)
mydata$enddate <- as.Date(mydata$enddate)


# create two separate data frames, one for each drug type
type1 <- mydata %>%
  filter(type == "type1")


type2 <- mydata %>%
  filter(type == "type2")


# use your code on both data frames

type1_grouped <- type1 %>%
  arrange(id, startdate, drug) %>%
  group_by(id) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(startdate)) > cummax(as.numeric(enddate)))[-n()])) %>%
  group_by(id, indx) %>%
  # collapse the drug names inside summarise() so each group yields exactly one row
  summarise(startDate = min(startdate), endDate = max(enddate),
            drugs = paste0(drug, collapse = ", "))

type2_grouped <- type2 %>%
  arrange(id, startdate, drug) %>%
  group_by(id) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(startdate)) > cummax(as.numeric(enddate)))[-n()])) %>%
  group_by(id, indx) %>%
  # collapse the drug names inside summarise() so each group yields exactly one row
  summarise(startDate = min(startdate), endDate = max(enddate),
            drugs = paste0(drug, collapse = ", "))

# put the two data frames back together
mydata2 <- rbind(type1_grouped, type2_grouped)

# change the format to match mydata1
mydata2 %>% relocate(drugs, .before = startDate) %>% ungroup() %>% select(-indx)
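The split-and-recombine steps above can also be sketched as a single pipeline by adding type to the grouping, which avoids maintaining one copy of the code per drug type. This is an alternative formulation, not the code from the answer: it assumes dplyr is installed, uses lag() instead of lead(), and the names grouped and prev_max_end are made up for illustration. Like the approach above, it separates the types but does not necessarily reproduce the exact table requested in the question.

```r
library(dplyr)

# Same sample data as in the question, with the dates converted up front
mydata <- data.frame(
  id = rep(1, 8),
  drug = c("let", "per", "pac", "tra", "chem", "tem", "cap", "nem"),
  type = c("type1", "type2", "type1", "type1", "type1", "type2", "type1", "type2"),
  startdate = as.Date(c("2016-05-12", "2016-05-30", "2016-05-31", "2016-05-31",
                        "2018-01-18", "2018-04-01", "2020-11-05", "2020-11-04")),
  enddate = as.Date(c("2016-05-12", "2018-04-05", "2017-11-08", "2018-04-05",
                      "2018-01-18", "2020-11-06", "2021-08-18", "2021-08-11"))
)

grouped <- mydata %>%
  arrange(id, type, startdate, drug) %>%
  # overlaps are only evaluated within the same id and drug type
  group_by(id, type) %>%
  mutate(
    # latest end date among the earlier rows of this id/type (-Inf for the first row)
    prev_max_end = lag(cummax(as.numeric(enddate)), default = -Inf),
    # a row opens a new group when it starts after all earlier rows have ended
    indx = cumsum(as.numeric(startdate) > prev_max_end)
  ) %>%
  group_by(id, type, indx) %>%
  summarise(drugs = paste(drug, collapse = ", "),
            startdate = min(startdate),
            enddate = max(enddate),
            .groups = "drop") %>%
  arrange(id, startdate) %>%
  select(-indx)

print(grouped)
```

Grouping by type scales to any number of drug types without further changes, whereas the filter-based version needs one extra data frame per type.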
