Resetting a cumulative count within a group when a condition is met
First time poster here, so apologies if I get something wrong.
I have a data table that looks like this currently:
library(data.table)
dt_achieved <- data.table(
  last_nm = c("raus", "stroper", "degroat", "degroat", "degroat", "degroat", "degroat", "degroat", "piya", "mazzy", "mazzy", "mazzy"),
  intake_date = c("2021-03-04", "2021-06-18", "2021-04-14", "2021-06-10", "2021-07-08", "2021-08-09", "2021-11-09", "2021-12-08", "2021-09-16", "2021-04-15", "2021-08-02", "2021-08-09"))
dt_achieved$intake_date <- as.Date(dt_achieved$intake_date)
I would like it to look like the table below: the data are grouped by last_nm, and whenever a row's intake date is at least 90 days after the group's initial one, the row is flagged and its intake_round count increases by one, on a rolling basis within the group.
dt_ideal <- data.table(
  last_nm = c("raus", "stroper", "degroat", "degroat", "degroat", "degroat", "degroat", "degroat", "piya", "mazzy", "mazzy", "mazzy"),
  intake_date = c("2021-03-04", "2021-06-18", "2021-04-14", "2021-06-10", "2021-07-08", "2021-08-09", "2021-11-09", "2021-12-08", "2021-09-16", "2021-04-15", "2021-08-02", "2021-08-09"),
  intake_round = c(1, 1, 1, 1, 1, 2, 3, 3, 1, 1, 2, 2),
  Running_intake = c(1, 1, 1, 2, 3, 4, 5, 6, 1, 1, 2, 3))
dt_ideal$intake_date <- as.Date(dt_ideal$intake_date)
I have gotten this far:
# order by last name and intake date (the real data is randomized)
setkeyv(dt_achieved, c("last_nm", "intake_date"))
dt_achieved[, intake_round := cumsum(c(TRUE, diff(as.Date(intake_date)) >= 90)),
.(last_nm)][, Running_intake := as.numeric(seq_len(.N)), .(last_nm)]
The issue is that the intake_round column currently reflects the difference from the previous row's date, rather than checking whether the row is at least 90 days later than the group's initial intake_date. I just can't figure out how to get the cumsum(c(TRUE, ...)) pattern to do this on a rolling basis, within group.
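To make the intended rule concrete, here is a minimal sketch of it written as a plain loop. It rests on one assumption: that "rolling" means the 90-day window restarts at the row that opens each new round. The sample data would also be consistent with fixed 90-day windows counted from the group's first date, so treat this as one possible reading. The helper name round_id and its gap argument are hypothetical, made up for illustration.

```r
library(data.table)

# Hypothetical helper (not part of data.table): the round number goes up by
# one whenever a date is at least `gap` days after the date that opened the
# current round. ASSUMPTION: the reference date resets at each new round.
round_id <- function(dates, gap = 90) {
  dates   <- as.Date(dates)
  rounds  <- integer(length(dates))
  anchor  <- dates[1]  # date that opened the current round
  current <- 1L
  for (i in seq_along(dates)) {
    if (as.numeric(dates[i] - anchor) >= gap) {
      current <- current + 1L  # start a new round
      anchor  <- dates[i]      # and measure later rows from this date
    }
    rounds[i] <- current
  }
  rounds
}

# Small subset of the sample data from the question
dt <- data.table(
  last_nm = c(rep("degroat", 6), "piya", rep("mazzy", 3)),
  intake_date = as.Date(c("2021-04-14", "2021-06-10", "2021-07-08",
                          "2021-08-09", "2021-11-09", "2021-12-08",
                          "2021-09-16",
                          "2021-04-15", "2021-08-02", "2021-08-09"))
)

# Dates must be sorted within each group before applying the helper
setkeyv(dt, c("last_nm", "intake_date"))
dt[, intake_round := round_id(intake_date), by = last_nm]
dt[, Running_intake := seq_len(.N), by = last_nm]

dt[last_nm == "degroat", intake_round]  # 1 1 1 2 3 3
dt[last_nm == "mazzy", intake_round]    # 1 2 2
```

A loop is used because each row's result depends on where the previous round started, which diff() and cumsum() alone cannot express.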
I have consulted a number of questions, including:
incremental counter within dataframe only when a condition is met in r
Resetting the cumulative sum when a condition is met in R
Split into groups based on (multiple) conditions?
Cumulative sum that resets when the condition is no longer met
Please, any suggestions would be greatly, greatly appreciated!