Reset a group's cumsum when a condition is met

Posted on 2025-01-17 22:59:27


First time poster here, so apologies if I get something wrong.

I have a data table that looks like this currently:

library(data.table)
dt_achieved <- data.table(last_nm=c("raus","stroper", "degroat","degroat","degroat","degroat","degroat","degroat","piya","mazzy","mazzy","mazzy"),intake_date=c("2021-03-04","2021-06-18","2021-04-14","2021-06-10","2021-07-08","2021-08-09","2021-11-09","2021-12-08","2021-09-16","2021-04-15","2021-08-02","2021-08-09"))

dt_achieved$intake_date <- as.Date(dt_achieved$intake_date)

I would like it to look like this: the data are grouped by last_nm, and within each group, a row whose intake date is at least 90 days after the group's initial one is flagged, i.e. its intake_round increases by one on a rolling basis. (Running_intake is simply a running row count within each group.)

dt_ideal<-data.table(last_nm=c("raus","stroper", "degroat","degroat","degroat","degroat","degroat","degroat","piya",
           "mazzy","mazzy","mazzy"),intake_date=c("2021-03-04","2021-06-18","2021-04-14","2021-06-10","2021-07-08","2021-08-09","2021-11-09","2021-12-08","2021-09-16","2021-04-15","2021-08-02","2021-08-09"),intake_round=c(1,1,1,1,1,2,3,3,1,1,2,2), Running_intake=c(1,1,1,2,3,4,5,6,1,1,2,3))

dt_ideal$intake_date<-as.Date(dt_ideal$intake_date)
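Reading the expected intake_round values in dt_ideal, the rule appears to be that a new round starts when an intake is at least 90 days after the first intake of the current round, and that intake then becomes the reference date for the next round. For degroat, 2021-08-09 is 117 days after 2021-04-14, so it opens round 2, and 2021-11-09 is 92 days after 2021-08-09, so it opens round 3. This is an inference from the sample output, not something the post states outright. Because each round's start depends on where the previous round started, a plain loop over one group's sorted dates is the most direct way to express it. A minimal base-R sketch; intake_rounds() and its gap argument are hypothetical names:

```r
# Hypothetical helper: number the intake "rounds" for ONE person's dates.
# Assumed rule (inferred from dt_ideal): a new round starts when an intake is
# at least `gap` days after the first intake of the current round, and that
# intake becomes the reference date for the following round.
# `dates` must already be sorted in ascending order.
intake_rounds <- function(dates, gap = 90) {
  dates <- as.Date(dates)
  rounds <- integer(length(dates))
  current <- 1L          # number of the current round
  anchor <- dates[1]     # first intake of the current round
  for (i in seq_along(dates)) {
    if (as.numeric(dates[i] - anchor, units = "days") >= gap) {
      current <- current + 1L  # open a new round...
      anchor <- dates[i]       # ...starting at this intake
    }
    rounds[i] <- current
  }
  rounds
}

degroat <- as.Date(c("2021-04-14", "2021-06-10", "2021-07-08",
                     "2021-08-09", "2021-11-09", "2021-12-08"))
intake_rounds(degroat)
# [1] 1 1 1 2 3 3
```

If the intended rule is instead "every full 90 days counted from the group's very first intake", floor(as.numeric(dates - dates[1]) / 90) + 1 happens to give the same numbers for this sample (degroat's offsets of 0, 57, 85, 117, 209 and 238 days map to 1, 1, 1, 2, 3, 3), but it can jump by more than one after a long gap, so the two readings diverge on other data.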

I have gotten this far:

# order by last name and intake date (the real data is randomized)
setkeyv(dt_achieved, c("last_nm", "intake_date"))

dt_achieved[, intake_round := cumsum(c(TRUE, diff(as.Date(intake_date)) >= 90)), 
    .(last_nm)][, Running_intake := as.numeric(seq_len(.N)), .(last_nm)]

The issue is that intake_round currently reflects the gap from the previous row's date, rather than checking whether each intake is at least 90 days after the group's initial intake_date. I just can't figure out how to get cumsum(c(TRUE, ...)) to do this on a rolling basis, within each group.
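The cumsum(c(TRUE, ...)) idiom can still work if it is applied to a carried-forward anchor date instead of to the raw gaps between consecutive rows. With diff(), degroat's gaps are 57, 28, 32, 92 and 29 days, which yields 1, 1, 1, 1, 2, 2 rather than the desired 1, 1, 1, 2, 3, 3. One way to carry the anchor forward within each group is base R's Reduce(accumulate = TRUE). The sketch below assumes the "reset to the first intake of the new round" reading of the 90-day rule and that intake_date is already a Date; the local variable anchor is a hypothetical name.

```r
library(data.table)

# Sample data from the question (intake_date converted to Date up front).
dt_achieved <- data.table(
  last_nm = c("raus", "stroper", rep("degroat", 6), "piya", rep("mazzy", 3)),
  intake_date = as.Date(c("2021-03-04", "2021-06-18", "2021-04-14", "2021-06-10",
                          "2021-07-08", "2021-08-09", "2021-11-09", "2021-12-08",
                          "2021-09-16", "2021-04-15", "2021-08-02", "2021-08-09"))
)

# Order by last name, then intake date (as in the question).
setkeyv(dt_achieved, c("last_nm", "intake_date"))

dt_achieved[, intake_round := {
  # Carry the current round's start date forward (as days since epoch);
  # an intake at least 90 days after that start becomes the new start.
  anchor <- Reduce(function(acc, d) if (d - acc >= 90) d else acc,
                   as.numeric(intake_date), accumulate = TRUE)
  # A new round begins wherever the anchor moves forward.
  cumsum(c(TRUE, diff(anchor) > 0))
}, by = last_nm]

# Running count of intakes within each person, as in the question.
dt_achieved[, Running_intake := seq_len(.N), by = last_nm]

dt_achieved[last_nm == "degroat", intake_round]
# [1] 1 1 1 2 3 3
```

Because setkeyv() sorts the table, the result comes back ordered by last_nm and intake_date rather than in dt_ideal's row order.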

Have consulted a number of questions including:

incremental counter within dataframe only when a condition is met in r

Resetting the cumulative sum when a condition is met in R

Split into groups based on (multiple) conditions?

Cumulative sum that resets when the condition is no longer met

Please, any suggestions would be greatly, greatly appreciated!
