Reset a group's cumsum when a condition is met

Posted on 2025-01-17 22:59:27

First time poster here, so apologies if I get something wrong.

I have a data table that looks like this currently:

library(data.table)

# Example data: one row per intake, identified by last name
dt_achieved <- data.table(
  last_nm = c("raus", "stroper", "degroat", "degroat", "degroat", "degroat",
              "degroat", "degroat", "piya", "mazzy", "mazzy", "mazzy"),
  intake_date = c("2021-03-04", "2021-06-18", "2021-04-14", "2021-06-10",
                  "2021-07-08", "2021-08-09", "2021-11-09", "2021-12-08",
                  "2021-09-16", "2021-04-15", "2021-08-02", "2021-08-09")
)

dt_achieved$intake_date <- as.Date(dt_achieved$intake_date)

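I would like it to look like this: the data are grouped by last_nm and, within each group, a row whose intake_date is at least 90 days after the group's initial one is flagged and intake_round increases by one. This applies on a rolling basis, so the date that starts a new round becomes the reference for the rows that follow. Running_intake is simply the row count within the group.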

dt_ideal <- data.table(
  last_nm = c("raus", "stroper", "degroat", "degroat", "degroat", "degroat",
              "degroat", "degroat", "piya", "mazzy", "mazzy", "mazzy"),
  intake_date = c("2021-03-04", "2021-06-18", "2021-04-14", "2021-06-10",
                  "2021-07-08", "2021-08-09", "2021-11-09", "2021-12-08",
                  "2021-09-16", "2021-04-15", "2021-08-02", "2021-08-09"),
  intake_round = c(1, 1, 1, 1, 1, 2, 3, 3, 1, 1, 2, 2),
  Running_intake = c(1, 1, 1, 2, 3, 4, 5, 6, 1, 1, 2, 3)
)

dt_ideal$intake_date <- as.Date(dt_ideal$intake_date)

I have gotten this far:

# order by last name and intake date (the real data is randomized)
setkeyv(dt_achieved, c("last_nm", "intake_date"))

dt_achieved[, intake_round := cumsum(c(TRUE, diff(as.Date(intake_date)) >= 90)), 
    .(last_nm)][, Running_intake := as.numeric(seq_len(.N)), .(last_nm)]
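To make the mismatch concrete, here is a small base-R check of the "degroat" dates only. It is just an illustration of what the diff()-based rule computes; the variable names are made up for this example.

```r
# Intake dates for the "degroat" group, already sorted.
dates <- as.Date(c("2021-04-14", "2021-06-10", "2021-07-08",
                   "2021-08-09", "2021-11-09", "2021-12-08"))

# Gap in days between each row and the row directly above it,
# which is what diff() measures.
gap_to_previous <- as.numeric(diff(dates))
print(gap_to_previous)   # 57 28 32 92 29

# Rounds produced by the diff()-based rule from my attempt.
lag_based_round <- cumsum(c(TRUE, gap_to_previous >= 90))
print(lag_based_round)   # 1 1 1 1 2 2

# Rounds wanted for this group according to dt_ideal.
wanted_round <- c(1, 1, 1, 2, 3, 3)

# The 2021-08-09 row is only 32 days after the previous row,
# but 117 days after 2021-04-14, so it should open round 2.
print(as.numeric(dates[4] - dates[1]))   # 117
```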

The issue is that the intake_round column currently reflects the difference from the row directly above, rather than checking whether a row's intake_date is at least 90 days after the group's initial intake_date (and, once a new round has started, after the date that started that round). I just can't figure out how to get the cumsum(c(TRUE, ...)) expression to do this on a rolling basis, within group.
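For reference, the rolling rule I am after can be written as an explicit loop. This is only a sketch of the intended behaviour, not a vectorised cumsum() solution: assign_rounds and its gap_days argument are hypothetical names invented for this example, and the data below are a subset of the sample data above.

```r
library(data.table)

# Hypothetical helper (not part of data.table): walk through sorted dates
# and open a new round whenever a date is at least `gap_days` after the
# date that opened the current round.
assign_rounds <- function(dates, gap_days = 90) {
  rounds <- integer(length(dates))
  if (length(dates) == 0L) return(rounds)
  round_start <- dates[1]
  current <- 1L
  for (i in seq_along(dates)) {
    if (as.numeric(dates[i] - round_start) >= gap_days) {
      current <- current + 1L
      round_start <- dates[i]   # this row becomes the new reference date
    }
    rounds[i] <- current
  }
  rounds
}

# Subset of the sample data from the question.
dt <- data.table(
  last_nm = c(rep("degroat", 6), rep("mazzy", 3), "piya"),
  intake_date = as.Date(c("2021-04-14", "2021-06-10", "2021-07-08",
                          "2021-08-09", "2021-11-09", "2021-12-08",
                          "2021-04-15", "2021-08-02", "2021-08-09",
                          "2021-09-16"))
)

# The helper assumes dates are sorted within each group.
setkeyv(dt, c("last_nm", "intake_date"))
dt[, intake_round := assign_rounds(intake_date), by = last_nm]
dt[, Running_intake := seq_len(.N), by = last_nm]
print(dt)
```

On this subset the loop gives intake_round 1, 1, 1, 2, 3, 3 for degroat and 1, 2, 2 for mazzy, which matches dt_ideal. What I am still looking for is a way to get the same result without an explicit loop.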

I have consulted a number of questions, including:

incremental counter within dataframe only when a condition is met in r

Resetting the cumulative sum when a condition is met in R

Split into groups based on (multiple) conditions?

Cumulative sum that resets when the condition is no longer met

Please, any suggestions would be greatly, greatly appreciated!
