Group by date difference

Posted on 2025-01-19 09:39:11


I want to group within a group by date difference.

For example, if there are 7 cases in facility A, but the first 5 cases happened more than 14 days before the last 2 cases, I want them to be in two different groups (see the example below).

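| location   | address     | start_date | start_date_diff | Group |
|------------|-------------|------------|-----------------|-------|
| Facility A | 123 main st | 2/7/2022   | 0               | 1     |
| Facility A | 123 main st | 2/11/2022  | 4               | 1     |
| Facility A | 123 main st | 2/11/2022  | 0               | 1     |
| Facility A | 123 main st | 2/11/2022  | 0               | 1     |
| Facility A | 123 main st | 2/12/2022  | 1               | 1     |
| Facility A | 123 main st | 3/12/2022  | 28              | 2     |
| Facility A | 123 main st | 3/17/2022  | 5               | 2     |
| Facility B | 55 ford rd  | 3/16/2022  | 0               | 3     |
| Facility B | 55 ford rd  | 3/16/2022  | 0               | 3     |
| Facility C | 1 step ave  | 3/16/2022  | 0               | 4     |
| Facility C | 1 step ave  | 3/20/2022  | 4               | 4     |
| Facility C | 1 step ave  | 3/22/2022  | 2               | 4     |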

Here is my code so far:

I am stuck on how to group them further by the date difference between individual observations.


Comments (1)

离鸿 2025-01-26 09:39:11


Assuming we don't already have the diff calculated, and that we need to convert start_date into something arithmetically useful.

data.table

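library(data.table)
# 1. Parse start_date (m/d/Y text) into Date so diff() returns gaps in days.
# 2. Per location, bump a counter each time the gap to the previous row exceeds 14 days.
# 3. rleid() numbers each consecutive (location, diff14) run.
# Rows must already be sorted by location and start_date.
as.data.table(dat)[, start_date := as.Date(start_date, format = "%m/%d/%Y")
  ][, diff14 := cumsum(c(0, diff(start_date)) > 14), by = location
  ][, Group2 := rleid(location, diff14)][]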
#       location     address start_date start_date_diff Group diff14 Group2
#         <char>      <char>     <Date>           <int> <int>  <int>  <int>
#  1: Facility A 123 main st 2022-02-07               0     1      0      1
#  2: Facility A 123 main st 2022-02-11               4     1      0      1
#  3: Facility A 123 main st 2022-02-11               0     1      0      1
#  4: Facility A 123 main st 2022-02-11               0     1      0      1
#  5: Facility A 123 main st 2022-02-12               1     1      0      1
#  6: Facility A 123 main st 2022-03-12              28     2      1      2
#  7: Facility A 123 main st 2022-03-17               5     2      1      2
#  8: Facility B  55 ford rd 2022-03-16               0     3      0      3
#  9: Facility B  55 ford rd 2022-03-16               0     3      0      3
# 10: Facility C  1 step ave 2022-03-16               0     4      0      4
# 11: Facility C  1 step ave 2022-03-20               4     4      0      4
# 12: Facility C  1 step ave 2022-03-22               2     4      0      4
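
To see why the `cumsum(c(0, diff(start_date)) > 14)` step works, it may help to run it piece by piece on Facility A's dates alone. This is only an illustrative sketch: the names `a_dates`, `gaps`, `new_run` and `run_id` are made up here and are not part of the answer's code.

```r
# Facility A's dates from the example, parsed from m/d/Y text (a_dates is a hypothetical name)
a_dates <- as.Date(c("2/7/2022", "2/11/2022", "2/11/2022", "2/11/2022",
                     "2/12/2022", "3/12/2022", "3/17/2022"),
                   format = "%m/%d/%Y")

# Day gap to the previous row; the leading 0 stands in for the first row
gaps <- c(0, diff(a_dates))
print(gaps)
# 0 4 0 0 1 28 5

# TRUE wherever the gap to the previous row is longer than 14 days
new_run <- gaps > 14

# Running count of TRUEs: rows share a value until the next long gap
run_id <- cumsum(new_run)
print(run_id)
# 0 0 0 0 0 1 1
```

The gaps match the start_date_diff column from the question, and `run_id` matches the diff14 column in the output above. Note that this compares consecutive rows, so it only makes sense when the dates are sorted within each location.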

dplyr

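library(dplyr)
dat %>%
  # Parse the m/d/Y text into Date so diff() returns gaps in days
  mutate(start_date = as.Date(start_date, format = "%m/%d/%Y")) %>%
  group_by(location) %>%
  # Within each location, bump the counter whenever the gap to the previous row exceeds 14 days
  mutate(diff14 = cumsum(c(0, diff(start_date)) > 14)) %>%
  group_by(location, diff14) %>%
  # One id per (location, diff14) combination
  mutate(Group2 = cur_group_id()) %>%
  ungroup()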
# # A tibble: 12 x 7
#    location   address     start_date start_date_diff Group diff14 Group2
#    <chr>      <chr>       <date>               <int> <int>  <int>  <int>
#  1 Facility A 123 main st 2022-02-07               0     1      0      1
#  2 Facility A 123 main st 2022-02-11               4     1      0      1
#  3 Facility A 123 main st 2022-02-11               0     1      0      1
#  4 Facility A 123 main st 2022-02-11               0     1      0      1
#  5 Facility A 123 main st 2022-02-12               1     1      0      1
#  6 Facility A 123 main st 2022-03-12              28     2      1      2
#  7 Facility A 123 main st 2022-03-17               5     2      1      2
#  8 Facility B 55 ford rd  2022-03-16               0     3      0      3
#  9 Facility B 55 ford rd  2022-03-16               0     3      0      3
# 10 Facility C 1 step ave  2022-03-16               0     4      0      4
# 11 Facility C 1 step ave  2022-03-20               4     4      0      4
# 12 Facility C 1 step ave  2022-03-22               2     4      0      4
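
If neither package is available, the same idea can probably be sketched in base R with `ave()`. This is not part of the original answer; the data frame `dat_small` and the helper vector `key` are hypothetical and use a shortened version of the example data. The explicit sort is there because the gap comparison assumes rows are ordered by location and date.

```r
# A shortened, hypothetical version of the example data
dat_small <- data.frame(
  location   = c("Facility A", "Facility A", "Facility A", "Facility B", "Facility B"),
  start_date = c("2/7/2022", "2/12/2022", "3/12/2022", "3/16/2022", "3/16/2022")
)
dat_small$start_date <- as.Date(dat_small$start_date, format = "%m/%d/%Y")

# Sort so that consecutive rows within a location are in date order
dat_small <- dat_small[order(dat_small$location, dat_small$start_date), ]

# Per location: count how many gaps longer than 14 days have occurred so far
dat_small$diff14 <- ave(
  as.numeric(dat_small$start_date), dat_small$location,
  FUN = function(d) cumsum(c(0, diff(d)) > 14)
)

# Start a new group id whenever location or diff14 changes from the previous row
key <- paste(dat_small$location, dat_small$diff14)
dat_small$Group2 <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

print(dat_small)
```

Here the third Facility A row (3/12/2022, 28 days after 2/12/2022) starts a second group, and Facility B gets its own group.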

Data

dat <- structure(list(
  location = c("Facility A", "Facility A", "Facility A", "Facility A", "Facility A", "Facility A", "Facility A",
               "Facility B", "Facility B", "Facility C", "Facility C", "Facility C"),
  address = c("123 main st", "123 main st", "123 main st", "123 main st", "123 main st", "123 main st", "123 main st",
              "55 ford rd", "55 ford rd", "1 step ave", "1 step ave", "1 step ave"),
  start_date = c("2/7/2022", "2/11/2022", "2/11/2022", "2/11/2022", "2/12/2022", "3/12/2022", "3/17/2022",
                 "3/16/2022", "3/16/2022", "3/16/2022", "3/20/2022", "3/22/2022"),
  start_date_diff = c(0L, 4L, 0L, 0L, 1L, 28L, 5L, 0L, 0L, 0L, 4L, 2L),
  Group = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L)
), class = "data.frame", row.names = c(NA, -12L))