R - Exceeding a row-repetition threshold within a given time period

Posted on 2025-01-16 13:48:19

I would like to know which IDs are repeated at least a certain number of times (e.g. ≥3) within a given period of time (e.g. ≤3 years).
I have the following table as an example:

ID  Date
1   2001-01-03
2   2001-02-28
3   2001-06-13
4   2002-04-05
5   2002-09-12
1   2002-12-12
3   2003-05-05
3   2003-05-06
4   2003-05-07
1   2003-06-04
2   2006-12-29
3   2007-04-05
1   2007-04-08
4   2007-09-12
1   2008-12-12
2   2009-01-23
3   2009-01-30
2   2009-04-05
1   2009-12-08
2   2010-01-04
2   2010-05-07
4   2012-01-02
5   2013-03-03
6   2014-01-01

I would like to obtain the following result:

ID  Rep
1   TRUE
2   TRUE
3   TRUE
4   FALSE
5   FALSE
6   FALSE

If an ID is repeated at least 3 times in less than 3 years, no matter how many times or when this happened, I want to get a TRUE result. If the ID is repeated fewer than 3 times, or 3 or more times but never within a period of less than 3 years, I would like to get a FALSE result.

I imagine this might be a fairly simple question for many of you. However, I will highly appreciate your help.

Comments (1)

瞳孔里扚悲伤 2025-01-23 13:48:20

You can use data.table or dplyr combined with diff(): for each ID, count the number of differences between consecutive dates that are less than yrs * 365.25 days. If that count meets or exceeds num, return TRUE.

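# Window length in years and required count (both 3 here).
# Assumes data$Date is of class Date.
yrs <- num <- 3

library(data.table)
# Sort by ID and Date, then count per-ID gaps shorter than yrs years
setDT(data)[order(ID, Date)][, .(Rep = sum(diff(Date) < (yrs * 365.25)) >= num), by = "ID"]

# OR

library(dplyr)
data %>%
  arrange(Date) %>%
  group_by(ID) %>%
  summarize(Rep = sum(diff(Date) < (yrs * 365.25)) >= num)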

      ID    Rep
   <num> <lgcl>
1:     1   TRUE
2:     2   TRUE
3:     3   TRUE
4:     4  FALSE
5:     5  FALSE
6:     6  FALSE
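
One caveat: counting short gaps between consecutive dates is not quite the same as finding num dates inside a single window. Three dates within one year produce only two gaps, and several gaps of just under three years never put three dates in one three-year window. A possible alternative, sketched here in base R, compares each date with the one num - 1 positions later using the lag argument of diff(). The helper name has_cluster is hypothetical, the example data is retyped from the question, and a year is approximated as 365.25 days as in the answer above.

```r
# Example data retyped from the question, with Date parsed as class Date.
data <- data.frame(
  ID = c(1, 2, 3, 4, 5, 1, 3, 3, 4, 1, 2, 3, 1, 4, 1, 2, 3, 2, 1, 2, 2, 4, 5, 6),
  Date = as.Date(c(
    "2001-01-03", "2001-02-28", "2001-06-13", "2002-04-05", "2002-09-12",
    "2002-12-12", "2003-05-05", "2003-05-06", "2003-05-07", "2003-06-04",
    "2006-12-29", "2007-04-05", "2007-04-08", "2007-09-12", "2008-12-12",
    "2009-01-23", "2009-01-30", "2009-04-05", "2009-12-08", "2010-01-04",
    "2010-05-07", "2012-01-02", "2013-03-03", "2014-01-01"
  ))
)

yrs <- 3  # window length in years
num <- 3  # minimum number of occurrences inside one window (assumed >= 2)

# Hypothetical helper: TRUE if any `num` consecutive sorted dates span
# fewer than `yrs` years (approximated as yrs * 365.25 days).
has_cluster <- function(dates, num, yrs) {
  d <- sort(as.numeric(dates))       # days since 1970-01-01
  if (length(d) < num) return(FALSE) # too few rows to ever qualify
  spans <- diff(d, lag = num - 1)    # d[i + num - 1] - d[i]
  any(spans < yrs * 365.25)
}

# Apply the helper per ID and collect the flags in a data frame.
rep_by_id <- tapply(data$Date, data$ID, has_cluster, num = num, yrs = yrs)
result <- data.frame(
  ID  = as.numeric(names(rep_by_id)),
  Rep = as.vector(rep_by_id)
)
print(result)
```

On the question's table this gives the same flags as the requested result (TRUE for IDs 1 to 3, FALSE for IDs 4 to 6). The two approaches only diverge on inputs like the ones described in the caveat.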