R - Exceeding a row repetition threshold within a given time period
I would like to know which IDs are repeated at least a certain number of times (e.g., ≥3) within a given period of time (e.g., ≤3 years).
I have the following table as an example:
ID Date
1 2001-01-03
2 2001-02-28
3 2001-06-13
4 2002-04-05
5 2002-09-12
1 2002-12-12
3 2003-05-05
3 2003-05-06
4 2003-05-07
1 2003-06-04
2 2006-12-29
3 2007-04-05
1 2007-04-08
4 2007-09-12
1 2008-12-12
2 2009-01-23
3 2009-01-30
2 2009-04-05
1 2009-12-08
2 2010-01-04
2 2010-05-07
4 2012-01-02
5 2013-03-03
6 2014-01-01
I would like to obtain the following result:
ID Rep
1 TRUE
2 TRUE
3 TRUE
4 FALSE
5 FALSE
6 FALSE
If the ID is repeated at least 3 times in less than 3 years, no matter how many times this happened or when, I want to get a TRUE result. If the ID is repeated fewer than 3 times, or 3 or more times but never within a period of less than 3 years, I would like to get a FALSE result.
I imagine this might be a fairly simple question for many of you. However, I will highly appreciate your help.
Comments (1)
You can use data.table or dplyr combined with diff(); count the number of differences that are less than years(3) * 365.25, by ID. If this meets or exceeds num, return TRUE.
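For illustration, here is a minimal base R sketch of a closely related check. Note that it is a variant, not the exact counting rule described above: instead of counting consecutive differences, it looks at the span between each date and the date num - 1 positions later (via the lag argument of diff()), and flags the ID if any such span is shorter than 3 years. The names has_cluster, max_days and rep_flag are made up for this example, and 3 * 365.25 days is assumed as an approximation of 3 years.

```r
# Example data copied from the question
df <- data.frame(
  ID = c(1, 2, 3, 4, 5, 1, 3, 3, 4, 1, 2, 3, 1, 4, 1, 2, 3, 2, 1, 2, 2, 4, 5, 6),
  Date = as.Date(c(
    "2001-01-03", "2001-02-28", "2001-06-13", "2002-04-05", "2002-09-12",
    "2002-12-12", "2003-05-05", "2003-05-06", "2003-05-07", "2003-06-04",
    "2006-12-29", "2007-04-05", "2007-04-08", "2007-09-12", "2008-12-12",
    "2009-01-23", "2009-01-30", "2009-04-05", "2009-12-08", "2010-01-04",
    "2010-05-07", "2012-01-02", "2013-03-03", "2014-01-01"
  ))
)

num      <- 3            # minimum number of occurrences
max_days <- 3 * 365.25   # assumption: 3 years approximated in days

# Hypothetical helper: TRUE if any `num` occurrences fall within `max_days`
has_cluster <- function(dates, num, max_days) {
  dates <- sort(dates)
  if (length(dates) < num) return(FALSE)
  # Span from each date to the date (num - 1) positions later
  spans <- diff(dates, lag = num - 1)
  any(as.numeric(spans, units = "days") < max_days)
}

# Apply the helper to the dates of each ID
rep_flag <- sapply(split(df$Date, df$ID), has_cluster,
                   num = num, max_days = max_days)

result <- data.frame(ID = as.numeric(names(rep_flag)), Rep = unname(rep_flag))
print(result)
```

On the example data this should give TRUE for IDs 1, 2 and 3 and FALSE for IDs 4, 5 and 6, matching the desired table. The same per-ID logic could be moved into a data.table or dplyr grouped summary if you prefer those packages.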