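Given a series of dates and a birth date, is there a way to get the age at each date, as well as the last age, using the lubridate package?
I have a database of information about individuals observed over time. I would like to find a way to get the age of these individuals each time a record was taken. Assuming the age at birth is 0, I would like to get the age at each visit in days or months. It would also be helpful to get the last age for each individual (**not included in the code). For example, for ID (A), the last age would be 10 months. I would like to use lubridate, since its built-in date functions make working with dates easier. Any help with this would be greatly appreciated.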
date<-c("2000-01-01","2000-01-14","2000-01-25","2000-02-12","2000-02-27","2000-06-05","2000-10-30",
"2001-02-04","2001-06-15","2001-12-26","2002-05-22","2002-06-04",
"2000-01-08","2000-07-11","2000-08-18","2000-11-27")
ID<-c("A","A","A","A","A","A","A",
"B","B","B","B","B",
"C","C","C","C")
status<-c("BIRTH","ETC","ETC","ETC","ETC","ETC","ETC",
"BIRTH","ETC","ETC","ETC","ETC",
"BIRTH","ETC","ETC","ETC")
df1<-data.frame(date,ID,status)
print(df1)
date ID status
1 2000-01-01 A BIRTH
2 2000-01-14 A ETC
3 2000-01-25 A ETC
4 2000-02-12 A ETC
5 2000-02-27 A ETC
6 2000-06-05 A ETC
7 2000-10-30 A ETC
8 2001-02-04 B BIRTH
9 2001-06-15 B ETC
10 2001-12-26 B ETC
11 2002-05-22 B ETC
12 2002-06-04 B ETC
13 2000-01-08 C BIRTH
14 2000-07-11 C ETC
15 2000-08-18 C ETC
16 2000-11-27 C ETC
date.new<-c("2000-01-01","2000-01-14","2000-01-25","2000-02-12","2000-02-27","2000-06-05","2000-10-30",
"2001-02-04","2001-06-15","2001-12-26","2002-05-22","2001-02-04",
"2000-01-08","2000-07-11","2000-08-18","2000-11-27")
ID.new<-c("A","A","A","A","A","A","A",
"B","B","B","B","B",
"C","C","C","C")
status.new<-c("BIRTH","ETC","ETC","ETC","ETC","ETC","ETC",
"BIRTH","ETC","ETC","ETC","ETC",
"BIRTH","ETC","ETC","ETC")
age<-c(0,1,1,2,2,6,10,
0,4,10,15,16,
0,6,7,10)
df2<-data.frame(date.new,ID.new,status.new,age)
print(df2)
date.new ID.new status.new age
1 2000-01-01 A BIRTH 0
2 2000-01-14 A ETC 1
3 2000-01-25 A ETC 1
4 2000-02-12 A ETC 2
5 2000-02-27 A ETC 2
6 2000-06-05 A ETC 6
7 2000-10-30 A ETC 10
8 2001-02-04 B BIRTH 0
9 2001-06-15 B ETC 4
10 2001-12-26 B ETC 10
11 2002-05-22 B ETC 15
12 2001-02-04 B ETC 16
13 2000-01-08 C BIRTH 0
14 2000-07-11 C ETC 6
15 2000-08-18 C ETC 7
16 2000-11-27 C ETC 10
Comments (2)
For calculations related to age in years or months, I'd like to encourage you to try the clock package rather than lubridate. lubridate is a great package, but it produces some unexpected results with these kinds of calculations if you aren't 100% sure of what you are doing. In clock, the function to do this is date_count_between(). Notice that one of the results is different between clock and lubridate here: clock says that 2001-02-04 to 2002-06-04 is 16 months, while the lubridate method here only says it is 15 months. This has to do with the fact that the lubridate calculation uses the length of an average month, which doesn't always accurately reflect how we think about months.
Consider this simple example: I think most people would agree that a child born on this date in February is considered "1 month and 1 day" old. But lubridate shows 0 months!
The issue is that lubridate uses the length of an average month in the computation, which is 30.4375 days. But there are only 30 days between these two dates, so it isn't considered a full month. clock, on the other hand, uses the day component of the starting date to determine if a "full month" has passed or not. In other words, because we have passed the 28th of March, clock decides that 1 month has passed, which is consistent with how we generally think about age.
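The example dates from this answer did not survive in this copy, so the following is only a minimal sketch with assumed dates (2000-02-28 and 2000-03-29, chosen because they are 30 days apart and the second falls after the 28th of March). The average-month arithmetic is written out in base R to illustrate the effect described above; it is not necessarily the exact lubridate call that was used. Only date_count_between() comes from clock.

```r
library(clock)

# Assumed example dates (not taken from the original answer): 30 days apart
feb <- as.Date("2000-02-28")
mar <- as.Date("2000-03-29")

# Length of an "average" month in days, i.e. 365.25 / 12
avg_month_days <- 30.4375

# Average-month approach: whole average months contained in the elapsed days
days_elapsed <- as.numeric(mar - feb, units = "days")
months_avg <- days_elapsed %/% avg_month_days

# Calendar-aware approach: clock counts a month once the day-of-month
# of the start date (the 28th) has been reached again
months_clock <- date_count_between(feb, mar, "month")

# Same comparison for the dates quoted from the question (ID "B")
b_birth <- as.Date("2001-02-04")
b_last <- as.Date("2002-06-04")
b_avg <- as.numeric(b_last - b_birth, units = "days") %/% avg_month_days
b_clock <- date_count_between(b_birth, b_last, "month")

print(c(avg = months_avg, clock = months_clock))
print(c(avg = b_avg, clock = b_clock))
```

With these assumed dates the average-month count stays at 0 while clock reports 1, and for the question's dates the two approaches give 15 and 16 months respectively, matching the discrepancy described above.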
Using dplyr and lubridate, we can do the following. We first turn the date column into a date. Then we group by ID, find the birth date and calculate the number of months since that date via some lubridate magic (see How do I use the lubridate package to calculate the number of months between two date vectors where one of the vectors has NA values?).
Which gives:
Which is your expected output except for some rounding differences. See my comment on your question.
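The code for this answer is missing from this copy. As a rough sketch of the steps it describes (not the author's original code), one could parse the dates, group by ID, take the BIRTH row as the birth date, and count months since then. The column names age_avg, age_cal and the last_ages table are hypothetical names introduced here. Two variants are shown: an explicit average-month division (30.4375 days), and lubridate's interval() divided by months(1), which counts calendar months. The last age per individual, which the question also asks for, is taken from the latest record in each group.

```r
library(dplyr)
library(lubridate)

# Data from the question
df1 <- data.frame(
  date = c("2000-01-01", "2000-01-14", "2000-01-25", "2000-02-12", "2000-02-27",
           "2000-06-05", "2000-10-30",
           "2001-02-04", "2001-06-15", "2001-12-26", "2002-05-22", "2002-06-04",
           "2000-01-08", "2000-07-11", "2000-08-18", "2000-11-27"),
  ID = c(rep("A", 7), rep("B", 5), rep("C", 4)),
  status = c("BIRTH", rep("ETC", 6), "BIRTH", rep("ETC", 4), "BIRTH", rep("ETC", 3))
)

ages <- df1 %>%
  mutate(date = ymd(date)) %>%              # character -> Date
  group_by(ID) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    birth = date[status == "BIRTH"][1],     # birth date of this individual
    age_days = as.numeric(date - birth, units = "days"),
    # whole months using an average month length (assumption: 365.25 / 12 days)
    age_avg = age_days %/% 30.4375,
    # whole calendar months between birth and the record date
    age_cal = interval(birth, date) %/% months(1)
  )

# Last recorded age per individual
last_ages <- ages %>%
  summarise(
    last_age_avg = last(age_avg),
    last_age_cal = last(age_cal)
  )

print(as.data.frame(ages))
print(as.data.frame(last_ages))
```

Both variants truncate to whole months, so individual A ends at 9 months rather than the 10 in the expected output, which is the kind of rounding difference mentioned above. For individual B the last age is 15 with the average-month division and 16 with the calendar-month count.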