Given a series of dates and a birth date, is there a way to obtain the age at each date, as well as the final age, using the lubridate package?

Posted on 2025-01-18 01:34:57

I have a database of information on individuals observed over time. I would like to obtain the age of each individual at every record. Assuming the BIRTH record corresponds to an age of 0, I would like the age, in either days or months, at each subsequent visit. It would also be helpful to obtain a final age (in days or months) for each individual (not included in the code below). For example, for ID A, the final age would be 10 months. I would like to use the lubridate package, as its built-in date features make it easier to work with dates. Any help with this is much appreciated.

date<-c("2000-01-01","2000-01-14","2000-01-25","2000-02-12","2000-02-27","2000-06-05","2000-10-30",
        "2001-02-04","2001-06-15","2001-12-26","2002-05-22","2002-06-04",
        "2000-01-08","2000-07-11","2000-08-18","2000-11-27")
ID<-c("A","A","A","A","A","A","A",
      "B","B","B","B","B",
      "C","C","C","C")
status<-c("BIRTH","ETC","ETC","ETC","ETC","ETC","ETC",
          "BIRTH","ETC","ETC","ETC","ETC",
          "BIRTH","ETC","ETC","ETC")

df1<-data.frame(date,ID,status)
print(df1)
         date ID status
1  2000-01-01  A  BIRTH
2  2000-01-14  A    ETC
3  2000-01-25  A    ETC
4  2000-02-12  A    ETC
5  2000-02-27  A    ETC
6  2000-06-05  A    ETC
7  2000-10-30  A    ETC
8  2001-02-04  B  BIRTH
9  2001-06-15  B    ETC
10 2001-12-26  B    ETC
11 2002-05-22  B    ETC
12 2002-06-04  B    ETC
13 2000-01-08  C  BIRTH
14 2000-07-11  C    ETC
15 2000-08-18  C    ETC
16 2000-11-27  C    ETC

date.new<-c("2000-01-01","2000-01-14","2000-01-25","2000-02-12","2000-02-27","2000-06-05","2000-10-30",
        "2001-02-04","2001-06-15","2001-12-26","2002-05-22","2002-06-04",
        "2000-01-08","2000-07-11","2000-08-18","2000-11-27")
ID.new<-c("A","A","A","A","A","A","A",
      "B","B","B","B","B",
      "C","C","C","C")
status.new<-c("BIRTH","ETC","ETC","ETC","ETC","ETC","ETC",
          "BIRTH","ETC","ETC","ETC","ETC",
          "BIRTH","ETC","ETC","ETC")

age<-c(0,1,1,2,2,6,10,
       0,4,10,15,16,
       0,6,7,10)

df2<-data.frame(date.new,ID.new,status.new,age)

print(df2)
     date.new ID.new status.new age
1  2000-01-01      A      BIRTH   0
2  2000-01-14      A        ETC   1
3  2000-01-25      A        ETC   1
4  2000-02-12      A        ETC   2
5  2000-02-27      A        ETC   2
6  2000-06-05      A        ETC   6
7  2000-10-30      A        ETC  10
8  2001-02-04      B      BIRTH   0
9  2001-06-15      B        ETC   4
10 2001-12-26      B        ETC  10
11 2002-05-22      B        ETC  15
12 2002-06-04      B        ETC  16
13 2000-01-08      C      BIRTH   0
14 2000-07-11      C        ETC   6
15 2000-08-18      C        ETC   7
16 2000-11-27      C        ETC  10

Comments (2)

梦里的微风 2025-01-25 01:34:57

For calculations related to age in years or months, I'd like to encourage you to try the clock package rather than lubridate. lubridate is a great package, but produces some unexpected results with these kinds of calculations if you aren't 100% sure of what you are doing. In clock, the function to do this is date_count_between(). Notice that one of the results is different between clock and lubridate here:

library(clock)
library(lubridate, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  date = c("2000-01-01","2000-01-14",
           "2000-01-25","2000-02-12","2000-02-27","2000-06-05",
           "2000-10-30","2001-02-04","2001-06-15","2001-12-26",
           "2002-05-22","2002-06-04","2000-01-08","2000-07-11",
           "2000-08-18","2000-11-27"),
  ID = c("A","A","A","A","A","A",
         "A","B","B","B","B","B","C","C","C","C"),
  status = c("BIRTH","ETC","ETC","ETC",
             "ETC","ETC","ETC","BIRTH","ETC","ETC","ETC","ETC",
             "BIRTH","ETC","ETC","ETC")
)

df %>% 
  mutate(date = date_parse(date)) %>% 
  group_by(ID) %>% 
  mutate(birth_date = date[status == "BIRTH"]) %>% 
  ungroup() %>%
  mutate(
    age_clock = date_count_between(birth_date, date, "month"),
    age_lubridate = as.period(date - birth_date) %/% months(1))
#> # A tibble: 16 × 6
#>    date       ID    status birth_date age_clock age_lubridate
#>    <date>     <chr> <chr>  <date>         <int>         <dbl>
#>  1 2000-01-01 A     BIRTH  2000-01-01         0             0
#>  2 2000-01-14 A     ETC    2000-01-01         0             0
#>  3 2000-01-25 A     ETC    2000-01-01         0             0
#>  4 2000-02-12 A     ETC    2000-01-01         1             1
#>  5 2000-02-27 A     ETC    2000-01-01         1             1
#>  6 2000-06-05 A     ETC    2000-01-01         5             5
#>  7 2000-10-30 A     ETC    2000-01-01         9             9
#>  8 2001-02-04 B     BIRTH  2001-02-04         0             0
#>  9 2001-06-15 B     ETC    2001-02-04         4             4
#> 10 2001-12-26 B     ETC    2001-02-04        10            10
#> 11 2002-05-22 B     ETC    2001-02-04        15            15
#> 12 2002-06-04 B     ETC    2001-02-04        16            15
#> 13 2000-01-08 C     BIRTH  2000-01-08         0             0
#> 14 2000-07-11 C     ETC    2000-01-08         6             6
#> 15 2000-08-18 C     ETC    2000-01-08         7             7
#> 16 2000-11-27 C     ETC    2000-01-08        10            10

clock says that 2001-02-04 to 2002-06-04 is 16 months, while the lubridate method here only says it is 15 months. This has to do with the fact that the lubridate calculation uses the length of an average month, which doesn't always accurately reflect how we think about months.

Consider this simple example. I think most people would agree that a child born on 2020-02-28 is "1 month and 1 day" old on 2020-03-29. But the lubridate calculation shows 0 months!

library(clock)
library(lubridate, warn.conflicts = FALSE)

# "1 month and 1 day apart"
feb <- as.Date("2020-02-28")
mar <- as.Date("2020-03-29")

# As expected when thinking about age in months
date_count_between(feb, mar, "month")
#> [1] 1

# Not expected
as.period(mar - feb) %/% months(1)
#> [1] 0

secs_in_day <- 86400
secs_in_month <- as.numeric(months(1))
secs_in_month / secs_in_day
#> [1] 30.4375

# Less than 30.4375 days, so not 1 month
mar - feb
#> Time difference of 30 days

The issue is that this lubridate calculation uses the length of an average month, which is 30.4375 days (365.25 / 12). But there are only 30 days between these two dates, so it isn't counted as a full month.

clock, on the other hand, uses the day component of the starting date to determine if a "full month" has passed or not. In other words, because we have passed the 28th of March, clock decides that 1 month has passed, which is consistent with how we generally think about age.
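The "has the day of the month been reached yet?" rule described above can be illustrated with a few lines of base R. This is only a simplified sketch of the idea, not clock's actual implementation: the helper name `whole_months` is made up for illustration, it assumes `end` is not earlier than `start`, and it makes no attempt to handle end-of-month edge cases (for example a start date on the 31st).

```r
# Count completed calendar months between two dates (base R only).
# A month counts as complete once the day-of-month of `end` has
# reached the day-of-month of `start`.
# Assumption: `end` is on or after `start`.
whole_months <- function(start, end) {
  s <- as.POSIXlt(start)
  e <- as.POSIXlt(end)
  # Difference in calendar months, ignoring the day of the month
  months <- (e$year - s$year) * 12 + (e$mon - s$mon)
  # Subtract one when the current month is not yet complete
  months - as.integer(e$mday < s$mday)
}

whole_months(as.Date("2020-02-28"), as.Date("2020-03-29"))  # 1
whole_months(as.Date("2001-02-04"), as.Date("2002-06-04"))  # 16
```

Both results agree with the clock output shown earlier, because the comparison is made on calendar components rather than on an average month length.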

别低头,皇冠会掉 2025-01-25 01:34:57

Using dplyr and lubridate, we can do the following. We first convert the date column to a date. Then we group by ID, find the birth date, and calculate the number of months since that date via some lubridate magic (see "How do I use the lubridate package to calculate the number of months between two date vectors where one of the vectors has NA values?").

library(dplyr)
library(lubridate)

df1 %>% 
  mutate(date = as_date(date)) %>% 
  group_by(ID) %>% 
  mutate(birth_date = date[status == "BIRTH"],
         age = as.period(date - birth_date) %/% months(1)) %>% 
  ungroup()

Which gives:

   date       ID    status birth_date   age
   <date>     <fct> <fct>  <date>     <dbl>
 1 2000-01-01 A     BIRTH  2000-01-01     0
 2 2000-01-14 A     ETC    2000-01-01     0
 3 2000-01-25 A     ETC    2000-01-01     0
 4 2000-02-12 A     ETC    2000-01-01     1
 5 2000-02-27 A     ETC    2000-01-01     1
 6 2000-06-05 A     ETC    2000-01-01     5
 7 2000-10-30 A     ETC    2000-01-01     9
 8 2001-02-04 B     BIRTH  2001-02-04     0
 9 2001-06-15 B     ETC    2001-02-04     4
10 2001-12-26 B     ETC    2001-02-04    10
11 2002-05-22 B     ETC    2001-02-04    15
12 2002-06-04 B     ETC    2001-02-04    15
13 2000-01-08 C     BIRTH  2000-01-08     0
14 2000-07-11 C     ETC    2000-01-08     6
15 2000-08-18 C     ETC    2000-01-08     7
16 2000-11-27 C     ETC    2000-01-08    10

This is your expected output, except for some rounding differences. See my comment on your question.
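The question also asks for the age in days and for a final age per individual, which the code above does not cover. A minimal base R sketch of one way to get both is shown here; the names `births`, `age_days` and `final_age` are illustrative, and it assumes each ID has exactly one BIRTH row and that the final age is the age at the latest recorded date.

```r
# Rebuild the question's data with a proper Date column
df1 <- data.frame(
  date = as.Date(c("2000-01-01", "2000-01-14", "2000-01-25", "2000-02-12",
                   "2000-02-27", "2000-06-05", "2000-10-30",
                   "2001-02-04", "2001-06-15", "2001-12-26", "2002-05-22",
                   "2002-06-04",
                   "2000-01-08", "2000-07-11", "2000-08-18", "2000-11-27")),
  ID = rep(c("A", "B", "C"), times = c(7, 5, 4)),
  status = c("BIRTH", rep("ETC", 6),
             "BIRTH", rep("ETC", 4),
             "BIRTH", rep("ETC", 3))
)

# Look up each row's birth date by ID (assumes one BIRTH row per ID)
births <- df1[df1$status == "BIRTH", c("ID", "date")]
df1$birth_date <- births$date[match(df1$ID, births$ID)]

# Age in days at every record; subtracting Dates gives a difference in days
df1$age_days <- as.integer(df1$date - df1$birth_date)

# Final age per individual = largest age observed for that ID
final_age <- aggregate(age_days ~ ID, data = df1, FUN = max)
print(final_age)  # A: 303, B: 485, C: 324
```

Ages in days avoid the month-length ambiguity entirely, so they may be the safer unit if exact month boundaries do not matter for the analysis.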
