How to create indicator variables for the lowest and highest dates in R using a skip rule?

Posted on 2025-01-24 11:23:51

I want indicator variables that tell me whether a date is the lowest or the highest within its ID group. However, rows where practice is 1 should not be counted. This is what the data frame looks like now and what I want it to look like.

ID  date        practice
1   02-27-2020  1
1   04-21-2021  0
1   06-24-2022  0
2   03-21-2019  0
2   09-19-2020  0
2   01-21-2021  0

End product:

ID  date        practice  lowest  highest
1   02-27-2020  1         0       0
1   04-21-2021  0         1       0
1   06-24-2022  0         0       1
2   03-21-2019  0         1       0
2   09-19-2020  0         0       0
2   01-21-2021  0         0       1

Comments (3)

腻橙味 2025-01-31 11:23:51

This code produces the desired result using the tidyverse. Note that I had to convert date to the Date class; by default the mm-dd-YYYY strings are read as character, so min(date) and max(date) compare them as text and return different values than the chronologically earliest and latest dates.

x<-'
ID  date    practice
1   02-27-2020  1
1   04-21-2021  0
1   06-24-2022  0
2   03-21-2019  0
2   09-19-2020  0
2   01-21-2021  0'

df1 <- read.table(textConnection(x), header = TRUE)
library(tidyverse)

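# Convert the mm-dd-yyyy strings to Date so min()/max() compare chronologically
df1$date <- as.Date(df1$date, format = "%m-%d-%Y")

desired_result <- df1 %>%
  group_by(ID) %>%
  mutate(
    # Within each ID, take min/max over the non-practice rows only
    lowest = ifelse(date == min(date[practice == 0]), 1, 0),
    highest = ifelse(date == max(date[practice == 0]), 1, 0)
  )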

desired_result
# A tibble: 6 × 5
# Groups:   ID [2]
     ID date       practice lowest highest
  <int> <date>        <int>  <dbl>   <dbl>
1     1 2020-02-27        1      0       0
2     1 2021-04-21        0      1       0
3     1 2022-06-24        0      0       1
4     2 2019-03-21        0      1       0
5     2 2020-09-19        0      0       0
6     2 2021-01-21        0      0       1
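
To illustrate the note about character dates, here is a minimal sketch using the three dates of ID 2 from the question. The variable names (dates_chr, chr_min and so on) are made up for this illustration and are not part of the answer above.

```r
# Dates of ID 2 from the question, still as mm-dd-yyyy strings
dates_chr <- c("03-21-2019", "09-19-2020", "01-21-2021")

# Character vectors are compared as text, so the month decides the order
chr_min <- min(dates_chr)   # "01-21-2021"
chr_max <- max(dates_chr)   # "09-19-2020"

# After conversion to Date the comparison is chronological
dates <- as.Date(dates_chr, format = "%m-%d-%Y")
date_min <- min(dates)      # 2019-03-21
date_max <- max(dates)      # 2021-01-21

print(c(chr_min, chr_max))
print(c(date_min, date_max))
```

With the strings, the "lowest" date would be the one from 2021 and the "highest" the one from 2020, which is why the conversion has to happen before the grouped min/max.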
海未深 2025-01-31 11:23:51

Here is a base R solution with ave.

x<-'
ID  date    practice
1   02-27-2020  1
1   04-21-2021  0
1   06-24-2022  0
2   03-21-2019  0
2   09-19-2020  0
2   01-21-2021  0'

df1 <- read.table(textConnection(x), header = TRUE)
df1$date <- as.Date(df1$date, "%m-%d-%Y")

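# ave() applies FUN within each ID x practice combination and returns a
# vector aligned with the rows; the \(x) lambda syntax needs R >= 4.1
y1 <- with(df1, ave(as.integer(date), ID, practice, FUN = \(x) {
  if(length(x))
    min(x) == x
  else NULL
}))
y2 <- with(df1, ave(as.integer(date), ID, practice, FUN = \(x) {
  if(length(x))
    max(x) == x
  else NULL
}))

# Keep the flags only for non-practice rows
df1$lowest <- as.integer(y1 & (df1$practice != 1))
df1$highest <- as.integer(y2 & (df1$practice != 1))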
df1
#>   ID       date practice lowest highest
#> 1  1 2020-02-27        1      0       0
#> 2  1 2021-04-21        0      1       0
#> 3  1 2022-06-24        0      0       1
#> 4  2 2019-03-21        0      1       0
#> 5  2 2020-09-19        0      0       0
#> 6  2 2021-01-21        0      0       1

Created on 2022-04-25 by the reprex package (v2.0.1)
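
As a possible variation on the same idea, the practice rows can be dropped before calling ave(), so the grouping only needs ID. This is a sketch under the assumption that practice only takes the values 0 and 1; the keep and d variables are names introduced here, not taken from the answer.

```r
# Same example data as in the question, built directly as a data frame
df1 <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L),
  date = as.Date(c("02-27-2020", "04-21-2021", "06-24-2022",
                   "03-21-2019", "09-19-2020", "01-21-2021"),
                 format = "%m-%d-%Y"),
  practice = c(1L, 0L, 0L, 0L, 0L, 0L)
)

keep <- df1$practice == 0            # rows that are allowed to be flagged
d <- as.integer(df1$date[keep])      # eligible dates as day counts

# Start with all zeros, then fill in only the eligible rows
df1$lowest <- 0L
df1$highest <- 0L
df1$lowest[keep] <- as.integer(ave(d, df1$ID[keep], FUN = min) == d)
df1$highest[keep] <- as.integer(ave(d, df1$ID[keep], FUN = max) == d)

df1
```

Because only the rows in keep are ever assigned, a practice row stays at 0 even if it shares a date with the group's minimum or maximum, and an ID whose rows are all practice is simply left at 0.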

眼眸里的那抹悲凉 2025-01-31 11:23:51

A data.table approach:

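library(data.table)

# Return 0/1 flags for the min and max of x among rows where p != 1
f <- function(x,p) list(1*(x==min(x[p!=1])), 1*(x==max(x[p!=1])))
# df is the data frame shown under "Input:"
setDT(df)[,date:=as.IDate(date, "%m-%d-%Y")][,c("lowest","highest"):=f(date,practice), by=ID][]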

Output:

      ID       date practice lowest highest
   <int>     <IDat>    <int>  <num>   <num>
1:     1 2020-02-27        1      0       0
2:     1 2021-04-21        0      1       0
3:     1 2022-06-24        0      0       1
4:     2 2019-03-21        0      1       0
5:     2 2020-09-19        0      0       0
6:     2 2021-01-21        0      0       1

Input:

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), date = c("02-27-2020", 
"04-21-2021", "06-24-2022", "03-21-2019", "09-19-2020", "01-21-2021"
), practice = c(1L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, -6L
), class = "data.frame")
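
One case the example data does not cover is an ID where every row has practice equal to 1. Then x[p != 1] is empty, and base R's min() and max() warn when given an empty vector. A guarded variant of the helper might look like the sketch below; f_safe is a hypothetical name, and returning all zeros for such a group is an assumption about the desired behaviour, not something stated in the question.

```r
# Hypothetical guarded version of f(): same output shape, but it returns
# all zeros when a group has no non-practice rows
f_safe <- function(x, p) {
  ok <- p != 1
  if (!any(ok)) {
    zeros <- rep(0, length(x))
    return(list(zeros, zeros))
  }
  list(1 * (x == min(x[ok])), 1 * (x == max(x[ok])))
}

# A group made up only of practice rows: no warning, all flags are 0
res_empty <- f_safe(as.Date(c("2020-01-01", "2020-06-01")), c(1, 1))

# ID 1 from the question: the practice row is skipped
res_id1 <- f_safe(as.Date(c("2020-02-27", "2021-04-21", "2022-06-24")),
                  c(1, 0, 0))
print(res_empty)
print(res_id1)
```

Since it returns a two-element list like f, it should be usable in the same by=ID call in place of f.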