Sum cumulative time between changes of a single state variable in R
I've been looking for answers and experimenting with my code for a couple of hours. I have a dataset that looks like the following for a specific ID:
# A tibble: 14 × 3
        ID state orderDate
     <dbl> <chr> <dttm>
 1 4227631 1     2022-03-14 19:00:00
 2 4227631 1     2022-03-14 20:00:00
 3 4227631 1     2022-03-15 11:00:00
 4 4227631 0     2022-03-15 11:00:00
 5 4227631 1     2022-03-15 20:00:00
 6 4227631 1     2022-03-16 04:00:00
 7 4227631 0     2022-03-16 04:00:00
 8 4227631 1     2022-03-16 05:00:00
 9 4227631 0     2022-03-16 13:00:00
10 4227631 1     2022-03-16 15:00:00
This occurs for hundreds of IDs. For this example, I am using dplyr to group_by ID. I only care when state changes between values, not when it stays the same.
I want to calculate the cumulative time each ID remains in state 1. When state 1 is repeated several times before it changes, the repeats should be ignored. I plan to use lubridate and dplyr for the analysis.
The tibble I am using for this example:
structure(list(ID = c(4227631, 4227631, 4227631, 4227631, 4227631,
4227631, 4227631, 4227631, 4227631, 4227631), state = c("1",
"1", "1", "0", "1", "1", "0", "1", "0", "1"), orderDate = structure(c(1647284400,
1647288000, 1647342000, 1647342000, 1647374400, 1647403200, 1647403200,
1647406800, 1647435600, 1647442800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
I've tried various solutions, such as "Cumulative time with reset"; however, I'm having trouble with lag and incorporating it into this specific analysis.
The expected output might look something like this:
I would then sum all the statusOne values to get the cumulative time spent in this state.
More elegant solutions are welcome, as is a link to a prior question if one exists.
EDIT
Using the solution below, I figured it out!
The solution didn't handle the situations where state 0 immediately follows state 1, and we wanted the total time elapsed between these states.
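library(dplyr)
library(tidyr)  # replace_na()

# Note: the original snippet called the state column orderName;
# it is written as `state` here to match the example tibble.
df %>%
  group_by(ID) %>%
  # run_id increases by 1 every time state differs from the previous row
  mutate(run_id = cumsum(ifelse(state == lag(state, default = "1"), 0, 1))) %>%
  # hours1: time since the previous row while staying inside a run of state "1"
  mutate(hours1 = ifelse(run_id == lag(run_id) & state == "1",
                         difftime(orderDate, lag(orderDate), units = "hours"), NA)) %>%
  # hours2: time from the last state "1" row to the state "0" row that ends the run
  mutate(hours2 = ifelse(state == "0" & lag(state) == "1",
                         difftime(orderDate, lag(orderDate), units = "hours"), NA)) %>%
  mutate(hours1 = replace_na(hours1, 0),
         hours2 = replace_na(hours2, 0)) %>%
  mutate(hours = hours1 + hours2) %>%
  select(-hours1, -hours2) %>%
  summarise(total_hours = sum(hours, na.rm = TRUE)) %>%
  filter(total_hours != 0)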
Comments (1)
This is far from elegant, but at least it appears to provide the correct answer:
Created on 2022-04-04 by the reprex package (v2.0.1)
Edit
It's a lot more straightforward to get the total_hours in state 1 for each ID:
Created on 2022-04-04 by the reprex package (v2.0.1)
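As a minimal sketch of the idea (not the answer's original code), one way to get total_hours for state 1 is to charge each gap between consecutive rows to the state at the start of that gap, then sum the gaps that start in state 1. This assumes rows are already in chronological order within each ID, and that a trailing state 1 row with no later row contributes no time. The names next_time, hours and total_state1 are made up for this sketch.

```r
library(dplyr)

# Example data from the question, rebuilt with base R
df <- data.frame(
  ID = rep(4227631, 10),
  state = c("1", "1", "1", "0", "1", "1", "0", "1", "0", "1"),
  orderDate = as.POSIXct(
    c(1647284400, 1647288000, 1647342000, 1647342000, 1647374400,
      1647403200, 1647403200, 1647406800, 1647435600, 1647442800),
    origin = "1970-01-01", tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

total_state1 <- df %>%
  group_by(ID) %>%
  mutate(
    # timestamp of the following row within the same ID (NA on the last row)
    next_time = lead(orderDate),
    # length of the gap that starts at this row, in hours
    hours = as.numeric(difftime(next_time, orderDate, units = "hours"))
  ) %>%
  # keep only gaps that begin in state "1" and have a known end
  filter(state == "1", !is.na(hours)) %>%
  summarise(total_hours = sum(hours), .groups = "drop")

print(total_state1)
# total_hours is 32 for ID 4227631
```

Repeated state 1 rows need no special handling here: their gaps simply add up to the length of the whole run, so 1 + 15 + 0 + 8 + 0 + 8 gives the same 32 hours as the lag-based pipeline in the question's edit.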