Filter data by date and time (HH:MM:SS) and aggregate it into rows at 5-minute intervals for each month
Problem
I have a very large data frame (almost 20,000 rows) where data was documented approximately every 1-3 minutes. Unfortunately, I am unable to upload any authentic data to this post.
Aim
The aim is to reduce the size of the data frame by filtering the rows by date and time into 5-minute intervals for each month. The data spans over 4 years.
Over the last couple of days I have tried different R packages and functions to do this (for example dplyr, tidyverse, and aggregate()), without success. I just can't solve this conundrum.
Structure of the data frame
I have a data frame like this
ID  Date     Time
1   9/15/16  6:48:00 AM
2   9/15/16  6:54:00 AM
3   9/15/16  6:57:00 AM
4   9/15/16  6:59:00 AM
5   9/15/16  7:03:00 AM
6   9/15/16  7:05:00 AM
I would like to convert this data into a data frame like the example below by calculating the number of minutes and seconds between each subsequent 'ID' row.
ID  Date     Start_Time  End_Time    Minutes  Seconds
1   9/15/16  6:48:00 AM  6:54:00 AM  5.0      300.00
2   9/15/16  6:54:00 AM  6:57:00 AM  3.0      180.00
3   9/15/16  6:57:00 AM  6:59:00 AM  2.0      120.00
4   9/15/16  6:59:00 AM  7:03:00 AM  4.0      240.00
5   9/15/16  7:03:00 AM  7:05:00 AM  2.0      120.00
6   9/15/16  7:05:00 AM  etc
Afterwards, I would like to filter this new data frame (the one holding the gaps between subsequent 'ID' rows) by date and time into 5-minute (300-second) intervals per month, in order to reduce its size.
The output should be something like this unless someone has a more efficient method.
ID  Date     Start_Time  End_Time    Minutes  Seconds
1   9/15/16  6:48:00 AM  6:54:00 AM  5.0      300.00
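One possible reading of the goal above is "keep at most one record per 5-minute window". The following is only a sketch under that assumption; the small 'track' data frame is a stand-in built from the sample rows, and the column names Stamp, Bin, and Month are hypothetical names introduced for illustration.

```r
library(dplyr)
library(lubridate)

# Stand-in for the real data, copied from the sample rows in the question.
track <- data.frame(
  ID   = 1:6,
  Date = rep("9/15/16", 6),
  Time = c("6:48:00 AM", "6:54:00 AM", "6:57:00 AM",
           "6:59:00 AM", "7:03:00 AM", "7:05:00 AM"),
  stringsAsFactors = FALSE
)

thinned <- track %>%
  mutate(
    Stamp = mdy_hms(paste(Date, Time)),   # combine Date and Time into one date-time
    Bin   = floor_date(Stamp, "5 mins")   # start of the 5-minute window each row falls in
  ) %>%
  group_by(Bin) %>%
  slice(1) %>%                            # keep the first record of every window
  ungroup()

# Number of retained records per calendar month.
per_month <- thinned %>%
  count(Month = floor_date(Stamp, "month"))
```

If the intent is instead to keep only rows whose gap to the next record is exactly 300 seconds, a filter() on the computed Seconds column would be the analogous step.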
I would appreciate your thoughts on this.
Many thanks in advance.
Progress
Many thanks for this solution; it worked really well. Sorry for all these questions, I am new to R. Could you please explain what the warning message below means (ℹ 232 failed to parse), and what went wrong with the minutes and seconds calculated for the rows of my data frame ('New_Track', see below)? The 'Minutes' and 'Seconds' columns contain exactly the same numbers in every row. In addition, row 7 (the step from ID 7 to ID 8) shows a difference of 950520 minutes and seconds, when the correct value is approximately 2 minutes or 120 seconds (see below).
Rows 7 and 8
7  7  9/15/16  7:07:00 AM  2016-09-15 07:07:00  2016-09-26 07:09:00  950520M 0S  950520
8  8  9/26/16  7:09:00 AM  2016-09-26 07:09:00  2016-09-26 07:11:00  120M 0S     120
Code and warning message:
## My data frame is called 'track'; its columns are (1) ID, (2) Date, and (3) Time
## Code:
library(dplyr)
library(lubridate)
New_Track <- data.frame(
  stringsAsFactors = FALSE,
  ID = track$ID,
  Date = track$Date,
  Time = track$Time
) %>%
  mutate(Start_Time = mdy_hms(paste(Date, Time)),
         End_Time = lead(Start_Time),
         Minutes = minutes(End_Time - Start_Time),
         Seconds = (End_Time - Start_Time) / dseconds(1))
Warning message:
Problem while computing `Start_Time = mdy_hms(paste(Date, Time))`.
ℹ 232 failed to parse.
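A "failed to parse" count means that this many of the pasted Date/Time strings did not match the month-day-year hour:minute:second pattern, so mdy_hms() returned NA for them. A minimal sketch for locating such rows is shown here; the small 'track' data frame and its blank Time cell are hypothetical and only illustrate one way an entry can fail.

```r
library(lubridate)

# Hypothetical stand-in for 'track'; row 3 has a blank Time cell.
track <- data.frame(
  ID   = 1:3,
  Date = c("9/15/16", "9/15/16", "9/26/16"),
  Time = c("6:48:00 AM", "6:54:00 AM", ""),
  stringsAsFactors = FALSE
)

stamp <- paste(track$Date, track$Time)

# quiet = TRUE suppresses the warning; unparseable entries still become NA.
parsed <- mdy_hms(stamp, quiet = TRUE)

# The rows behind the "failed to parse" count.
failed <- track[is.na(parsed), ]
print(failed)
```

Inspecting the rows returned in 'failed' on the real data should show whether the 232 entries are blank, lack seconds, or use a different date format.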
New data frame layout ('New_Track')
   ID  Date     Time        Start_Time           End_Time             Minutes     Seconds
1  1   9/15/16  6:48:00 AM  2016-09-15 06:48:00  2016-09-15 06:54:00  360M 0S     360
2  2   9/15/16  6:54:00 AM  2016-09-15 06:54:00  2016-09-15 06:57:00  180M 0S     180
3  3   9/15/16  6:57:00 AM  2016-09-15 06:57:00  2016-09-15 06:59:00  120M 0S     120
4  4   9/15/16  6:59:00 AM  2016-09-15 06:59:00  2016-09-15 07:03:00  240M 0S     240
5  5   9/15/16  7:03:00 AM  2016-09-15 07:03:00  2016-09-15 07:05:00  120M 0S     120
6  6   9/15/16  7:05:00 AM  2016-09-15 07:05:00  2016-09-15 07:07:00  120M 0S     120
7  7   9/15/16  7:07:00 AM  2016-09-15 07:07:00  2016-09-26 07:09:00  950520M 0S  950520
8  8   9/26/16  7:09:00 AM  2016-09-26 07:09:00  2016-09-26 07:11:00  120M 0S     120
9  9   9/26/16  7:11:00 AM  2016-09-26 07:11:00  2016-09-26 07:13:00  120M 0S     120
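Two things appear to explain the table above. First, lubridate's minutes(x) builds a period of x minutes rather than converting a difference into minutes, so a 360-second gap prints as "360M 0S". Second, lead() pairs the last record of 9/15/16 with the first record of 9/26/16, and that gap really is 11 days plus 2 minutes, which is 950520 seconds. The sketch below is one way to address both, assuming gaps should not be computed across different dates; the 'track' data frame is a shortened stand-in for the real data.

```r
library(dplyr)
library(lubridate)

# Shortened stand-in for 'track', including a jump from 9/15/16 to 9/26/16.
track <- data.frame(
  ID   = 1:4,
  Date = c("9/15/16", "9/15/16", "9/15/16", "9/26/16"),
  Time = c("6:48:00 AM", "6:54:00 AM", "6:57:00 AM", "7:09:00 AM"),
  stringsAsFactors = FALSE
)

New_Track <- track %>%
  mutate(Start_Time = mdy_hms(paste(Date, Time))) %>%
  group_by(Date) %>%   # lead() now stops at the last record of each date
  mutate(
    End_Time = lead(Start_Time),
    # difftime() with explicit units gives a plain number of seconds
    Seconds  = as.numeric(difftime(End_Time, Start_Time, units = "secs")),
    Minutes  = Seconds / 60
  ) %>%
  ungroup()
```

With this grouping, the last record of each date has NA for End_Time, Minutes, and Seconds instead of a gap that spans several days.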
Comments (1)
Result
It is still unclear to me what result you are looking for here: