Filter data by date and time (HH:MM:SS) and aggregate the rows into 5-minute intervals for each month

Posted on 2025-01-18 02:58:09

Problem

I have a very large data frame (almost 20,000 rows) where data was documented approximately every 1-3 minutes. Unfortunately, I am unable to upload any authentic data to this post.

Aim

The aim is to reduce the size of the data frame by filtering the rows by date and time into 5-minute intervals for each month. The data spans over 4 years.

Over the last couple of days I have tried different packages and functions in R, such as dplyr, the tidyverse, and aggregate(), to no avail, and I just can't solve this conundrum.

Structure of the data frame

I have a data frame like this

ID    Date       Time    
 1 9/15/16 6:48:00 AM 
 2 9/15/16 6:54:00 AM 
 3 9/15/16 6:57:00 AM 
 4 9/15/16 6:59:00 AM 
 5 9/15/16 7:03:00 AM 
 6 9/15/16 7:05:00 AM 

I would like to convert this data into a data frame like the example below by calculating the number of minutes and seconds between each subsequent 'ID' row.

    ID    Date Start_Time    End_Time Minutes Seconds
     1 9/15/16 6:48:00 AM  6:54:00 AM     5.0  300.00
     2 9/15/16 6:54:00 AM  6:57:00 AM     3.0  180.00
     3 9/15/16 6:57:00 AM  6:59:00 AM     2.0  120.00
     4 9/15/16 6:59:00 AM  7:03:00 AM     4.0  240.00
     5 9/15/16 7:03:00 AM  7:05:00 AM     2.0  120.00
     6 9/15/16 7:05:00 AM  etc

Afterwards, I'd like to filter the data frame containing the new calculations between the subsequent rows of 'IDs' by date and time into 5-minute or 300.0-second time intervals per month to reduce the size of the data frame.

The output should be something like this unless someone has a more efficient method.

    ID    Date Start_Time    End_Time Minutes Seconds
     1 9/15/16 6:48:00 AM  6:54:00 AM     5.0  300.00
     

I appreciate your thoughts on this

Many thanks in advance.

Progress

Many thanks for this solution, it worked really well. Sorry for all these questions, I am a novice with R. Could I please ask what the warning message below means (ℹ 232 failed to parse), and what went wrong with the minutes and seconds calculated for the rows in my data frame ('New_Track', see below)? The values in the 'Minutes' and 'Seconds' columns are exactly the same. In addition, for IDs 7 and 8, the new calculation shows a difference of 950520 minutes and seconds, when I expected approximately 2 minutes or 120 seconds (see below).

Rows 7 and 8

7     7  9/15/16  7:07:00 AM 2016-09-15 07:07:00 2016-09-26 07:09:00  950520M 0S  950520
8     8  9/26/16  7:09:00 AM 2016-09-26 07:09:00 2016-09-26 07:11:00     120M 0S     120

Code and warning message:

## My data frame is called 'track' and the columns are: (1) ID; (2) Date; and (3) Time

## Code:

library(dplyr); library(lubridate)

New_Track <- data.frame(
  stringsAsFactors = FALSE,
  ID = track$ID,
  Date = track$Date,
  Time = track$Time
) %>%
  mutate(Start_Time = mdy_hms(paste(Date, Time)),
         End_Time = lead(Start_Time),
         Minutes = minutes(End_Time-Start_Time),
         Seconds = (End_Time-Start_Time) / dseconds(1))
Warning message:
Problem while computing `Start_Time = mdy_hms(paste(Date, Time))`.
ℹ 232 failed to parse.

New Data frame Layout - 'New_Track'

  ID     Date        Time          Start_Time            End_Time     Minutes Seconds
1     1  9/15/16  6:48:00 AM 2016-09-15 06:48:00 2016-09-15 06:54:00     360M 0S     360
2     2  9/15/16  6:54:00 AM 2016-09-15 06:54:00 2016-09-15 06:57:00     180M 0S     180
3     3  9/15/16  6:57:00 AM 2016-09-15 06:57:00 2016-09-15 06:59:00     120M 0S     120
4     4  9/15/16  6:59:00 AM 2016-09-15 06:59:00 2016-09-15 07:03:00     240M 0S     240
5     5  9/15/16  7:03:00 AM 2016-09-15 07:03:00 2016-09-15 07:05:00     120M 0S     120
6     6  9/15/16  7:05:00 AM 2016-09-15 07:05:00 2016-09-15 07:07:00     120M 0S     120
7     7  9/15/16  7:07:00 AM 2016-09-15 07:07:00 2016-09-26 07:09:00  950520M 0S  950520
8     8  9/26/16  7:09:00 AM 2016-09-26 07:09:00 2016-09-26 07:11:00     120M 0S     120
9     9  9/26/16  7:11:00 AM 2016-09-26 07:11:00 2016-09-26 07:13:00     120M 0S     120 
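
On the warning: "232 failed to parse" means that 232 of the pasted Date/Time strings did not match the month/day/year hour:minute:second pattern, so those rows get NA as Start_Time. A minimal sketch of how one might find the offending rows follows. It uses base R's as.POSIXct() as a stand-in for mdy_hms() so that it runs without extra packages, and the small 'track' data frame with its two malformed rows is hypothetical, not the real data.

```r
# Make AM/PM parsing independent of the system language settings.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical stand-in for 'track'; rows 3 and 4 are deliberately malformed
# (missing date, missing seconds).
track <- data.frame(
  ID   = 1:4,
  Date = c("9/15/16", "9/15/16", "", "9/26/16"),
  Time = c("6:48:00 AM", "6:54:00 AM", "6:57:00 AM", "7:09 AM"),
  stringsAsFactors = FALSE
)

# Base-R counterpart of the parsing step: strings that do not match
# the format come back as NA instead of raising an error.
stamp <- as.POSIXct(paste(track$Date, track$Time),
                    format = "%m/%d/%y %I:%M:%S %p", tz = "UTC")

# The rows that failed to parse are the ones to inspect in the raw data.
bad_rows <- track[is.na(stamp), ]
print(bad_rows)
```

With the lubridate code above, the equivalent check would be to look at the rows where is.na(Start_Time) is TRUE and see what their Date and Time strings contain.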

Comments (1)

述情 2025-01-25 02:58:09
library(dplyr); library(lubridate)
data.frame(
  stringsAsFactors = FALSE,
                ID = c(1L, 2L, 3L, 4L, 5L, 6L),
              Date = c("9/15/16","9/15/16","9/15/16",
                       "9/15/16","9/15/16","9/15/16"),
              Time = c("6:48:00","6:54:00","6:57:00",
                       "6:59:00","7:03:00","7:05:00")
) %>%
  mutate(Start_Time = mdy_hms(paste(Date, Time)),
         End_Time = lead(Start_Time),
         Minutes = minutes(End_Time-Start_Time),
         Seconds = (End_Time-Start_Time) / dseconds(1))

Result

  ID    Date    Time          Start_Time            End_Time Minutes Seconds
1  1 9/15/16 6:48:00 2016-09-15 06:48:00 2016-09-15 06:54:00   6M 0S     360
2  2 9/15/16 6:54:00 2016-09-15 06:54:00 2016-09-15 06:57:00   3M 0S     180
3  3 9/15/16 6:57:00 2016-09-15 06:57:00 2016-09-15 06:59:00   2M 0S     120
4  4 9/15/16 6:59:00 2016-09-15 06:59:00 2016-09-15 07:03:00   4M 0S     240
5  5 9/15/16 7:03:00 2016-09-15 07:03:00 2016-09-15 07:05:00   2M 0S     120
6  6 9/15/16 7:05:00 2016-09-15 07:05:00                <NA>    <NA>      NA
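
A note on the Minutes column: subtracting two date-times gives a difftime whose units R chooses automatically from the smallest gap in the vector. In the six-row example every gap is at least a minute, so the numbers are minutes and minutes() shows "6M 0S". If the real data contains any gap under 60 seconds, the same subtraction is probably reported in seconds, which would explain values such as "360M 0S" that match the Seconds column. The sketch below illustrates this in base R with hypothetical timestamps (the 06:57:30 record is invented) and shows one way to request the units explicitly.

```r
# Hypothetical timestamps; the last one creates a 30-second gap.
stamps <- as.POSIXct(c("2016-09-15 06:48:00", "2016-09-15 06:54:00",
                       "2016-09-15 06:57:00", "2016-09-15 06:57:30"),
                     tz = "UTC")
start <- stamps[-length(stamps)]
end   <- stamps[-1]

# Automatic units: "secs" here, because the smallest gap is under a minute.
auto_diff <- end - start
print(units(auto_diff))

# Without the 30-second gap, the same subtraction is reported in minutes.
print(units(end[1:2] - start[1:2]))

# Asking for the units explicitly removes the ambiguity.
mins <- as.numeric(difftime(end, start, units = "mins"))
secs <- as.numeric(difftime(end, start, units = "secs"))
print(data.frame(start, end, mins, secs))
```

The 950520 reported for ID 7 is a correct value in seconds rather than a miscalculation: the next record (ID 8) is dated 9/26/16, so the gap from 9/15/16 07:07 to 9/26/16 07:09 is 11 days and 2 minutes.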

It's still unclear to me what you're looking for here:

I'd like to filter the data frame containing the new calculations
between the subsequent rows of 'IDs' by date and time into 5-minute or
300.0-second time intervals per month to reduce the size of the data frame.
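
One possible reading of that request, offered only as an assumption, is to thin the data so that at most one record is kept per 5-minute clock interval, with a month label for later per-month summaries. A minimal base-R sketch under that assumption is shown here; the timestamps are the nine from the 'New_Track' printout, and the column names Bin and Month are made up for illustration.

```r
# Timestamps taken from the 'New_Track' printout in the question.
stamps <- as.POSIXct(c("2016-09-15 06:48:00", "2016-09-15 06:54:00",
                       "2016-09-15 06:57:00", "2016-09-15 06:59:00",
                       "2016-09-15 07:03:00", "2016-09-15 07:05:00",
                       "2016-09-15 07:07:00", "2016-09-26 07:09:00",
                       "2016-09-26 07:11:00"), tz = "UTC")
track <- data.frame(ID = seq_along(stamps), Start_Time = stamps)

# Round each timestamp down to the start of its 5-minute interval.
bin_secs <- 5 * 60
track$Bin <- as.POSIXct((as.numeric(track$Start_Time) %/% bin_secs) * bin_secs,
                        origin = "1970-01-01", tz = "UTC")

# Month label (hypothetical column) for grouping or counting per month.
track$Month <- format(track$Start_Time, "%Y-%m")

# Keep only the first record that falls in each 5-minute interval.
thinned <- track[!duplicated(track$Bin), ]
print(thinned)
print(table(thinned$Month))
```

With lubridate loaded, floor_date(Start_Time, "5 minutes") gives the same interval start. If the intent is instead to keep only rows whose gap to the next record is at least 5 minutes, filtering on Seconds >= 300 would be the alternative.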
