R: parsing timestamps in mutate using a function, case_when and data masking

Posted 2025-01-16 14:23:56

I am trying to parse some timestamps (character vectors) as datetimes using R mutate and case_when.

Dummy data:

library(dplyr)     # mutate(), across(), case_when(), %>%
library(lubridate) # parse_date_time()

p_id <- c(1, 2, 3, 4, 5, 6)
ActualStartTime <- c("2020-05-21 19:04:36 +01:00", "21/09/2020 14:14", "2020-08-18 10:11:08 +01:00", "12/10/2020 21:25", "09/11/2020 17:02", "2020-05-16 11:50:58 +02:00")
ActualEndTime <- c("2020-05-21 19:29:42 +01:00", "21/09/2020 14:19", "2020-08-18 10:14:26 +01:00", "12/10/2020 21:29", "09/11/2020 17:06", "2020-05-16 11:56:10 +02:00")
df <- data.frame(p_id, ActualStartTime, ActualEndTime)

df

  p_id            ActualStartTime              ActualEndTime
1    1 2020-05-21 19:04:36 +01:00 2020-05-21 19:29:42 +01:00
2    2           21/09/2020 14:14           21/09/2020 14:19
3    3 2020-08-18 10:11:08 +01:00 2020-08-18 10:14:26 +01:00
4    4           12/10/2020 21:25           12/10/2020 21:29
5    5           09/11/2020 17:02           09/11/2020 17:06
6    6 2020-05-16 11:50:58 +02:00 2020-05-16 11:56:10 +02:00

The timestamps are in two different formats, so I first wrote a non-vectorised function to test the parsing. If the string has 26 characters it is parsed with one format; otherwise it is parsed with the alternative format.

parse_mydate_novec <- function(time_var) {
  if (nchar(time_var) == 26) { 
    parse_date_time(time_var, orders = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
  } else {
    parse_date_time(time_var, orders = "%d/%m/%Y %H:%M", tz = "UTC")
  }
}

> parse_mydate_novec(df$ActualStartTime[1]) # this works, class is POSIXct
[1] "2020-05-21 18:04:36 UTC"

> parse_mydate_novec(df$ActualStartTime[2]) # this works, class is POSIXct
[1] "2020-09-21 14:14:00 UTC"

So far, so good. I then try vectorising the function using the data masking guidance https://dplyr.tidyverse.org/reference/dplyr_data_masking.html, so I can use it with mutate and using case_when instead of if else:

parse_mydate <- function(time_var) {
 case_when (
   nchar({{time_var}}) == 26 ~ parse_date_time({{time_var}}, orders = "%Y-%m-%d %H:%M:%S %z", tz = "UTC"),
   nchar({{time_var}}) == 16 ~ parse_date_time({{time_var}}, orders = "%d/%m/%Y %H:%M", tz = "UTC"),
   TRUE ~ {{time_var}})
} 

I then pass this function using mutate, first on one column to test it and then using mutate(across()):

df_test <- df %>%
  mutate(ActualStartTime = parse_mydate(ActualStartTime))

df_test <- df %>%
  mutate(across(c(ActualStartTime, ActualEndTime), ~parse_mydate(.x)))

However I get the following errors:

Error in `mutate_cols()`:
! Problem with `mutate()` column `ActualStartTime`.
ℹ `ActualStartTime = parse_um_date(ActualStartTime)`.
x must be a `POSIXct/POSIXt` object, not a character vector.
Caused by error in `glubort()`:
! must be a `POSIXct/POSIXt` object, not a character vector.

Warning messages:
1: Problem with `mutate()` column `ActualStartTime`.
ℹ `ActualStartTime = parse_um_date(ActualStartTime)`.
ℹ  3 failed to parse. 
2: Problem with `mutate()` column `ActualStartTime`.
ℹ `ActualStartTime = parse_um_date(ActualStartTime)`.
ℹ  3 failed to parse. 

This doesn't make sense as I've written the function to pass in a character vector and return a datetime object.

The desired output is a dataframe where all the objects in ActualStartTime and ActualEndTime are in POSIXct format i.e. "2020-05-21 18:04:36 UTC"

I've looked at:
R dplyr using across() efficiently with mutate() and case_when()
and R - How to pass parameters to function in "mutate across"?
and several other questions on parsing datetimes.

I don't know whether I have the logic of the function wrong, the use of case_when, the use of mutate or something else. I've been going round in circles for hours. All help appreciated! With thanks.

Comments (1)

软的没边 2025-01-23 14:23:56

The function lubridate::fast_strptime accepts several formats, which are applied in turn until one succeeds. Note that it returns POSIXlt by default; pass lt = FALSE if you need a POSIXct result from the function itself.

library(dplyr)
library(lubridate)

df %>%
  mutate(across(matches("Time"), ~fast_strptime(.x,
                                              format = c("%Y-%m-%d %H:%M:%S %z",
                                                         "%d/%m/%Y %H:%M"),
                                              tz = "UTC")))


##>   p_id     ActualStartTime       ActualEndTime
##> 1    1 2020-05-21 18:04:36 2020-05-21 18:29:42
##> 2    2 2020-09-21 14:14:00 2020-09-21 14:19:00
##> 3    3 2020-08-18 09:11:08 2020-08-18 09:14:26
##> 4    4 2020-10-12 21:25:00 2020-10-12 21:29:00
##> 5    5 2020-11-09 17:02:00 2020-11-09 17:06:00
##> 6    6 2020-05-16 09:50:58 2020-05-16 09:56:10
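
For completeness, here is a minimal sketch of how the case_when() approach from the question could be made to work. It is an illustration under stated assumptions, not necessarily what the asker ended up using. The error arises because case_when() requires every right-hand side to have a compatible type, and the fallback `TRUE ~ {{time_var}}` returns the original character vector while the other two branches return POSIXct. The sketch replaces the fallback with a POSIXct NA (an assumption about what should happen to unrecognised strings). It drops the `{{ }}` braces, which are not needed because the function receives an ordinary vector rather than a column name. It also passes quiet = TRUE, because each parse_date_time() call sees the whole vector and would otherwise warn about the rows written in the other format.

```r
library(dplyr)
library(lubridate)

# Same dummy data as in the question
df <- data.frame(
  p_id = 1:6,
  ActualStartTime = c("2020-05-21 19:04:36 +01:00", "21/09/2020 14:14",
                      "2020-08-18 10:11:08 +01:00", "12/10/2020 21:25",
                      "09/11/2020 17:02", "2020-05-16 11:50:58 +02:00"),
  ActualEndTime = c("2020-05-21 19:29:42 +01:00", "21/09/2020 14:19",
                    "2020-08-18 10:14:26 +01:00", "12/10/2020 21:29",
                    "09/11/2020 17:06", "2020-05-16 11:56:10 +02:00"),
  stringsAsFactors = FALSE
)

parse_mydate <- function(time_var) {
  case_when(
    # 26 characters: ISO-style timestamp with a UTC offset
    nchar(time_var) == 26 ~ parse_date_time(time_var,
                                            orders = "%Y-%m-%d %H:%M:%S %z",
                                            tz = "UTC", quiet = TRUE),
    # 16 characters: day/month/year with hours and minutes
    nchar(time_var) == 16 ~ parse_date_time(time_var,
                                            orders = "%d/%m/%Y %H:%M",
                                            tz = "UTC", quiet = TRUE),
    # Assumption: anything else becomes a POSIXct NA, so that every
    # branch returns the same type
    TRUE ~ as.POSIXct(NA, tz = "UTC")
  )
}

df_test <- df %>%
  mutate(across(c(ActualStartTime, ActualEndTime), parse_mydate))

print(df_test)
```

With this version, a value that matches neither length comes back as NA instead of raising an error, so is.na() on the result can be used to find rows that need another format.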