R mutate with a function, case_when and data masking to parse timestamps
I am trying to parse some timestamps (character vectors) as datetimes using R `mutate` and `case_when`.
Dummy data:
```r
p_id <- c(1, 2, 3, 4, 5, 6)
ActualStartTime <- c("2020-05-21 19:04:36 +01:00", "21/09/2020 14:14", "2020-08-18 10:11:08 +01:00",
                     "12/10/2020 21:25", "09/11/2020 17:02", "2020-05-16 11:50:58 +02:00")
ActualEndTime <- c("2020-05-21 19:29:42 +01:00", "21/09/2020 14:19", "2020-08-18 10:14:26 +01:00",
                   "12/10/2020 21:29", "09/11/2020 17:06", "2020-05-16 11:56:10 +02:00")
df <- data.frame(p_id, ActualStartTime, ActualEndTime)
df
```
```
  p_id            ActualStartTime              ActualEndTime
1    1 2020-05-21 19:04:36 +01:00 2020-05-21 19:29:42 +01:00
2    2           21/09/2020 14:14           21/09/2020 14:19
3    3 2020-08-18 10:11:08 +01:00 2020-08-18 10:14:26 +01:00
4    4           12/10/2020 21:25           12/10/2020 21:29
5    5           09/11/2020 17:02           09/11/2020 17:06
6    6 2020-05-16 11:50:58 +02:00 2020-05-16 11:56:10 +02:00
```
The timestamps come in two different formats, so I first wrote a non-vectorised function to test the idea: if the string is 26 characters long it is parsed with one format; otherwise it is parsed with the alternative format.
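```r
library(lubridate)

parse_mydate_novec <- function(time_var) {
  if (nchar(time_var) == 26) {
    # 26 characters, e.g. "2020-05-21 19:04:36 +01:00"
    parse_date_time(time_var, orders = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
  } else {
    # anything else, e.g. "21/09/2020 14:14"
    parse_date_time(time_var, orders = "%d/%m/%Y %H:%M", tz = "UTC")
  }
}
```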
```
> parse_mydate_novec(df$ActualStartTime[1]) # this works, class is POSIXct
[1] "2020-05-21 18:04:36 UTC"
> parse_mydate_novec(df$ActualStartTime[2]) # this works, class is POSIXct
[1] "2020-09-21 14:14:00 UTC"
```
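`if` looks at a single logical value, so `parse_mydate_novec()` only works one element at a time (R 4.2 and later raise an error when an `if` condition has length greater than 1). A minimal sketch of a vectorised version of the same 26-character rule, using logical indexing rather than `case_when()`; the name `parse_mydate_idx` is hypothetical. Each `parse_date_time()` call only sees the strings it is meant to parse.

```r
library(lubridate)

# Hypothetical vectorised variant of parse_mydate_novec(): same rule
# (26 characters -> ISO format with offset, anything else -> d/m/Y H:M),
# applied with logical indexing instead of a scalar if/else.
parse_mydate_idx <- function(time_var) {
  # Start from an all-NA POSIXct vector (UTC) of the right length
  out <- .POSIXct(rep(NA_real_, length(time_var)), tz = "UTC")
  is_iso <- nchar(time_var) == 26
  # Parse each subset only with the format it is expected to match
  out[is_iso] <- parse_date_time(time_var[is_iso],
                                 orders = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
  out[!is_iso] <- parse_date_time(time_var[!is_iso],
                                  orders = "%d/%m/%Y %H:%M", tz = "UTC")
  out
}

res <- parse_mydate_idx(c("2020-05-21 19:04:36 +01:00", "21/09/2020 14:14"))
res
```

This sketch assumes the column has no `NA` strings; an `NA` in `is_iso` would make the subscripted assignment fail.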
So far, so good. I then tried to vectorise the function following the data-masking guidance at https://dplyr.tidyverse.org/reference/dplyr_data_masking.html, so that I can use it with `mutate`, and to use `case_when` instead of `if`/`else`:
```r
parse_mydate <- function(time_var) {
  case_when(
    nchar({{time_var}}) == 26 ~ parse_date_time({{time_var}}, orders = "%Y-%m-%d %H:%M:%S %z", tz = "UTC"),
    nchar({{time_var}}) == 16 ~ parse_date_time({{time_var}}, orders = "%d/%m/%Y %H:%M", tz = "UTC"),
    TRUE ~ {{time_var}}
  )
}
```
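On data masking: `{{ }}` is meant for functions that receive a column name or selection from the caller and forward it to a data-masking or tidy-select verb. Inside `parse_mydate()` the argument is already an ordinary vector, so the doubled braces simply evaluate to `time_var` and add nothing. A minimal sketch of where `{{ }}` does belong, using a hypothetical helper `mutate_with()` and toy data:

```r
library(dplyr)

# Hypothetical helper: `cols` is a tidy-select expression supplied by the
# caller, so it is forwarded with {{ }}; `fn` is an ordinary function that
# receives each selected column as a plain vector.
mutate_with <- function(data, cols, fn) {
  data %>% mutate(across({{ cols }}, fn))
}

toy <- data.frame(a = c("x", "yy"), b = c("zzz", "w"))
res <- mutate_with(toy, c(a, b), nchar)
res
#   a b
# 1 1 3
# 2 2 1
```

The vector-level parser itself needs no tidy evaluation; only a wrapper that takes column names does.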
I then apply this function with `mutate`, first to a single column as a test and then to both columns with `mutate(across())`:
```r
library(dplyr)

df_test <- df %>%
  mutate(ActualStartTime = parse_mydate(ActualStartTime))

df_test <- df %>%
  mutate(across(c(ActualStartTime, ActualEndTime), ~parse_mydate(.x)))
```
However, I get the following error and warnings:
```
Error in `mutate_cols()`:
! Problem with `mutate()` column `ActualStartTime`.
ℹ `ActualStartTime = parse_um_date(ActualStartTime)`.
x must be a `POSIXct/POSIXt` object, not a character vector.
Caused by error in `glubort()`:
! must be a `POSIXct/POSIXt` object, not a character vector.
Warning messages:
1: Problem with `mutate()` column `ActualStartTime`.
ℹ `ActualStartTime = parse_um_date(ActualStartTime)`.
ℹ 3 failed to parse.
2: Problem with `mutate()` column `ActualStartTime`.
ℹ `ActualStartTime = parse_um_date(ActualStartTime)`.
ℹ 3 failed to parse.
```
This doesn't make sense to me, since I wrote the function to take a character vector and return a datetime object.
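The error appears to come from `case_when()` rather than from `mutate()`: `case_when()` evaluates every right-hand side on the whole vector and then combines the results, so all of them must be the same type. In `parse_mydate()` the first two branches return POSIXct while `TRUE ~ {{time_var}}` returns the original character vector, which matches the "must be a `POSIXct/POSIXt` object, not a character vector" message. Because each `parse_date_time()` call also sees the three strings in the other format, the "3 failed to parse" warnings are expected as well. A minimal reproduction with toy data (the exact error wording differs between dplyr versions):

```r
library(dplyr)

x <- c("a", "bb")

# One branch returns POSIXct, the fallback returns the character input:
# case_when() cannot combine the two types and signals an error.
res <- tryCatch(
  case_when(
    nchar(x) == 1 ~ as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
    TRUE ~ x
  ),
  error = function(e) e
)
inherits(res, "error")
```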
The desired output is a data frame in which every value in `ActualStartTime` and `ActualEndTime` is a POSIXct datetime, e.g. "2020-05-21 18:04:36 UTC".
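A minimal sketch of a corrected `parse_mydate()`, assuming the two formats in the dummy data are the only ones present: the character fallback is dropped (rows matching neither condition then become `NA`, which is still POSIXct), the vector is used directly instead of through `{{ }}`, and `quiet = TRUE` stops `parse_date_time()` from warning about the strings that belong to the other branch.

```r
library(dplyr)
library(lubridate)

parse_mydate <- function(time_var) {
  case_when(
    # ISO timestamps with an offset, e.g. "2020-05-21 19:04:36 +01:00"
    nchar(time_var) == 26 ~ parse_date_time(time_var, orders = "%Y-%m-%d %H:%M:%S %z",
                                            tz = "UTC", quiet = TRUE),
    # day/month/year without seconds, e.g. "21/09/2020 14:14"
    nchar(time_var) == 16 ~ parse_date_time(time_var, orders = "%d/%m/%Y %H:%M",
                                            tz = "UTC", quiet = TRUE)
    # no TRUE ~ branch: unmatched strings become NA of type POSIXct
  )
}

df <- data.frame(
  p_id = c(1, 2, 3, 4, 5, 6),
  ActualStartTime = c("2020-05-21 19:04:36 +01:00", "21/09/2020 14:14", "2020-08-18 10:11:08 +01:00",
                      "12/10/2020 21:25", "09/11/2020 17:02", "2020-05-16 11:50:58 +02:00"),
  ActualEndTime = c("2020-05-21 19:29:42 +01:00", "21/09/2020 14:19", "2020-08-18 10:14:26 +01:00",
                    "12/10/2020 21:29", "09/11/2020 17:06", "2020-05-16 11:56:10 +02:00")
)

# Passing the function itself is equivalent to ~parse_mydate(.x)
df_test <- df %>%
  mutate(across(c(ActualStartTime, ActualEndTime), parse_mydate))
df_test
```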
I've looked at:
- R dplyr using across() efficiently with mutate() and case_when()
- R - How to pass parameters to function in "mutate across"?
- several other questions on parsing datetimes.
I don't know whether the problem is the logic of the function, my use of `case_when`, my use of `mutate`, or something else. I've been going round in circles for hours. All help appreciated! With thanks.
Comment:
The function `lubridate::fast_strptime` allows you to specify several formats, which are applied in turn until one succeeds.
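A minimal sketch of that suggestion, assuming `fast_strptime()` tries the supplied formats in order as the comment describes, and assuming `%OO` is the directive for offsets written as `+01:00` (lubridate documents the `OO` format as matching offsets like `-08:00`). `lt = FALSE` asks for POSIXct instead of the default POSIXlt. The helper name `parse_mydate_fast` is hypothetical; unlike the `nchar()` approach, no length test is needed.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: try the ISO-with-offset format first, then d/m/Y H:M.
parse_mydate_fast <- function(time_var) {
  fast_strptime(
    time_var,
    format = c("%Y-%m-%d %H:%M:%S %OO", "%d/%m/%Y %H:%M"),
    tz = "UTC",
    lt = FALSE
  )
}

df <- data.frame(
  p_id = c(1, 2, 3, 4, 5, 6),
  ActualStartTime = c("2020-05-21 19:04:36 +01:00", "21/09/2020 14:14", "2020-08-18 10:11:08 +01:00",
                      "12/10/2020 21:25", "09/11/2020 17:02", "2020-05-16 11:50:58 +02:00"),
  ActualEndTime = c("2020-05-21 19:29:42 +01:00", "21/09/2020 14:19", "2020-08-18 10:14:26 +01:00",
                    "12/10/2020 21:29", "09/11/2020 17:06", "2020-05-16 11:56:10 +02:00")
)

df_test <- df %>%
  mutate(across(c(ActualStartTime, ActualEndTime), parse_mydate_fast))
df_test
```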