How to convert character to date in R when the data uses two different date formats?

Posted on 01-20 20:53

I have a huge dataset with over 2 million observations, and every column is of class character. I need to convert one of the columns to a date (target format dd/mm/yyyy), but the dates are written in two different ways:

dates <- c("2022-04-08", "26/01/2021", "14/07/2021", "2021-12-27")

I've already tried several approaches from other posts, but none of them worked for me: one group of dates always turns into NA.

Comments (4)

笑梦风尘 2025-01-27 20:53:39

You can parse the vector once per format and keep whichever result is not NA:

format_ymd <- as.Date(dates, format = "%Y-%m-%d")
format_dmy <- as.Date(dates, format = "%d/%m/%Y")
# ifelse() drops the Date class and returns day counts, hence the origin
as.Date(ifelse(is.na(format_ymd), format_dmy, format_ymd), origin = "1970-01-01")
# [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"
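
As a minimal sketch of why the origin argument is needed there: ifelse() returns a plain numeric vector, so the Date class has to be restored afterwards. One alternative, assuming base R only, is to fill the gaps of one parsed vector from the other by index assignment, which keeps the class. The names raw, parsed and missing_ymd are illustrative only.

```r
dates <- c("2022-04-08", "26/01/2021", "14/07/2021", "2021-12-27")

format_ymd <- as.Date(dates, format = "%Y-%m-%d")
format_dmy <- as.Date(dates, format = "%d/%m/%Y")

# ifelse() returns bare day counts since 1970-01-01, not a Date
raw <- ifelse(is.na(format_ymd), format_dmy, format_ymd)
class(raw)

# Start from one result and fill its NA positions from the other;
# assignment into a Date vector keeps the Date class
parsed <- format_ymd
missing_ymd <- is.na(parsed)
parsed[missing_ymd] <- format_dmy[missing_ymd]
parsed
```

Both routes give the same dates; the second simply avoids the round trip through numeric values.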

So要识趣 2025-01-27 20:53:39

1) Base R Use as.Date with a vector of formats, one per element, as shown. No packages are used.

as.Date(dates, format = ifelse(grepl("/", dates), "%d/%m/%Y", "%Y-%m-%d"))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

2) Base R - 2 Another approach is to convert the dd/mm/yyyy to yyyy-mm-dd and then just use as.Date. No packages are used.

as.Date(sub("(..)/(..)/(....)", "\\3-\\2-\\1", dates))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

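3) Base R - 3 This one uses the tryFormats= argument. as.Date chooses a single format from the first non-NA element, so it is applied to each element separately with lapply.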

do.call("c", lapply(dates, as.Date, tryFormats = c("%d/%m/%Y", "%Y-%m-%d")))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

4) lubridate With lubridate use parse_date_time and then convert that to Date class.

library(lubridate)

as.Date(parse_date_time(dates, c("ymd", "dmy")))
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

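5) coalesce We can use coalesce in dplyr. It takes the first non-NA value it finds at each position. Both formats are given explicitly here: without a format, as.Date picks one based on the first non-NA element, and a dd/mm/yyyy string in that position can be misread through the default "%Y/%m/%d" format.

library(dplyr)

coalesce(as.Date(dates, "%Y-%m-%d"), as.Date(dates, "%d/%m/%Y"))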
## [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

Update

Added (3).
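
Approach (3) calls as.Date once per element for a reason that a short sketch can make visible. With tryFormats, as.Date tests the candidate formats against the first non-NA element only and then applies the winning format to the whole vector. The variable names fmts, whole and each are illustrative.

```r
dates <- c("2022-04-08", "26/01/2021", "14/07/2021", "2021-12-27")
fmts <- c("%d/%m/%Y", "%Y-%m-%d")

# Whole vector: the format is chosen from the first element ("2022-04-08")
# and reused for every string, so the dd/mm/yyyy entries become NA
whole <- as.Date(dates, tryFormats = fmts)
is.na(whole)
# [1] FALSE  TRUE  TRUE FALSE

# Element by element: each string gets its own pass over fmts
each <- do.call("c", lapply(dates, as.Date, tryFormats = fmts))
is.na(each)
# [1] FALSE FALSE FALSE FALSE
```

The per-element loop is likely to be noticeably slower than the vectorised approaches (1), (2) and (5) on a column with millions of rows, so it is probably better kept for small inputs.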

人│生佛魔见 2025-01-27 20:53:39

Similar to SamR's approach, but using data.table::fifelse, which keeps the Date class of its inputs:

data.table::fifelse(
  grepl("^\\d{4}", dates),
  as.Date(dates, "%Y-%m-%d"),
  as.Date(dates, "%d/%m/%Y")
)
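
For a column with over 2 million values, it may be worth checking that nothing was silently turned into NA. The following is a base R sketch of the same pattern-based split with such a check added; parse_mixed_dates is a hypothetical helper name, not a function from any package, and it assumes only the two formats from the question occur.

```r
# Hypothetical helper: parse a character vector holding yyyy-mm-dd and
# dd/mm/yyyy strings, warning about anything that matches neither
parse_mixed_dates <- function(x) {
  is_ymd <- grepl("^\\d{4}-", x)              # NA input counts as FALSE
  out <- rep(as.Date(NA), length(x))           # Date vector of NAs
  out[is_ymd]  <- as.Date(x[is_ymd],  format = "%Y-%m-%d")
  out[!is_ymd] <- as.Date(x[!is_ymd], format = "%d/%m/%Y")

  # Values that were present in the input but could not be parsed
  unparsed <- is.na(out) & !is.na(x)
  if (any(unparsed)) {
    warning(sum(unparsed), " value(s) could not be parsed as a date")
  }
  out
}

dates <- c("2022-04-08", "26/01/2021", "14/07/2021", "2021-12-27")
parse_mixed_dates(dates)
```

Each format is applied only to the strings that look like it, so every value is parsed once, and genuinely malformed entries surface as a warning instead of passing unnoticed.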

美男兮 2025-01-27 20:53:39

With the clock package, you can supply multiple formats to date_parse() and it will try them in order. It is nice for cases like this where the formats are drastically different.

library(clock)

dates <- c("2022-04-08", "26/01/2021", "14/07/2021", "2021-12-27")

# Tries each `format` in order. Stops on first success.
date_parse(
  dates,
  format = c("%Y-%m-%d", "%d/%m/%Y")
)
#> [1] "2022-04-08" "2021-01-26" "2021-07-14" "2021-12-27"

Created on 2022-04-12 by the reprex package (v2.0.1)
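
The question also mentions dd/mm/yyyy as the target format. A Date object has no display format of its own and prints as yyyy-mm-dd, so a minimal sketch of the last step, assuming the column has already been parsed by any of the approaches above, is to call format() when the text form is needed. The names parsed and shown are illustrative.

```r
dates <- c("2022-04-08", "26/01/2021", "14/07/2021", "2021-12-27")

# Parse with one format per element (base R approach from above)
parsed <- as.Date(dates, format = ifelse(grepl("/", dates), "%d/%m/%Y", "%Y-%m-%d"))

# format() gives dd/mm/yyyy text; note the result is character again,
# so keep the Date column for sorting and arithmetic
shown <- format(parsed, "%d/%m/%Y")
shown
# [1] "08/04/2022" "26/01/2021" "14/07/2021" "27/12/2021"
class(shown)
# [1] "character"
```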
