Extract numbers from a string and convert them to a date

Posted 2025-01-23 12:47:46 · 246 characters · 0 views · 0 comments

Hello, I am trying to extract the year, month, and day from the following string

"2020y 3m 1d 16h"

and would like an output like the following:

"2020-03-01" (or "2020-3-1", but as a date type)

I've tried searching Google but only found [extraction with certain patterns, most of which relied on punctuation] and [extracting all the numbers, where I had a hard time dropping the 16, etc.].

Can somebody please help me out with this?

Thank you so much in advance!


Comments (3)

不喜欢何必死缠烂打 2025-01-30 12:47:46

We can first remove the hour part (" 16h") from the input, then use the ymd() function from the lubridate package to convert the rest to a Date.

Regex:

  • \\s matches any whitespace character (here, the space before "16h")
  • \\d{1,2} matches one or two digits (hours run from 00 to 23, so two digits at most)
  • h matches the literal "h" suffix

library(lubridate)

ymd(gsub("\\s\\d{1,2}h", "", "2020y 3m 1d 16h"))
# [1] "2020-03-01"

class(ymd(gsub("\\s\\d{1,2}h", "", "2020y 3m 1d 16h")))
# [1] "Date"
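
For readers who would rather not depend on lubridate, here is a minimal base R sketch of the same idea. Instead of deleting the hour part, it captures the year, month, and day with regex groups. The helper name parse_ymd_string is hypothetical (it does not come from the answer above), and the pattern assumes the fields always appear in y, m, d order.

```r
# Hypothetical helper, not part of the original answer: capture the numbers
# before "y", "m" and "d", then build a Date with base R only.
parse_ymd_string <- function(x) {
  # regexec returns the full match plus the three capture groups
  m <- regmatches(x, regexec("(\\d+)y\\s*(\\d+)m\\s*(\\d+)d", x))
  parts <- vapply(m, function(g) {
    # g has 4 elements on a match (full match + 3 groups), 0 otherwise
    if (length(g) == 4) paste(g[2:4], collapse = "-") else NA_character_
  }, character(1))
  as.Date(parts, format = "%Y-%m-%d")
}

parse_ymd_string(c("2020y 3m 1d 16h", "2020y12m 1d 16h"))
# [1] "2020-03-01" "2020-12-01"
```

Strings that do not match the pattern come back as NA rather than raising an error, which may or may not be what you want.
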
给妤﹃绝世温柔 2025-01-30 12:47:46

Convert the y, m, and d characters to dashes, then use as.POSIXct to convert to a date-time class. The spaces might be absent if the month or day were 10 or above, so the pattern treats the space as optional.

test <- "2020y 3m 1d 16h"
as.POSIXct(gsub("[ymd] ?", "-", test), format = "%Y-%m-%d-%Hh")
#[1] "2020-03-01 16:00:00 CST"   (the time zone shown depends on your locale)

This also succeeds with input like:

test <- "2020y12m 1d 16h"

... whereas the answer from benson23 fails. If you intend to throw away the hours information, the format string could be:

..., format="%Y-%m-%d"

as.POSIXct(gsub("[ymd] ?", "-", test), format = "%Y-%m-%d")

You should generally offer a larger set of possible inputs to support testing of code.
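
Since the question asks for a date type, a small variation on this approach is to call as.Date instead of as.POSIXct, which also sidesteps the local time zone. The sketch below assumes the hour can simply be ignored. It writes the pattern as "[ymd] ?" (a character class does not need | between its members) and relies on R's date parsing ignoring the trailing "-16h" once the format has been matched.

```r
# Sketch: same dash-substitution idea, but returning a Date directly.
test <- c("2020y 3m 1d 16h", "2020y12m 1d 16h")

# Replace y/m/d (plus an optional following space) with a dash
dashed <- gsub("[ymd] ?", "-", test)
dashed
# [1] "2020-3-1-16h"  "2020-12-1-16h"

# Only year, month and day are parsed; the rest of the string is ignored
dates <- as.Date(dashed, format = "%Y-%m-%d")
dates
# [1] "2020-03-01" "2020-12-01"

class(dates)
# [1] "Date"
```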

转角预定愛 2025-01-30 12:47:46

You could use ISOdate after splitting the strings with strsplit on one or more non-digits, \\D+.

r1 <- simplify2array(strsplit(x, '\\D+')) |> t() |> as.data.frame() |> unname() |>
  do.call(what='ISOdate') |> as.Date()
r1
# [1] "2020-03-01" "2020-12-01" "2020-12-12"
    
class(r1)
# [1] "Date"

If you leave out the as.Date, you get a "POSIXct" result that also keeps the time.

r2 <- simplify2array(strsplit(x, '\\D+')) |> t() |> as.data.frame() |> unname() |>
  do.call(what='ISOdate')
r2
# [1] "2020-03-01 16:00:00 GMT" "2020-12-01 16:00:00 GMT" "2020-12-12 01:00:00 GMT"
    
class(r2)
# [1] "POSIXct" "POSIXt" 

Data:

x <- c("2020y 3m 1d 16h", "2020y 12m 1d 16h", "2020y 12m 12d 1h")
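
The one-line pipeline above packs several steps together. A more explicit sketch of the same technique, with illustrative variable names (parts, m, dt) that are not in the original answer, might look like this. It assumes every string contains exactly four numbers in year, month, day, hour order.

```r
x <- c("2020y 3m 1d 16h", "2020y 12m 1d 16h", "2020y 12m 12d 1h")

# Split on runs of non-digits: each element becomes c(year, month, day, hour)
parts <- strsplit(x, "\\D+")

# Stack into an integer matrix, one row per input string
m <- do.call(rbind, lapply(parts, as.integer))

# ISOdate(year, month, day, hour) returns POSIXct in GMT
dt <- ISOdate(m[, 1], m[, 2], m[, 3], m[, 4])

# Drop the time of day to get a Date
as.Date(dt)
# [1] "2020-03-01" "2020-12-01" "2020-12-12"
```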