Converting year and month ("yyyy-mm" format) to a date?

Posted on 2025-01-12 18:01:43

I have a dataset that looks like this:

Month    count
2009-01  12
2009-02  310
2009-03  2379
2009-04  234
2009-05  14
2009-08  1
2009-09  34
2009-10  2386

I want to plot the data (months as x values and counts as y values). Since there are gaps in the data, I want to convert the month information into dates. I tried:

as.Date("2009-03", "%Y-%m")

But it did not work. What's wrong? It seems that as.Date() also requires a day and cannot fall back to a default value for it. Which function solves my problem?

Comments (9)

妞丶爷亲个 2025-01-19 18:01:43

Since dates correspond to a numeric value and a starting date, you indeed need the day. If you really need your data to be in Date format, you can just fix the day to the first of each month manually by pasting it to the date:

month <- "2009-03"
as.Date(paste(month, "-01", sep=""))
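A minimal base-R sketch of the same idea applied to the whole column from the question. The data frame name `dat` and the new column name `date` are assumptions made for illustration; `paste0()` is simply shorthand for `paste(..., sep = "")`.

```r
# Sample data copied from the question
dat <- data.frame(
  Month = c("2009-01", "2009-02", "2009-03", "2009-04",
            "2009-05", "2009-08", "2009-09", "2009-10"),
  count = c(12L, 310L, 2379L, 234L, 14L, 1L, 34L, 2386L)
)

# Append a fixed day so that every string is a complete date
dat$date <- as.Date(paste0(dat$Month, "-01"), format = "%Y-%m-%d")

# Points are positioned by their real date, so the June-July gap stays visible
plot(dat$date, dat$count, type = "b", xlab = "Month", ylab = "count")
```

Because the x values are real Date objects, the missing months show up as a wider step on the axis rather than being squeezed together.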
忆沫 2025-01-19 18:01:43

Try this. (Here we use text=Lines to keep the example self-contained, but in reality we would replace it with the file name.)

Lines <- "2009-01  12
2009-02  310
2009-03  2379
2009-04  234
2009-05  14
2009-08  1
2009-09  34
2009-10  2386"

library(zoo)
z <- read.zoo(text = Lines, FUN = as.yearmon)
plot(z)

The X axis is not so pretty with this data, but with more data it might be fine, or you can use the code for a fancy X axis shown in the examples section of ?plot.zoo .

The zoo series, z, that is created above has a "yearmon" time index and looks like this:

> z
Jan 2009 Feb 2009 Mar 2009 Apr 2009 May 2009 Aug 2009 Sep 2009 Oct 2009 
      12      310     2379      234       14        1       34     2386 

"yearmon" can be used alone as well:

> as.yearmon("2000-03")
[1] "Mar 2000"

Note:

  1. "yearmon" class objects sort in calendar order.

  2. This will plot the monthly points at equally spaced intervals which is likely what is wanted; however, if it were desired to plot the points at unequally spaced intervals spaced in proportion to the number of days in each month then convert the index of z to "Date" class: time(z) <- as.Date(time(z)) .
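To make the "unequally spaced" remark in note 2 concrete, here is a small base-R sketch (it does not use zoo) showing that consecutive month starts are separated by different numbers of days. The variable name `starts` is only illustrative.

```r
# First day of five consecutive months in 2009
starts <- seq(as.Date("2009-01-01"), by = "month", length.out = 5)

# Gaps between consecutive month starts, in days
gaps <- as.numeric(diff(starts))
gaps
# [1] 31 28 31 30
```

With a "Date" index the points are therefore 28 to 31 days apart, whereas a "yearmon" index treats every month as one equal step.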

你与昨日 2025-01-19 18:01:43

The most concise solution if you need the dates to be in Date format:

library(zoo)
month <- "2000-03"
as.Date(as.yearmon(month))
[1] "2000-03-01"

as.Date converts a yearmon object to the first day of its month.

没有你我更好 2025-01-19 18:01:43

You could also achieve this with the parse_date_time or fast_strptime functions from the lubridate package:

> parse_date_time(dates1, "ym")
[1] "2009-01-01 UTC" "2009-02-01 UTC" "2009-03-01 UTC"

> fast_strptime(dates1, "%Y-%m")
[1] "2009-01-01 UTC" "2009-02-01 UTC" "2009-03-01 UTC"

The difference between those two is that parse_date_time allows for lubridate-style format specification, while fast_strptime requires the same format specification as strptime.

For specifying the timezone, you can use the tz-parameter:

> parse_date_time(dates1, "ym", tz = "CET")
[1] "2009-01-01 CET" "2009-02-01 CET" "2009-03-01 CET"

When some of your date-time strings omit trailing components, you can use the truncated parameter to specify how many of the trailing formats may be missing:

> parse_date_time(dates2, "ymdHMS", truncated = 3)
[1] "2012-06-01 12:23:00 UTC" "2012-06-01 12:00:00 UTC" "2012-06-01 00:00:00 UTC"

Used data:

dates1 <- c("2009-01","2009-02","2009-03")
dates2 <- c("2012-06-01 12:23","2012-06-01 12","2012-06-01")
漆黑的白昼 2025-01-19 18:01:43

Using anytime package:

library(anytime)

anydate("2009-01")
# [1] "2009-01-01"
残月升风 2025-01-19 18:01:43

Indeed, as has been mentioned above (and elsewhere on SO), in order to convert the string to a date, you need a specific date of the month. From the as.Date() manual page:

If the date string does not specify the date completely, the returned answer may be system-specific. The most common behaviour is to assume that a missing year, month or day is the current one. If it specifies a date incorrectly, reliable implementations will give an error and the date is reported as NA. Unfortunately some common implementations (such as glibc) are unreliable and guess at the intended meaning.

A simple solution would be to paste the day "01" onto each year-month string and use strptime() to parse it as the first day of that month.
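That suggestion might look like the following sketch. The vector name `months` and the choice of tz = "UTC" are assumptions made here so the result does not depend on the local time zone.

```r
months <- c("2009-01", "2009-02", "2009-03")

# strptime() returns a POSIXlt object; tz = "UTC" is an assumption for reproducibility
lt <- strptime(paste0(months, "-01"), format = "%Y-%m-%d", tz = "UTC")

# Drop the time-of-day part to get plain Date values
d <- as.Date(lt)
format(d)
# [1] "2009-01-01" "2009-02-01" "2009-03-01"
```

If only dates are needed, calling as.Date() directly on the pasted strings gives the same result without going through POSIXlt.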


For those seeking a little more background on processing dates and times in R:

In R, times use POSIXct and POSIXlt classes and dates use the Date class.

Dates are stored as the number of days since January 1st, 1970 and times are stored as the number of seconds since January 1st, 1970.

So, for example:

d <- as.Date("1971-01-01")
unclass(d)  # one year after 1970-01-01
# [1] 365

pct <- Sys.time()  # in POSIXct
unclass(pct)  # number of seconds since 1970-01-01
# [1] 1450276559
plt <- as.POSIXlt(pct)
up <- unclass(plt)  # up is now a list containing the components of time
names(up)
# [1] "sec"    "min"    "hour"   "mday"   "mon"    "year"   "wday"   "yday"   "isdst"  "zone"  
# [11] "gmtoff"
up$hour
# [1] 9

To perform operations on dates and times:

plt - as.POSIXlt(d)
# Time difference of 16420.61 days

And to process dates, you can use strptime() (borrowing these examples from the manual page):

strptime("20/2/06 11:16:16.683", "%d/%m/%y %H:%M:%OS")
# [1] "2006-02-20 11:16:16 EST"

# And in vectorized form:
dates <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
strptime(dates, "%d%b%Y")
# [1] "1960-01-01 EST" "1960-01-02 EST" "1960-03-31 EST" "1960-07-30 EDT"
岛歌少女 2025-01-19 18:01:43

A way using ym from lubridate.

The month can either be a number, an abbreviated month or a full month name with a variety of separators (even without separator), e.g.

library(lubridate)

ym(c("2012/September", "2012-Aug", "2012.07", 201204))
[1] "2012-09-01" "2012-08-01" "2012-07-01" "2012-04-01"

on the given data:

ym(dat$Month)
[1] "2009-01-01" "2009-02-01" "2009-03-01" "2009-04-01" "2009-05-01"
[6] "2009-08-01" "2009-09-01" "2009-10-01"

Note that there's also my if you have it the other way round, e.g. Sep/2022.

Data

dat <- structure(list(Month = c("2009-01", "2009-02", "2009-03", "2009-04",
"2009-05", "2009-08", "2009-09", "2009-10"), count = c(12L, 310L,
2379L, 234L, 14L, 1L, 34L, 2386L)), class = "data.frame", row.names = c(NA,
-8L))
风吹雪碎 2025-01-19 18:01:43

I think @ben-rollert's solution is a good solution.

You just have to be careful if you want to use this solution in a function inside a new package.

When developing packages, it's recommended to use the syntax packagename::function_name() (see http://kbroman.org/pkg_primer/pages/depends.html).

In this case, you have to use the version of as.Date() defined by the zoo library.

Here is an example:

> devtools::session_info()
Session info ----------------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.1 (2016-06-21)
 system   x86_64, linux-gnu           
 ui       RStudio (1.0.35)            
 language (EN)                        
 collate  C                           
 tz       <NA>                        
 date     2016-11-09                  

Packages --------------------------------------------------------------------------------------------------------------------------------------------------------

 package  * version date       source        
 devtools   1.12.0  2016-06-24 CRAN (R 3.3.1)
 digest     0.6.10  2016-08-02 CRAN (R 3.2.3)
 memoise    1.0.0   2016-01-29 CRAN (R 3.2.3)
 withr      1.0.2   2016-06-20 CRAN (R 3.2.3)

> as.Date(zoo::as.yearmon("1989-10", "%Y-%m")) 
Error in as.Date.default(zoo::as.yearmon("1989-10", "%Y-%m")) : 
  do not know how to convert 'zoo::as.yearmon("1989-10", "%Y-%m")' to class “Date”

> zoo::as.Date(zoo::as.yearmon("1989-10", "%Y-%m"))
[1] "1989-10-01"

So if you're developing a package, good practice is to use:

zoo::as.Date(zoo::as.yearmon("1989-10", "%Y-%m"))
给妤﹃绝世温柔 2025-01-19 18:01:43

The tidyverse recently added the clock package alongside lubridate, and it has some nice functionality for this:

library(clock)

x <- year_month_day_parse(df$Month, format = "%Y-%m", precision = "month") 
# <year_month_day<month>[8]>
# [1] "2009-01" "2009-02" "2009-03" "2009-04" "2009-05" "2009-08" "2009-09" "2009-10"

Date Manipulation and Extraction

The output of this is a year-month-day vector where you can still do date arithmetic and apply other common functions as expected:

sort(x, decreasing = T)
# <year_month_day<month>[8]>
# [1] "2009-10" "2009-09" "2009-08" "2009-05" "2009-04" "2009-03" "2009-02" "2009-01"

add_months(x, 3)
# <year_month_day<month>[8]>
# [1] "2009-04" "2009-05" "2009-06" "2009-07" "2009-08" "2009-11" "2009-12" "2010-01"

add_years(x, -2)
# <year_month_day<month>[8]>
# [1] "2007-01" "2007-02" "2007-03" "2007-04" "2007-05" "2007-08" "2007-09" "2007-10"

get_month(x)
# [1]  1  2  3  4  5  8  9 10

You can also set the day, if you need it, with set_day:

set_day(x, 1)
<year_month_day<day>[8]>
[1] "2009-01-01" "2009-02-01" "2009-03-01" "2009-04-01" "2009-05-01" "2009-08-01"
[7] "2009-09-01" "2009-10-01"

Handling Invalid Dates

Or if you wanted to cleanly get the last day of every month with this structure, the invalid_* set of functions can help:

# not 31 days in Feb, Apr, Sep
y <- set_day(x, 31)
# <year_month_day<day>[8]>
# [1] "2009-01-31" "2009-02-31" "2009-03-31" "2009-04-31" "2009-05-31" "2009-08-31"
# [7] "2009-09-31" "2009-10-31"

invalid_any(y)
[1] TRUE

invalid_detect(y)
[1] FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE

You can handle invalid dates with invalid_resolve, or you can drop them with invalid_remove:

invalid_resolve(y, invalid = "previous")
<year_month_day<day>[8]>
[1] "2009-01-31" "2009-02-28" "2009-03-31" "2009-04-30" "2009-05-31" "2009-08-31"
[7] "2009-09-30" "2009-10-31"

From the documentation you can specify the following values for the invalid argument to handle invalid dates:

"previous": The previous valid instant in time.

"previous-day": The previous valid day in time, keeping the time of day.

"next": The next valid instant in time.

"next-day": The next valid day in time, keeping the time of day.

"overflow": Overflow by the number of days that the input is invalid by. Time of day is dropped.

"overflow-day": Overflow by the number of days that the input is invalid by. Time of day is kept.

"NA": Replace invalid dates with NA.

"error": Error on invalid dates.
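For comparison, the month-end dates that invalid = "previous" produces above can also be reached without clock. The following is a base-R sketch, not part of the clock API; the helper name `month_end` and the vector `first` are hypothetical names introduced here.

```r
# First day of a few of the months from the example above
first <- as.Date(paste0(c("2009-01", "2009-02", "2009-04", "2009-09"), "-01"))

# Hypothetical helper: step to the first day of the next month, then go back one day
month_end <- function(first_day) {
  next_month <- seq(first_day, by = "month", length.out = 2)[2]
  next_month - 1
}

# lapply() keeps each element a Date; c() recombines them into one Date vector
last <- do.call(c, lapply(first, month_end))
format(last)
# [1] "2009-01-31" "2009-02-28" "2009-04-30" "2009-09-30"
```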
