Why does dmy() in the lubridate package not work with NA? What is a good workaround?
I stumbled across a peculiar behavior in the lubridate package: dmy(NA) throws an error instead of just returning NA. This causes problems when I want to convert a column in which some elements are NA and the rest are date strings that normally convert without trouble.
Here is a minimal example:
    library(lubridate)
    df <- data.frame(ID=letters[1:5],
                     Datum=c("01.01.1990", NA, "11.01.1990", NA, "01.02.1990"))
    df_copy <- df
    #Question 1: Why does dmy(NA) not return NA, but throws an error?
    df$Datum <- dmy(df$Datum)
    Error in function (..., sep = " ", collapse = NULL) : invalid separator
    df <- df_copy
    #Question 2: What's a workaround?
    #1. Idea: Only convert those elements that are not NAs
    #RHS works, but assigning that to the LHS doesn't work (most likely problem:
    #column "Datum" is still of class factor, while the RHS is of class POSIXct)
    df[!is.na(df$Datum), "Datum"] <- dmy(df[!is.na(df$Datum), "Datum"])
    Using date format %d.%m.%Y.
    Warning message:
    In `[<-.factor`(`*tmp*`, iseq, value = c(NA_integer_, NA_integer_, :
      invalid factor level, NAs generated
    df #Only NAs, apparently problem with class of column "Datum"
      ID Datum
    1  a  <NA>
    2  b  <NA>
    3  c  <NA>
    4  d  <NA>
    5  e  <NA>
    df <- df_copy
    #2. Idea: Use mapply and apply dmy only to those elements that are not NA
    df[, "Datum"] <- mapply(function(x) {if (is.na(x)) {
                              return(NA)
                            } else {
                              return(dmy(x))
                            }}, df$Datum)
    df #Meaningless numbers returned instead of date-objects
      ID     Datum
    1  a 631152000
    2  b        NA
    3  c 632016000
    4  d        NA
    5  e 633830400
To summarize, I have two questions:
1) Why does dmy(NA) not work? Based on most other functions, I would assume it is good programming practice for every transformation of NA (such as dmy()) to return NA again, just as 2 + NA does.
2) If this behavior is intended, how do I convert a data.frame column that includes NAs via the dmy() function?
Comments (2)
The error

    Error in function (..., sep = " ", collapse = NULL) : invalid separator

is caused by the lubridate:::guess_format() function. The NA is passed as sep in a call to paste(), specifically at fmts <- unlist(mlply(with_seps, paste)). You could have a go at improving lubridate:::guess_format() to fix this.
Otherwise, could you just change the NA to characters ("NA")?
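The failing paste() call can be reproduced in isolation with base R alone. The sketch below only illustrates the error described above and is not lubridate's actual code; the strings passed to paste() and the name result are made up for the example.

```r
# paste() requires sep to be a single non-missing character string.
# Passing NA as sep raises an error, which tryCatch() turns into a sentinel value
# so the script keeps running.
result <- tryCatch(
  paste("01", "01", "1990", sep = NA),
  error = function(e) "error raised"
)
print(result)

# For comparison, a valid separator works as expected.
ok <- paste("01", "01", "1990", sep = ".")
print(ok)
```

This is why an NA that ends up in the separator position stops the whole conversion instead of propagating as NA.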
Since your dates are in a reasonably straightforward format, it might be much simpler to just use as.Date and specify the appropriate format argument. To see a list of the formatting codes used by as.Date, see ?strptime.
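A minimal sketch of this suggestion, assuming the dates follow the day.month.year pattern from the question, so that the format string is "%d.%m.%Y". The stringsAsFactors = FALSE argument is an addition here: it keeps the Datum column as character rather than factor, which sidesteps the factor-assignment warning from the question.

```r
# Rebuild the example data; keep Datum as character instead of factor.
df <- data.frame(ID = letters[1:5],
                 Datum = c("01.01.1990", NA, "11.01.1990", NA, "01.02.1990"),
                 stringsAsFactors = FALSE)

# as.Date() parses each string with the given format; NA entries stay NA.
df$Datum <- as.Date(df$Datum, format = "%d.%m.%Y")

print(df)
print(class(df$Datum))
```

Because the whole column is converted in one call, there is no need to subset the non-NA rows or to loop over elements, and the result keeps its Date class.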