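How to compare two data frames of dates, return the matching dates within a specific interval, and flag the non-matching dates for each row in a new data frame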
I have a data frame with multiple measurement dates for each subject in each row, and another data frame with multiple visit dates for the same subject in each row (also including some NAs).
What I want is to extract the measurement dates that match a subject's visit dates within a specific interval (say +/- 10 days from the visit date), tag the measurement dates that do not fall within this interval (e.g., with FALSE or -99), and keep the NAs as they are.
A similar question was asked here, but it did not allow for measurement dates to fall within an interval around the visit date.
set.seed(1)
# Dataframe with measure dates
df1 <- rbind.data.frame(sort(sample(seq(as.Date("2018-01-01"), as.Date("2019-01-01"), by = "day"), 10)),
c(sort(sample(seq(as.Date("2018-06-01"), as.Date("2019-06-01"), by = "day"), 8)), NA, NA),
c(sort(sample(seq(as.Date("2019-06-01"), as.Date("2020-06-01"), by = "day"), 6)), rep(NA, 4)))
names(df1) <- paste("MEASUREDATE", 1:10, sep = "")
myfun <- function(x) as.Date(x, format = "%Y-%m-%d", origin = "1970-01-01")
df1 <- data.frame(lapply(df1, myfun))
df1
# Dataframe with visit dates
df2 <- rbind.data.frame(as.numeric(df1[1, 2:7]), as.numeric(c(df1[2, 4:6], NA, NA, NA)), as.numeric(c(df1[3, 1:2], rep(NA, 4))))
df2 <- data.frame(lapply(df2, myfun))
names(df2) <- paste("VISIT", 1:6, sep = "")
df2
So the first row of the new data frame would look like this:
# New dataframe
df3 <- df1[1, ]
df3[1] <- FALSE
df3[9:10] <- FALSE
df3
Do you know how to tackle this problem? Any help is very much appreciated.
Comments (2)
Here is a data.table solution. In the second-to-last line, missing visit dates are set to 1970-01-01 (NA is not an option, because they would be indistinguishable from the existing NAs, and the value has to be a date). If the Date format is not necessary, you can switch to character and use any fill value you like...
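The answer's code is not reproduced on this page. As a rough illustration of the two flagging options it describes (a sentinel date versus switching to character), here is a minimal base R sketch. The toy vectors `measure` and `matched` and the label `"no match"` are assumptions made for the example, not part of the original answer.

```r
# Toy data (hypothetical): measurement dates for one subject and a logical
# vector saying which of them matched a visit; NA marks a missing measurement.
measure <- as.Date(c("2018-01-05", "2018-03-01", NA))
matched <- c(FALSE, TRUE, NA)

# Option 1: keep the Date class and flag non-matches with a sentinel date.
# which() drops the NA, so the missing measurement stays NA.
sentinel <- as.Date("1970-01-01")
flagged_date <- measure
flagged_date[which(!matched)] <- sentinel

# Option 2: give up the Date class and use a readable label instead.
flagged_chr <- as.character(measure)
flagged_chr[which(!matched)] <- "no match"

flagged_date
flagged_chr
```

With the sentinel the column can still be used in date arithmetic, at the cost of a magic value; with character the flag is self-explanatory but the column has to be converted back before any date calculation.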
As Wimpel said, you cannot have a logical and a Date in the same column. So I will use 1970-01-01 as the FALSE value.
A solution using dplyr
Output
Some NA values appear because some visit dates are NA: the check_within_10d function cannot rule out that one of the missing visit dates falls within 10 days of a measurement date.
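The definition of check_within_10d is not shown on this page. The sketch below is a hypothetical base R re-creation that only reuses the name, to show where the NA results come from: `any()` returns NA when no comparison is TRUE and at least one is NA. The argument names and the example dates are assumptions.

```r
# Hypothetical re-creation; the original answer's definition is not shown here.
check_within_10d <- function(measure_date, visit_dates, window = 10) {
  # A missing measurement date stays missing
  if (is.na(measure_date)) return(NA)
  # Date - Date gives a difftime in days; compare its absolute value
  gap <- abs(as.numeric(measure_date - visit_dates))
  any(gap <= window)
}

visits <- as.Date(c("2018-03-05", NA))

check_within_10d(as.Date("2018-03-01"), visits)  # TRUE: a known visit is 4 days away
check_within_10d(as.Date("2018-06-01"), visits)  # NA: the missing visit might be close
```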
If you want to ignore the missing visit dates in your check, use
Output
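Since the code and output of this variant are not reproduced here, the following base R sketch shows one way the "ignore missing visit dates" behaviour could look on wide data. It is not the answer's dplyr code; the toy data frames `measures` and `visits` and their values are assumptions, and 1970-01-01 is used as the flag value as described above.

```r
# Toy wide data (hypothetical values), one row per subject
measures <- data.frame(
  MEASUREDATE1 = as.Date(c("2018-01-05", "2018-07-01")),
  MEASUREDATE2 = as.Date(c("2018-03-01", NA))
)
visits <- data.frame(
  VISIT1 = as.Date(c("2018-03-05", "2018-06-25")),
  VISIT2 = as.Date(c(NA, NA))
)

sentinel <- as.Date("1970-01-01")  # flag value for "no visit within the window"
window <- 10                       # days on either side of a visit

result <- measures
for (m in names(measures)) {
  # hit[i] is TRUE when any known visit of subject i is within the window
  hit <- rep(FALSE, nrow(measures))
  for (v in names(visits)) {
    gap <- abs(as.numeric(measures[[m]] - visits[[v]]))
    # Missing visits (gap is NA) are ignored rather than propagated
    hit <- hit | (!is.na(gap) & gap <= window)
  }
  # Flag non-matching measurement dates; missing measurements stay NA
  result[[m]][!hit & !is.na(measures[[m]])] <- sentinel
}

result
```

Here the first subject's 2018-01-05 measurement is replaced by the sentinel, the 2018-03-01 and 2018-07-01 measurements are kept because a visit lies within 10 days, and the missing measurement remains NA.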