In R, how do I create a new date variable that takes the closest date value after an index date variable?

Posted on 2025-01-19 23:48:09

My data frame in RStudio is as follows:

StudyID  FITDate.1   ScopeDate.1  ScopeDate.2  ScopeDate.3  ScopeDate.4
1        2014-05-15  2010-06-02   2014-05-28   2014-08-01   2015-10-27
2        2017-11-29  2018-02-27
3        2015-10-04  2016-06-24   2017-01-18

I have a variable "FITDate.1" that gives the date of the FIT test, and several variables "ScopeDate.x" that give the dates of multiple scope tests.

In my research, a person can have only one FIT test date but multiple scope dates. Clinically, a person who has a FIT test is then referred for a scope test. However, the same person may also receive scope tests for other reasons.

So if a scope test falls right after the FIT test date (that is, it is the first scope date following the FIT date), we define the two as highly related.

I want to create a variable "FITrelatedscopedate" that holds the date of the FIT-related scope. For example, in the row with StudyID == 1, "FITDate.1" is 2014-05-15, which falls between ScopeDate.1 (2010-06-02) and ScopeDate.2 (2014-05-28). The value I need is therefore ScopeDate.2, 2014-05-28, and I want to write it into the new variable "FITrelatedscopedate".

I think I need a loop, but I have no experience writing one. Have you solved a similar problem, or do you know of code that would do this? Thanks, any help is appreciated.

拒绝两难 2025-01-26 23:48:09

Here is one approach with tidyverse, assuming you start with two long data frames: one for FIT testing and the other for endoscopy.

df_fit <- data.frame(
  StudyID = 1:3,
  FITDate = as.Date(c("2014-05-15", "2017-11-29", "2015-10-04"))
)
df_fit

  StudyID    FITDate
1       1 2014-05-15
2       2 2017-11-29
3       3 2015-10-04

df_scope <- data.frame(
  StudyID = c(1,1,1,1,2,3,3),
  ScopeDate = as.Date(c("2010-06-02", "2014-05-28", "2014-08-01", "2015-10-27", "2018-02-27", 
                        "2016-06-24", "2017-01-18"))
)
df_scope

  StudyID  ScopeDate
1       1 2010-06-02
2       1 2014-05-28
3       1 2014-08-01
4       1 2015-10-27
5       2 2018-02-27
6       3 2016-06-24
7       3 2017-01-18
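
If your data are still in the wide layout shown in the question (one ScopeDate.x column per scope), two long data frames of this shape can be derived from it. The following is only a minimal sketch: the names df_wide, fit_long, scope_long and ScopeNumber are made up for illustration, and it assumes empty scope cells are stored as NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data frame mirroring the layout in the question
df_wide <- data.frame(
  StudyID     = 1:3,
  FITDate.1   = as.Date(c("2014-05-15", "2017-11-29", "2015-10-04")),
  ScopeDate.1 = as.Date(c("2010-06-02", "2018-02-27", "2016-06-24")),
  ScopeDate.2 = as.Date(c("2014-05-28", NA, "2017-01-18")),
  ScopeDate.3 = as.Date(c("2014-08-01", NA, NA)),
  ScopeDate.4 = as.Date(c("2015-10-27", NA, NA))
)

# One row per participant, holding the FIT date
fit_long <- df_wide %>%
  select(StudyID, FITDate = FITDate.1)

# One row per scope; empty ScopeDate.x cells are dropped
scope_long <- df_wide %>%
  select(StudyID, starts_with("ScopeDate")) %>%
  pivot_longer(
    cols = starts_with("ScopeDate"),
    names_to = "ScopeNumber",
    values_to = "ScopeDate",
    values_drop_na = TRUE
  )
```

fit_long and scope_long can then stand in for df_fit and df_scope; scope_long carries one extra column, ScopeNumber, recording which wide column each date came from.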

First, you can do a left_join by StudyID to add the scope dates to the FIT data. Then, you can filter to keep only scope dates after FIT testing. For each StudyID, use slice to retain only the first row (this assumes the dates are in chronological order within each StudyID; if not, add arrange(ScopeDate) to the pipe before slice. Let me know if you need help with this).

Then, you can right_join back to df_fit so that those FIT testing dates without endoscopy will have NA for the ScopeDate. The final statement with mutate will calculate the time duration between endoscopy and FIT testing.

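library(tidyverse)

# Attach every scope date to the participant's FIT date
left_join(
  df_fit,
  df_scope,
  by = "StudyID"
) %>%
  # Keep only scopes performed after the FIT test
  filter(ScopeDate > FITDate) %>%
  group_by(StudyID) %>%
  # First remaining row per participant (assumes chronological order)
  slice(1) %>%
  # Bring back participants with no scope after FIT; their ScopeDate is NA
  right_join(df_fit, by = c("StudyID", "FITDate")) %>%
  mutate(Duration = ScopeDate - FITDate)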

Output

  StudyID FITDate    ScopeDate  Duration
    <dbl> <date>     <date>     <drtn>  
1       1 2014-05-15 2014-05-28  13 days
2       2 2017-11-29 2018-02-27  90 days
3       3 2015-10-04 2016-06-24 264 days
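
The slice(1) step relies on the scope rows being in chronological order. One possible alternative to adding arrange(ScopeDate), not part of the pipeline above, is dplyr's slice_min(), which picks the earliest remaining date per group regardless of row order. A minimal sketch with made-up, deliberately shuffled rows (fit, scope_unordered and first_scope are hypothetical names), in which participant 2 has no scope after the FIT test:

```r
library(dplyr)

fit <- data.frame(
  StudyID = c(1, 2),
  FITDate = as.Date(c("2014-05-15", "2017-11-29"))
)

# Rows intentionally out of chronological order;
# participant 2 only has a scope before the FIT test
scope_unordered <- data.frame(
  StudyID   = c(1, 1, 1, 2),
  ScopeDate = as.Date(c("2015-10-27", "2010-06-02", "2014-05-28", "2016-01-10"))
)

first_scope <- fit %>%
  left_join(scope_unordered, by = "StudyID") %>%
  filter(ScopeDate > FITDate) %>%
  group_by(StudyID) %>%
  # Earliest scope date after FIT, independent of row order
  slice_min(ScopeDate, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # Participants without a later scope come back with ScopeDate = NA
  right_join(fit, by = c("StudyID", "FITDate")) %>%
  arrange(StudyID)
```

Here participant 1 should get 2014-05-28 and participant 2 should get NA, which is the behaviour the right_join step is meant to provide.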

Let me know if this works for you. A data.table approach can be considered if you need something faster and have a very large dataset.
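
For reference, one way a data.table version might look is sketched below. It has not been benchmarked, and first_after, dt_fit, dt_scope, dt_all and dt_result are names made up for this example. It uses a plain merge followed by a grouped summary; data.table also offers rolling and non-equi joins, which are not shown here.

```r
library(data.table)

dt_fit <- data.table(
  StudyID = 1:3,
  FITDate = as.Date(c("2014-05-15", "2017-11-29", "2015-10-04"))
)
dt_scope <- data.table(
  StudyID = c(1L, 1L, 1L, 1L, 2L, 3L, 3L),
  ScopeDate = as.Date(c("2010-06-02", "2014-05-28", "2014-08-01", "2015-10-27",
                        "2018-02-27", "2016-06-24", "2017-01-18"))
)

# Hypothetical helper: earliest scope date strictly after the FIT date,
# or NA when no such date exists
first_after <- function(scope, fit) {
  later <- scope[!is.na(scope) & scope > fit]
  if (length(later) > 0) min(later) else as.Date(NA)
}

# all.x = TRUE keeps participants that have no scope rows at all
dt_all <- merge(dt_fit, dt_scope, by = "StudyID", all.x = TRUE)

# One row per participant with the first scope date after FIT
dt_result <- dt_all[, .(ScopeDate = first_after(ScopeDate, FITDate[1])),
                    by = .(StudyID, FITDate)]

# Add the duration in days as a plain numeric column
dt_result[, Duration := as.numeric(ScopeDate - FITDate)]
```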

If you need the Duration as a numeric column, you can use as.numeric(ScopeDate - FITDate).
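
As a small base R illustration of that conversion, using the dates for StudyID 1 (the variable names are made up for this example):

```r
fit_date   <- as.Date("2014-05-15")
scope_date <- as.Date("2014-05-28")

# Subtracting two Dates gives a difftime measured in days
duration <- scope_date - fit_date

# Plain numbers: days by default, or another unit on request
duration_days  <- as.numeric(duration)
duration_weeks <- as.numeric(duration, units = "weeks")
```

duration_days is 13 here, matching the first row of the output above; the units argument is useful when the duration should be reported in weeks rather than days.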
