在SAS中识别背靠背日期

发布于 2025-02-11 18:42:01 字数 780 浏览 2 评论 0原文

我有一个看起来像这样的数据集:

ID   start_date   end_date
1    01/01/2022   01/02/2022
1    01/02/2022   01/05/2022
1    01/06/2022   01/07/2022
2    01/09/2019   01/22/2022
2    06/07/2014   09/10/2015
3    11/10/2012   02/01/2013

我正在尝试创建一个虚拟指示器来显示背靠背的事件。到目前为止,我已经能够执行以下操作:

data df_1;
    set df_2;
    by ID end_date;
    lag_epi_e = lag(end_date);
    if not (first.ID) then do;
    date_diff= start_date- lag(end_date);
    end;
    format lag_epi_e date9.;
run;

此代码的问题是,它将创建一个指标,以表明事件背靠背,但并未为第一个事件创建指标,而只有后续事件。这是一个以下外观的示例:

ID   start_date   end_date     b2b_ind
1    01/01/2022   01/02/2022   0
1    01/02/2022   01/05/2022   1
1    01/06/2022   01/07/2022   1

如何重写代码,以便所有事件在背靠背时都符合1的指标?

I have a dataset that looks like this:

ID   start_date   end_date
1    01/01/2022   01/02/2022
1    01/02/2022   01/05/2022
1    01/06/2022   01/07/2022
2    01/09/2019   01/22/2022
2    06/07/2014   09/10/2015
3    11/10/2012   02/01/2013

I am trying to create a dummy indicator to show events that are back-to-back. So far, I have been able to do the following:

data df_1;
    set df_2;
    by ID end_date;
    lag_epi_e = lag(end_date);
    if not (first.ID) then do;
    date_diff= start_date- lag(end_date);
    end;
    format lag_epi_e date9.;
run;

The issue with this code is that it will create an indicator to show that events are back to back but is does not create an indicator for the first event, only the follow up events. Here is an example of how it looks below:

ID   start_date   end_date     b2b_ind
1    01/01/2022   01/02/2022   0
1    01/02/2022   01/05/2022   1
1    01/06/2022   01/07/2022   1

How can I rewrite the code so that all events take on an indicator of 1 when they are back-to-back?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

千と千尋 2025-02-18 18:42:01

您也想要第一张记录吗?

如果是这样,您可以设置该设置,但是如果下一个记录集不背靠背,会发生什么?
可能有助于显示您的预期输出。

注意,您还应在if语句之外使用计算出的滞后变量
否则,您将获得意外的结果。

data df_1;
    set df_2;
    by ID end_date;
    lag_epi_e = lag(end_date);
    if not (first.ID) then do;
    date_diff= start_date- lag_epi_e;
    end;
    else if first.id then date_diff=1;
    format lag_epi_e date9.;
run;

Do you want 1 at first record as well?

If so you can set that, but what happens if the next record set is not back to back?
May help to show your expected output.

Note you should also use the calculated lag variable outside the IF statement
otherwise you'll get unexpected results.

data df_1;
    set df_2;
    by ID end_date;
    lag_epi_e = lag(end_date);
    if not (first.ID) then do;
    date_diff= start_date- lag_epi_e;
    end;
    else if first.id then date_diff=1;
    format lag_epi_e date9.;
run;
渔村楼浪 2025-02-18 18:42:01

在您的情况下,您需要检查领先的事件和滞后事件是否在一起。由于LEAD不是SAS中的功能,因此您可以使用众多方法之一来完成它。我最喜欢的是 sgf论文:SAS®中的铅(和滞后):一个问题,许多解决方案

让我们为您的数据添加潜在客户。此代码正在做三件事:

  1. 打开数据集df_1在“背景”中
  2. 获取 n + 1 th观察start_date并保存它 代码,则将
  3. 如果我们使用最后一个ID

其设置为丢失:

data want;
    set df_1;
    by ID end_date;
    retain _dsid_;

    if(_N_ = 1) then _dsid_ = open("have");
    _lead_rc_ = fetchobs(_dsid_, _N_+1);
    
    lead_start_date = getvarn(_dsid_, varnum(_dsid_, "start_date"));
    lag_end_date    = lag(end_date);

    if(first.id) then call missing(lag_end_date);
    if(last.id) then call missing(lead_start_date);

    b2b_ind = (   (0 LE (lead_start_date - end_date) LE 1) 
               OR (0 LE (start_date - lag_end_date) LE 1)
              );
    
    drop _lead_rc_ _dsid_;

    format lead_start_date lag_end_date mmddyy10.;
run;

输出:

id start_date   end_date    lead_start_date lag_end_date    b2b_ind
1  01/01/2022   01/02/2022  01/02/2022      .               1
1  01/02/2022   01/05/2022  01/06/2022      01/02/2022      1
1  01/06/2022   01/07/2022  .               01/05/2022      1
2  06/07/2014   09/10/2015  01/09/2019      .               0
2  01/09/2019   01/22/2022  .               09/10/2015      0
3  11/10/2012   02/01/2013  .               .               0

如果您有SAS/ETS,则可以在两次通过中执行此操作:

proc expand data=df_1 out=df1_lead(drop=time);
    by id;
    convert start_date = lead_start_date / transform=(lead 1);
run;

In your case, you'll want to check if both a leading and lagging event are butted up together. Since lead is not a function in SAS, you can use one of the many ways to accomplish it. My favorite is from this SGF paper: Calculating Leads (and Lags) in SAS®: One Problem, Many Solutions

Let's add a lead to your data. This code is doing three things:

  1. Opening up your dataset df_1 in the "background"
  2. Fetching the n + 1th observation of start_date and saving it to a variable
  3. Setting it to missing if we're on the last id

Code:

data want;
    set df_1;
    by ID end_date;
    retain _dsid_;

    if(_N_ = 1) then _dsid_ = open("have");
    _lead_rc_ = fetchobs(_dsid_, _N_+1);
    
    lead_start_date = getvarn(_dsid_, varnum(_dsid_, "start_date"));
    lag_end_date    = lag(end_date);

    if(first.id) then call missing(lag_end_date);
    if(last.id) then call missing(lead_start_date);

    b2b_ind = (   (0 LE (lead_start_date - end_date) LE 1) 
               OR (0 LE (start_date - lag_end_date) LE 1)
              );
    
    drop _lead_rc_ _dsid_;

    format lead_start_date lag_end_date mmddyy10.;
run;

Output:

id start_date   end_date    lead_start_date lag_end_date    b2b_ind
1  01/01/2022   01/02/2022  01/02/2022      .               1
1  01/02/2022   01/05/2022  01/06/2022      01/02/2022      1
1  01/06/2022   01/07/2022  .               01/05/2022      1
2  06/07/2014   09/10/2015  01/09/2019      .               0
2  01/09/2019   01/22/2022  .               09/10/2015      0
3  11/10/2012   02/01/2013  .               .               0

You can optionally do this in two passes if you have SAS/ETS:

proc expand data=df_1 out=df1_lead(drop=time);
    by id;
    convert start_date = lead_start_date / transform=(lead 1);
run;
纵性 2025-02-18 18:42:01
data df_2; input ID $  start_date : mmddyy10.  end_date : mmddyy10.;
format start_date end_date date9.;
 pk=_n_;
cards;
1    01/01/2022   01/02/2022
1    01/02/2022   01/05/2022
1    01/06/2022   01/07/2022
2    01/09/2019   01/22/2022
2    06/07/2014   09/10/2015
3    11/10/2012   02/01/2013
;
run;
proc sql;
 create table df_1(drop=pk) as select distinct d1.*, 
  abs(start_date2-end_date)<=1 or abs(start_date-end_date2)<=1 as b2b_ind 
  from df_2 d1 cross join df_2(rename=(start_date=start_date2 end_date=end_date2 
   pk=pk2)) d2
 having b2b_ind=1 and pk^=pk2
order by ID,start_date,end_date;
quit;
data df_2; input ID $  start_date : mmddyy10.  end_date : mmddyy10.;
format start_date end_date date9.;
 pk=_n_;
cards;
1    01/01/2022   01/02/2022
1    01/02/2022   01/05/2022
1    01/06/2022   01/07/2022
2    01/09/2019   01/22/2022
2    06/07/2014   09/10/2015
3    11/10/2012   02/01/2013
;
run;
proc sql;
 create table df_1(drop=pk) as select distinct d1.*, 
  abs(start_date2-end_date)<=1 or abs(start_date-end_date2)<=1 as b2b_ind 
  from df_2 d1 cross join df_2(rename=(start_date=start_date2 end_date=end_date2 
   pk=pk2)) d2
 having b2b_ind=1 and pk^=pk2
order by ID,start_date,end_date;
quit;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文