使用 SAS 在两个数据集中查找尽可能接近给定观察的记录

发布于 2024-11-28 22:43:54 字数 486 浏览 1 评论 0原文

我有两个数据集,一个是另一个的子集。

例如,假设我有

主表:

Name,status,date
john,born,1-08-2011
frank,alive,1-08-2011
john,alive,1-09-2011
frank,alive,1-09-2011
frank,alive,1-10-2011
john,dead,1-11-2011
frank,alive,1-11-2011

子表,

frank,alive,1-11-2011
john,dead,1-11-2011

我想在我们有记录的前一天搜索主表,了解每个人的状态。

所以我想要的结果表

frank,alive,1-10-2011
john,alive,1-09-2011 (since he didn't get a record entry on 1-10)

然后理想情况下,抑制/删除人员状态未更改的记录。

I've got two datasets, one is a subset of the other.

For example let's say I have

Master table:

Name,status,date
john,born,1-08-2011
frank,alive,1-08-2011
john,alive,1-09-2011
frank,alive,1-09-2011
frank,alive,1-10-2011
john,dead,1-11-2011
frank,alive,1-11-2011

Sub table

frank,alive,1-11-2011
john,dead,1-11-2011

I'd like to search the master table, for each person's status, on whatever previous day we have a record for.

So my result table I'd like to have

frank,alive,1-10-2011
john,alive,1-09-2011 (since he didn't get a record entry on 1-10)

And then ideally, suppress / remove records where the persons status hasn't changed.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

鸢与 2024-12-05 22:43:55

您可以分两次完成此操作。

第一步是对数据进行排序。

proc sort data=dataset;
    by name date;
run;

对数据集进行排序后,您可以使用数据步骤对其进行迭代并使用“第一个”。和“最后”。元变量。

data second_to_last(drop=last_name last_status last_date);
    retain last_name '' last_status '' last_date .;
    set dataset;
    by name date;
    if first.name then do;
        last_name = '';
        last_status = '';
        last_date = .;
    end;
    if last.name and last_name != '' then do;
        name = last_name;
        status = last_status;
        date = last_date;
        output second_to_last;
    end;
    last_name = name;
    last_status = status;
    last_date = date;
run;

每次数据步骤以“first.name”为 true 进行迭代时,它都会清除您的保留变量。在一组名称末尾之前,当“last.name”为 true 时,它​​将把变量设置回之前的值(如果有前一行)并输出该行。

You can do this in two passes.

The first pass would be to sort your data.

proc sort data=dataset;
    by name date;
run;

Once your dataset is sorted, you can use a data step to iterate through it and use the "first." and "last." metavariables.

data second_to_last(drop=last_name last_status last_date);
    retain last_name '' last_status '' last_date .;
    set dataset;
    by name date;
    if first.name then do;
        last_name = '';
        last_status = '';
        last_date = .;
    end;
    if last.name and last_name != '' then do;
        name = last_name;
        status = last_status;
        date = last_date;
        output second_to_last;
    end;
    last_name = name;
    last_status = status;
    last_date = date;
run;

Every time the data step iterates with "first.name" as true, it will clear your retain variables. Right before the end of a group of names, when "last.name" is true, it will set the variables back to what they were previously (if there was a previous row) and output the row.

牛↙奶布丁 2024-12-05 22:43:54

这是使用 SQL 的另一种方法:

proc sql;
 select a.*
 from master a, sub b
 where a.name=b.name and a.date<=b.date
 group by a.name
 having a.date=max(a.date);
quit;

这仅选择 master 表中最大日期小于 sub 表中日期的条目。

Here's another approach using SQL:

proc sql;
 select a.*
 from master a, sub b
 where a.name=b.name and a.date<=b.date
 group by a.name
 having a.date=max(a.date);
quit;

This just selects entries in the master table with the largest date that is less than the date in the sub table.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文