使用 SAS 在两个数据集中查找尽可能接近给定观察的记录

发布于 2024-11-28 22:43:54 字数 486 浏览 1 评论 0原文

我有两个数据集，一个是另一个的子集。

例如，假设我有

主表：

Name,status,date
john,born,1-08-2011
frank,alive,1-08-2011
john,alive,1-09-2011
frank,alive,1-09-2011
frank,alive,1-10-2011
john,dead,1-11-2011
frank,alive,1-11-2011

子表，

frank,alive,1-11-2011
john,dead,1-11-2011

我想在我们有记录的前一天搜索主表，了解每个人的状态。

所以我想要的结果表

frank,alive,1-10-2011
john,alive,1-09-2011 (since he didn't get a record entry on 1-10)

然后理想情况下，抑制/删除人员状态未更改的记录。

原文

I've got two datasets, one is a subset of the other.

For example let's say I have

Master table:

Name,status,date
john,born,1-08-2011
frank,alive,1-08-2011
john,alive,1-09-2011
frank,alive,1-09-2011
frank,alive,1-10-2011
john,dead,1-11-2011
frank,alive,1-11-2011

Sub table

frank,alive,1-11-2011
john,dead,1-11-2011

I'd like to search the master table, for each person's status, on whatever previous day we have a record for.

So my result table I'd like to have

frank,alive,1-10-2011
john,alive,1-09-2011 (since he didn't get a record entry on 1-10)

And then ideally, suppress / remove records where the persons status hasn't changed.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

鸢与 2024-12-05 22:43:55

您可以分两次完成此操作。

第一步是对数据进行排序。

proc sort data=dataset;
    by name date;
run;

对数据集进行排序后，您可以使用数据步骤对其进行迭代并使用“第一个”。和“最后”。元变量。

data second_to_last(drop=last_name last_status last_date);
    retain last_name '' last_status '' last_date .;
    set dataset;
    by name date;
    if first.name then do;
        last_name = '';
        last_status = '';
        last_date = .;
    end;
    if last.name and last_name != '' then do;
        name = last_name;
        status = last_status;
        date = last_date;
        output second_to_last;
    end;
    last_name = name;
    last_status = status;
    last_date = date;
run;

每次数据步骤以“first.name”为 true 进行迭代时，它都会清除您的保留变量。在一组名称末尾之前，当“last.name”为 true 时，它将把变量设置回之前的值（如果有前一行）并输出该行。

You can do this in two passes.

The first pass would be to sort your data.

proc sort data=dataset;
    by name date;
run;

Once your dataset is sorted, you can use a data step to iterate through it and use the "first." and "last." metavariables.

data second_to_last(drop=last_name last_status last_date);
    retain last_name '' last_status '' last_date .;
    set dataset;
    by name date;
    if first.name then do;
        last_name = '';
        last_status = '';
        last_date = .;
    end;
    if last.name and last_name != '' then do;
        name = last_name;
        status = last_status;
        date = last_date;
        output second_to_last;
    end;
    last_name = name;
    last_status = status;
    last_date = date;
run;

Every time the data step iterates with "first.name" as true, it will clear your retain variables. Right before the end of a group of names, when "last.name" is true, it will set the variables back to what they were previously (if there was a previous row) and output the row.

回复收藏 0 原文

牛↙奶布丁 2024-12-05 22:43:54

这是使用 SQL 的另一种方法：

proc sql;
 select a.*
 from master a, sub b
 where a.name=b.name and a.date<=b.date
 group by a.name
 having a.date=max(a.date);
quit;

这仅选择 master 表中最大日期小于 sub 表中日期的条目。

Here's another approach using SQL:

proc sql;
 select a.*
 from master a, sub b
 where a.name=b.name and a.date<=b.date
 group by a.name
 having a.date=max(a.date);
quit;

This just selects entries in the master table with the largest date that is less than the date in the sub table.

回复收藏 0 原文

~没有更多了~

关于作者

靖瑶

暂无简介

0 文章

0 评论

24 人气

关注发私信

友情链接

文江博客

使用 SAS 在两个数据集中查找尽可能接近给定观察的记录

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

胡图图

zt006

z祗昰~

冰葑

野の

天空

友情链接

使用 SAS 在两个数据集中查找尽可能接近给定观察的记录

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

胡图图

zt006

z祗昰~

冰葑

野の

天空

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。