Using SAS to find the record closest to a given observation across two datasets
I've got two datasets, one is a subset of the other.
For example let's say I have
Master table:
Name,status,date
john,born,1-08-2011
frank,alive,1-08-2011
john,alive,1-09-2011
frank,alive,1-09-2011
frank,alive,1-10-2011
john,dead,1-11-2011
frank,alive,1-11-2011
Sub table:
frank,alive,1-11-2011
john,dead,1-11-2011
For each person in the sub table, I'd like to look up their status in the master table on the most recent earlier day we have a record for.
So the result table I'd like to have is:
frank,alive,1-10-2011
john,alive,1-09-2011 (since he didn't get a record entry on 1-10)
And then ideally, suppress / remove records where the person's status hasn't changed.
You can do this in two passes.
The first pass would be to sort your data by name and date.
Once your dataset is sorted, you can use a data step with a BY statement to iterate through it and use the "first." and "last." automatic variables.
Every time the data step iterates with "first.name" as true, clear your retained variables. When "last.name" is true, at the end of a group of names, the retained variables still hold the values from the previous row (if there was a previous row), so output that row.
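The two passes described above could be sketched roughly as follows. This is only an illustration, not necessarily what the answerer had in mind: it assumes the dates are mm-dd-yyyy, that each person's row in the sub table is their latest row in the master table (as in the sample data), and the names master_sorted, previous, prev_status and prev_date are made up for the example.

```sas
/* Sample data from the question; dates assumed to be mm-dd-yyyy */
data master;
    infile datalines dsd;
    length name $8 status $8;
    input name $ status $ date :mmddyy10.;
    format date mmddyy10.;
    datalines;
john,born,1-08-2011
frank,alive,1-08-2011
john,alive,1-09-2011
frank,alive,1-09-2011
frank,alive,1-10-2011
john,dead,1-11-2011
frank,alive,1-11-2011
;
run;

/* Pass 1: sort so each person's rows are together, oldest first */
proc sort data=master out=master_sorted;
    by name date;
run;

/* Pass 2: carry the previous row forward and emit it at the last row */
data previous (keep=name prev_status prev_date
               rename=(prev_status=status prev_date=date));
    set master_sorted;
    by name;
    length prev_status $8;
    retain prev_status prev_date;
    format prev_date mmddyy10.;

    /* New person: forget whatever was retained from the previous person */
    if first.name then call missing(prev_status, prev_date);

    /* Last row for this person: the retained values are the row before it.
       To drop people whose status did not change, one could add
       "and prev_status ne status" to this condition. */
    if last.name and not missing(prev_date) then output;

    /* Remember the current row for the next iteration */
    prev_status = status;
    prev_date = date;
run;
```

With the sample data this should leave frank,alive,1-10-2011 and john,alive,1-09-2011 in the previous dataset. If the sub table can point at rows that are not the latest in master, this shortcut no longer applies and the lookup would need to compare against the sub table's date instead.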
Here's another approach using SQL:
This just selects the entries in the master table with the largest date that is less than the date in the sub table.
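The query that went with this answer is not shown here, so what follows is only a guess at what such a query might look like in PROC SQL, not the answerer's original code. It assumes datasets named master and sub with columns name, status and date, dates in mm-dd-yyyy form, and a made-up output name previous_sql.

```sas
/* Sample data from the question; dates assumed to be mm-dd-yyyy */
data master;
    infile datalines dsd;
    length name $8 status $8;
    input name $ status $ date :mmddyy10.;
    format date mmddyy10.;
    datalines;
john,born,1-08-2011
frank,alive,1-08-2011
john,alive,1-09-2011
frank,alive,1-09-2011
frank,alive,1-10-2011
john,dead,1-11-2011
frank,alive,1-11-2011
;
run;

data sub;
    infile datalines dsd;
    length name $8 status $8;
    input name $ status $ date :mmddyy10.;
    format date mmddyy10.;
    datalines;
frank,alive,1-11-2011
john,dead,1-11-2011
;
run;

proc sql;
    create table previous_sql as
    select m.name, m.status, m.date
    from master as m, sub as s
    where m.name = s.name
      /* keep only the master row with the largest date before the sub date */
      and m.date = (select max(m2.date)
                    from master as m2
                    where m2.name = s.name
                      and m2.date < s.date);
      /* To suppress people whose status did not change, one could also
         require "m.status ne s.status" in the WHERE clause. */
quit;
```

Unlike the sorted data step approach, this compares against the date in sub directly, so it does not rely on the sub row being the person's latest record.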