如何通过数据步骤或PROC SQL查询SAS循环多个数据集?
我有多个数据集(100+),它们都包含相同的3列(code_num,replicate,total_qty),每个列具有不同的代码(code_num)。
data code_num_1
code_num replicate total_qty
12345 376 45
12345 76 67
12345 943 300
.
.
data code_num_2
code_num replicate total_qty
12234 85 746
12234 900 35
12234 726 273
.
.
以及
如果可能的话,我想通过数据步骤运行这些数据集:
data test;
set test_; <-- datasets will go here...
if _N_ in(&PercentileRow10,&PercentileRow20,&PercentileRow30,&PercentileRow40,&PercentileRow50,&PercentileRow60,&PercentileRow70, &PercentileRow80,&PercentileRow90);
run;
*注意:&amp; percentiLerow是一个宏变量,将从数据集中获取百分位数。列数量将确定百分位数。我事先有这个步骤:
proc sql no print;
创建表___ 选择code_num, 复制, 总和(数量)为total_qty 从 ____ code_num组的组,复制 total_qty订购; 辞职;
理想情况下,我想获取每个数据集的百分位数,并创建一个新数据集,该数据集将具有每个百分位数,并且相关的复制量和总量。我可以使用宏和循环通过此数据集运行我的数据集以生成新的数据集吗?
data code_num_1_perc
percentile replicate qty
10 87 45
20 933 65
30 34 100
.
.
90 467 837
这是我对每个数据集Code_num_#的理想输出。如果可能的话
I have multiple datasets (100+) that all contain the same 3 columns (code_num, replicate, total_qty) each with a distinct code (code_num).
data code_num_1
code_num replicate total_qty
12345 376 45
12345 76 67
12345 943 300
.
.
data code_num_2
code_num replicate total_qty
12234 85 746
12234 900 35
12234 726 273
.
.
and etc.
I would like to run those datasets through a data step if possible:
data test;
set test_; <-- datasets will go here...
if _N_ in(&PercentileRow10,&PercentileRow20,&PercentileRow30,&PercentileRow40,&PercentileRow50,&PercentileRow60,&PercentileRow70, &PercentileRow80,&PercentileRow90);
run;
*Note: &percentilerow is a macro variable that will obtain the percentiles from the datasets. The column quantity will determine percentiles. I have this step beforehand:
proc sql no print;
create table ___ as
select code_num,
replicate,
sum(qty) as total_qty
from ____
group by code_num, replicate
order by total_qty;
quit;
Ideally, I would like to obtain the percentiles of each dataset and create a new dataset that will have each percentile and the associated replicate it occurred and the total quantity. Could I use a macro and do loop to run my datasets through this data set to produce new datasets?
data code_num_1_perc
percentile replicate qty
10 87 45
20 933 65
30 34 100
.
.
90 467 837
This is my ideal output for each dataset code_num_#. If possible
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
如果我了解正确的要求,则提出的方法存在缺陷。
例如,一个系列的中位数(第50个百分点)
1、2、3、4、5、6、7、8、9、10是5.5。 5.5不是数据集中的值,那么如何选择重复数字?
我的建议将完全是一个不同的过程。查看Proc等级,以了解如何处理联系以及您希望如何处理它们。您没有指定哪个变量用于计算百分位数。
If I understand the requirements correct, the proposed methodology is flawed.
For example, the median (50th percentile) of a series such as
1, 2, 3, 4, 5, 6, 7, 8, 9, 10 is 5.5. 5.5 is not a value in the data set so how would a replicate number be selected?
My recommendation would be a different process altogether. Look into PROC RANK to see how ties are handled and how you'd like them handled. You didn't specify which variable would used to calculate the percentiles.