如何通过数据步骤或PROC SQL查询SAS循环多个数据集?

发布于 2025-01-22 15:02:27 字数 1197 浏览 3 评论 0原文

我有多个数据集(100+),它们都包含相同的3列(code_num,replicate,total_qty),每个列具有不同的代码(code_num)。

data code_num_1
code_num replicate total_qty
12345       376       45
12345       76        67
12345       943       300
.
.

data code_num_2
code_num replicate total_qty
12234       85       746
12234       900      35
12234       726      273
.
.

以及

如果可能的话,我想通过数据步骤运行这些数据集:

data test;
set test_; <-- datasets will go here...
if _N_ in(&PercentileRow10,&PercentileRow20,&PercentileRow30,&PercentileRow40,&PercentileRow50,&PercentileRow60,&PercentileRow70, &PercentileRow80,&PercentileRow90);
run;

*注意:&amp; percentiLerow是一个宏变量,将从数据集中获取百分位数。列数量将确定百分位数。我事先有这个步骤:

proc sql no print; 

创建表___ 选择code_num, 复制, 总和(数量)为total_qty 从 ____ code_num组的组,复制 total_qty订购; 辞职;

理想情况下,我想获取每个数据集的百分位数,并创建一个新数据集,该数据集将具有每个百分位数,并且相关的复制量和总量。我可以使用宏和循环通过此数据集运行我的数据集以生成新的数据集吗?

data code_num_1_perc
percentile replicate qty
10           87      45
20           933     65
30           34      100
.
.
90           467      837

这是我对每个数据集Code_num_#的理想输出。如果可能的话

I have multiple datasets (100+) that all contain the same 3 columns (code_num, replicate, total_qty) each with a distinct code (code_num).

data code_num_1
code_num replicate total_qty
12345       376       45
12345       76        67
12345       943       300
.
.

data code_num_2
code_num replicate total_qty
12234       85       746
12234       900      35
12234       726      273
.
.

and etc.

I would like to run those datasets through a data step if possible:

data test;
set test_; <-- datasets will go here...
if _N_ in(&PercentileRow10,&PercentileRow20,&PercentileRow30,&PercentileRow40,&PercentileRow50,&PercentileRow60,&PercentileRow70, &PercentileRow80,&PercentileRow90);
run;

*Note: &percentilerow is a macro variable that will obtain the percentiles from the datasets. The column quantity will determine percentiles. I have this step beforehand:

proc sql no print; 

create table ___ as
select code_num,
replicate,
sum(qty) as total_qty
from ____
group by code_num, replicate
order by total_qty;
quit;

Ideally, I would like to obtain the percentiles of each dataset and create a new dataset that will have each percentile and the associated replicate it occurred and the total quantity. Could I use a macro and do loop to run my datasets through this data set to produce new datasets?

data code_num_1_perc
percentile replicate qty
10           87      45
20           933     65
30           34      100
.
.
90           467      837

This is my ideal output for each dataset code_num_#. If possible

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

青衫儰鉨ミ守葔 2025-01-29 15:02:27

如果我了解正确的要求,则提出的方法存在缺陷。

例如,一个系列的中位数(第50个百分点)
1、2、3、4、5、6、7、8、9、10是5.5。 5.5不是数据集中的值,那么如何选择重复数字?

我的建议将完全是一个不同的过程。查看Proc等级,以了解如何处理联系以及您希望如何处理它们。您没有指定哪个变量用于计算百分位数。

  1. 将所有数据集组合到一个,添加数据集标识符以唯一标识每个数据集。
data combined;
length source data_set_name $50.;
set code_num_: indsname = source;
data_set_name = source;
run;
  1. 使用Proc等级将每个等级的Proc等级分组为DIDILE
proc rank data=combined out=combined_deciles groups=10;
by data_set_name;
var total_qty;
ranks PRanks;
run;
  1. 获得第一个(或最后一个基于要求)值
data want;
set combined_deciles;
by datasetName Pranks;
if first.Pranks;
run;

If I understand the requirements correct, the proposed methodology is flawed.

For example, the median (50th percentile) of a series such as
1, 2, 3, 4, 5, 6, 7, 8, 9, 10 is 5.5. 5.5 is not a value in the data set so how would a replicate number be selected?

My recommendation would be a different process altogether. Look into PROC RANK to see how ties are handled and how you'd like them handled. You didn't specify which variable would used to calculate the percentiles.

  1. Combine all data sets into one, adding in a data set identifier to uniquely identify each data set.
data combined;
length source data_set_name $50.;
set code_num_: indsname = source;
data_set_name = source;
run;
  1. Use PROC RANK to group into deciles
proc rank data=combined out=combined_deciles groups=10;
by data_set_name;
var total_qty;
ranks PRanks;
run;
  1. Get the first (or last, based on requirements) value for each rank
data want;
set combined_deciles;
by datasetName Pranks;
if first.Pranks;
run;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文