在SAS中迭代几个月

发布于 2025-01-29 21:32:03 字数 910 浏览 2 评论 0 原文

我有以下问题。这样的表中每个实体有六个月的交易数据:

customer_1  1000  January
customer_1  1000  January
customer_1  1000  February
customer_1  1000  February
customer_1  1000  March
customer_1  1000  April
customer_1  1000  May
customer_1  1000  June

我想计算每两个月的交易数量,例如:

1月至2月有4个交易数量。 2月至3月有3次交易 等等。

我使用嵌套环尝试,以便定义一个开始日期和结束日期,然后在循环结束时我添加了1个月并再次迭代。这样的事情:

 do i = n to 1 by -1 while (date_key{i} >= begin_date);

        do k = i to 1 by -1 while(date_key{k} >= b_d and date_key{k} <= intnx('Month',b_d,2, 's'));

            if upcase(account{k}) in (&account_type) and
            then do;
                total_amount = sum(total_amount,currency_amount{k});
                amt_cnt = amt_cnt + 1;
            end;
        end;
        if total_amount >= threshold_total then m_cnt = sum(m_cnt,1);
        b_d = intnx('Month',b_d,1,'s');
 end;

I have the following problem. There are six months of transaction data per entity in a table like this:

customer_1  1000  January
customer_1  1000  January
customer_1  1000  February
customer_1  1000  February
customer_1  1000  March
customer_1  1000  April
customer_1  1000  May
customer_1  1000  June

I'd like to count the number of transactions in every two months like:

In January - February there are 4 number of transactions.
In February - March there are 3 number of transactions
and so on.

I tried it with nested loops, so that I define a begin and end date and then at the end of the loop I add 1 month to both and iterate again. Something like this:

 do i = n to 1 by -1 while (date_key{i} >= begin_date);

        do k = i to 1 by -1 while(date_key{k} >= b_d and date_key{k} <= intnx('Month',b_d,2, 's'));

            if upcase(account{k}) in (&account_type) and
            then do;
                total_amount = sum(total_amount,currency_amount{k});
                amt_cnt = amt_cnt + 1;
            end;
        end;
        if total_amount >= threshold_total then m_cnt = sum(m_cnt,1);
        b_d = intnx('Month',b_d,1,'s');
 end;

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

雅心素梦 2025-02-05 21:32:03

您可能应该只使用多标签格式和支持它们的PROC。

首先,让我们创建一些实际的示例输入数据。

data have ;
  input id :$10. amount date :yymmdd.;
  format date yymmdd.;
cards;
customer_1  1001  2021-01-01
customer_1  1002  2021-01-01
customer_1  1003  2021-02-01
customer_1  1004  2021-02-01
customer_1  1005  2021-03-01
customer_1  1006  2021-04-01
customer_1  1007  2021-05-01
customer_1  1008  2021-06-01
;

然后,让我们创建一个示例格式。这是一种名为“命名期”的格式,是2021年的前5个两个月期。

proc format cntlout=formats;
 value periods (MULTILABEL)
   '01JAN2021'd -< '01MAR2021'd = '2021 01-02'
   '01FEB2021'd -< '01APR2021'd = '2021 02-03'
   '01MAR2021'd -< '01MAY2021'd = '2021 03-04'
   '01APR2021'd -< '01JUN2021'd = '2021 04-05'
   '01MAY2021'd -< '01JUL2021'd = '2021 05-06'
 ;
run;

一旦定义了格式,将其应用于您的日期变量,并按日期进行汇总。

这是使用Proc摘要的代码。

proc summary data=have nway;
  class date / mlf ;
  var amount;
  output out=want(rename=(date=period)) sum= ;
  format date periods.;
run;
proc print data=want;
run;

和结果:

对于一系列日期,动态制作格式并不难。您只需要数据集中的这些变量即可与cntlin = proc格式的选项一起使用。

也许这样的东西:

data cntlin;
  fmtname='PERIODS';
  sexcl='N';
  eexcl='Y';
  hlo='SM';
  length label $20 ;
  do offset=0 to intck('month','01JAN2021'd,'01JAN2022'd)-1;
    start = intnx('month','01JAN2021'd,offset);
    end = intnx('month',start,2);
    label=cat(put(start,yymmd7.),' and ',put(end-1,yymmd7.));
    output;
  end;
run;

You should probably just use a MULTILABEL format and a proc that supports them.

First let's create some actual sample input data.

data have ;
  input id :$10. amount date :yymmdd.;
  format date yymmdd.;
cards;
customer_1  1001  2021-01-01
customer_1  1002  2021-01-01
customer_1  1003  2021-02-01
customer_1  1004  2021-02-01
customer_1  1005  2021-03-01
customer_1  1006  2021-04-01
customer_1  1007  2021-05-01
customer_1  1008  2021-06-01
;

Then let's create an example format. Here is a format named PERIODS that are the first 5 two month periods in 2021.

proc format cntlout=formats;
 value periods (MULTILABEL)
   '01JAN2021'd -< '01MAR2021'd = '2021 01-02'
   '01FEB2021'd -< '01APR2021'd = '2021 02-03'
   '01MAR2021'd -< '01MAY2021'd = '2021 03-04'
   '01APR2021'd -< '01JUN2021'd = '2021 04-05'
   '01MAY2021'd -< '01JUL2021'd = '2021 05-06'
 ;
run;

Once you have the format defined apply it to your DATE variable and summarize by date.

Here is code using PROC SUMMARY.

proc summary data=have nway;
  class date / mlf ;
  var amount;
  output out=want(rename=(date=period)) sum= ;
  format date periods.;
run;
proc print data=want;
run;

And the results:

enter image description here

It is not hard to make format dynamically for a range of dates. You just need these variables in a dataset to use with the CNTLIN= option of PROC FORMAT.

enter image description here

So perhaps something like this:

data cntlin;
  fmtname='PERIODS';
  sexcl='N';
  eexcl='Y';
  hlo='SM';
  length label $20 ;
  do offset=0 to intck('month','01JAN2021'd,'01JAN2022'd)-1;
    start = intnx('month','01JAN2021'd,offset);
    end = intnx('month',start,2);
    label=cat(put(start,yymmd7.),' and ',put(end-1,yymmd7.));
    output;
  end;
run;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文