当前位置：文江博客话题详情

高效地将 SAS 数据集转换为 CSV

发布于 2024-09-19 11:03:45 字数 92 浏览 10 评论 0原文

谁能告诉我以编程方式将 SAS 数据集转换为 CSV 文件的最快方法是什么。我知道我可以使用数据步骤并输出到文件等。但这是唯一的方法吗？

谢谢，阿德南。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

梦在深巷 2024-09-26 11:03:45

沿着这些思路的东西？

proc export data=sashelp.class
    outfile='c:\temp\sashelp class.csv'
    dbms=csv
    replace;
run;

something along these lines?

proc export data=sashelp.class
    outfile='c:\temp\sashelp class.csv'
    dbms=csv
    replace;
run;

回复收藏 0 原文

不必了 2024-09-26 11:03:45

创建 SAS 数据集的 .CSV 文件的 5 种不同方法。

请参阅http://studysas.blogspot。 com/2009/02/how-to-create-comma-separated-file-csv.html

回复收藏 0 原文

感性 2024-09-26 11:03:45

这里的答案解决了创建 csv 的多种方法，但不一定是这些方法的效率。我创建了一个包含 100,000 行和 5 列的示例数据集 (temp)，以测试和比较将 sas 数据集导出到名为 temp.csv 的 csv 的方法。

第一种方法：导出程序。运行时间：0.43秒。

PROC EXPORT data=temp 
     outfile="temp.csv" dbms=csv replace;

选择其他选项时，运行速度快，并且简单、开箱即用的灵活性。尽管如此，它并不是最强的定制方法。

第二种方法：ODS with Print procedure。运行时间：14.09秒。

ODS csv file="temp.csv";
PROC PRINT data=temp noobs;
RUN;
ods csv close;

尽管有一些特殊的用例，但对于大多数用例来说，此方法是三种方法中最差的选择。这对于以前编写的过程的临时输出很有用，特别是如果您希望输出保留在 lst 文件中（如果它不是太大）。当您想要将另一个过程（例如，复杂的制表）转换为文件而不进行进一步操作时，它也可能很有用。如果您不需要 lst 文件中的打印输出，请关闭您的列表 (odslisting close)，否则这将花费非常非常长的时间。

第三种方法：文件声明。运行时间：0.06秒。

DATA _null_;
    FILE "temp.csv ";
    SET temp;
    put (_all_) (',');
RUN;

虽然这种方法的性能还不错，但它不直观并且看起来很混乱。然而，如上所述，您可以更好地控制输出，并且它的运行时间是最快的。

The answers here have addressed many ways to create a csv, but not necessarily the efficiency of these methods. I create a sample dataset (temp) with 100,000 rows and five columns to test and compare the methods for export the sas dataset to a csv named temp.csv.

The first method: The Export Procedure. Run time: 0.43 seconds.

PROC EXPORT data=temp 
     outfile="temp.csv" dbms=csv replace;

Quick run time and simple, in-the-box flexibility when choosing other options. With that said, it is not the strongest for customization

The second method: ODS with Print Procedure. Run time: 14.09 seconds.

ODS csv file="temp.csv";
PROC PRINT data=temp noobs;
RUN;
ods csv close;

This method is the worst option out of the three for most use cases, although there a few special use cases. It is nice for temporary output of previously written procedures, especially if you want the output to stay in the lst file (if it is not too big). It also may be useful when you want to convert another procedure (for instance, a complicated tabulate) to file without further manipulation. If you do not need the print out in the lst file, close your listing (ods listing close) or this will take much, much longer.

The third method: File Statement. Run time: 0.06 seconds.

DATA _null_;
    FILE "temp.csv ";
    SET temp;
    put (_all_) (',');
RUN;

While the performance of this method is not bad, it is not intuitive and looks confusing. As mentioned above however, you would have more control over the output, and it has the quickest run time of them all.

回复收藏 0 原文

奶茶白久 2024-09-26 11:03:45

修改的 `data _NULL_` 方法

这是对 data _NULL_ 方法的修改，该方法在第一个数据步骤中将标题行写入文件，然后继续将数据行写入同一文件在第二个数据步骤中。

%macro outputCSV(dataset,file);
data _NULL_;
   file "&file." dlm=',' dsd;
   length header $ 2000;
   dsid=open("&dataset.","i");
   num=attrn(dsid,"nvars");
   do i=1 to num;
      header = trim(left(coalescec(varlabel(dsid,i),varname(dsid,i))));
      put header @;
   end;
   rc=close(dsid);
run;

data _NULL_;
   set &dataset.;
   file "&file." mod dlm=',' dsd;
   put (_all_) (+0);
run;
%mend;

%outputCSV(sashelp.class,~/temp4.csv)

它可以与 keep 和 drop 数据集选项一起使用（令人惊讶），但不幸的是，当使用 open()，nvars 属性遵循 keep 或 drop 之后的新变量数量，但 varlabel 和 varname 函数仍然使用其 varnum 查找变量。

在以下示例中，仅从 SASHELP.CLASS 中提取 Name (varnum=1) 和 Height (varnum=4)，因为只有两个变量被保留，nvars 为 2，但是如果我们以 num 作为上限迭代循环，我们会错过 Height，因为它的 varnum > 为 4：

 62         data _NULL_;
 63           dsid = open("sashelp.class (keep=name height)","i");
 64           num = attrn(dsid,"nvars");
 65           do i=1 to 5;
 66             vname = varname(dsid,i);
 67             put i= vname= num=;
 68           end;
 69         run;
 
 i=1 vname=Name num=2
 NOTE: Argument 2 to function VARNAME(1,2) at line 66 column 13 is invalid.
 i=2 vname=  num=2
 NOTE: Argument 2 to function VARNAME(1,3) at line 66 column 13 is invalid.
 i=3 vname=  num=2
 i=4 vname=Height num=2
 NOTE: Argument 2 to function VARNAME(1,5) at line 66 column 13 is invalid.
 i=5 vname=  num=2
 dsid=1 num=2 i=6 vname=  _ERROR_=1 _N_=1

有两个选项：

通过剥离第一遍的数据集选项来提取真实的 nvars 值
使用非常大的数字而不是 num，尽管这只是导致日志中出现大量 'invalid' 注释

这是第一种方法，需要两次调用 open 函数：

%macro outputCSV(dataset,file);
data _NULL_;
   file "&file." dlm=',' dsd;
   length header $ 2000;
   dsid=open("%SCAN(&dataset.,1,()","i");
   num=attrn(dsid,"nvars");
   rc=close(dsid);
   dsid=open("&dataset.","i");
   do i=1 to num;
      header = trim(left(coalescec(varlabel(dsid,i),varname(dsid,i))));
      if _error_ = 0 then put header @;
      _error_ = 0;
   end;
   rc=close(dsid);
run;

data _NULL_;
   set &dataset.;
   file "&file." mod dlm=',' dsd;
   put (_all_) (+0);
run;
%mend;

%outputCSV(sashelp.class (keep=name height),~/temp4.csv)

将所有这些写出来后，它可能是有意义的在大多数情况下使用PROC EXPORT，或者在变量不多的情况下显式列出变量。这只是表明可以做什么。

Modified `data _NULL_` approach

Here's a modification to the data _NULL_ approach that writes out header rows to the file in the first datastep and then continues to write the same file with the data rows in the second datastep.

%macro outputCSV(dataset,file);
data _NULL_;
   file "&file." dlm=',' dsd;
   length header $ 2000;
   dsid=open("&dataset.","i");
   num=attrn(dsid,"nvars");
   do i=1 to num;
      header = trim(left(coalescec(varlabel(dsid,i),varname(dsid,i))));
      put header @;
   end;
   rc=close(dsid);
run;

data _NULL_;
   set &dataset.;
   file "&file." mod dlm=',' dsd;
   put (_all_) (+0);
run;
%mend;

%outputCSV(sashelp.class,~/temp4.csv)

It can be made to work with keep and drop data set options (surprisingly), although unfortunately when a dataset is 'opened' using open(), the nvars attribute respects the new number of variables after keep or drop, but the varlabel and varname functions still look for variables using their varnum.

In the following example, just Name (varnum=1) and Height (varnum=4) are extracted from SASHELP.CLASS, because only two variables are kept, nvars is 2, but if we iterate over a loop with num as the upper bound, we miss out Height, because its varnum is 4:

 62         data _NULL_;
 63           dsid = open("sashelp.class (keep=name height)","i");
 64           num = attrn(dsid,"nvars");
 65           do i=1 to 5;
 66             vname = varname(dsid,i);
 67             put i= vname= num=;
 68           end;
 69         run;
 
 i=1 vname=Name num=2
 NOTE: Argument 2 to function VARNAME(1,2) at line 66 column 13 is invalid.
 i=2 vname=  num=2
 NOTE: Argument 2 to function VARNAME(1,3) at line 66 column 13 is invalid.
 i=3 vname=  num=2
 i=4 vname=Height num=2
 NOTE: Argument 2 to function VARNAME(1,5) at line 66 column 13 is invalid.
 i=5 vname=  num=2
 dsid=1 num=2 i=6 vname=  _ERROR_=1 _N_=1

There are two options:

Extract the true nvars value by stripping off the dataset options for the first pass
Use a really high number instead of num, although this just leads to lots of 'invalid' notes in the log

This is the first approach, requiring two calls to the open function:

%macro outputCSV(dataset,file);
data _NULL_;
   file "&file." dlm=',' dsd;
   length header $ 2000;
   dsid=open("%SCAN(&dataset.,1,()","i");
   num=attrn(dsid,"nvars");
   rc=close(dsid);
   dsid=open("&dataset.","i");
   do i=1 to num;
      header = trim(left(coalescec(varlabel(dsid,i),varname(dsid,i))));
      if _error_ = 0 then put header @;
      _error_ = 0;
   end;
   rc=close(dsid);
run;

data _NULL_;
   set &dataset.;
   file "&file." mod dlm=',' dsd;
   put (_all_) (+0);
run;
%mend;

%outputCSV(sashelp.class (keep=name height),~/temp4.csv)

After writing all of that out, it probably makes sense to go with PROC EXPORT in most cases, or explicitly list the variables if there aren't many. This just shows what can be done.

回复收藏 0 原文

~没有更多了~