高效地将 SAS 数据集转换为 CSV
谁能告诉我以编程方式将 SAS 数据集转换为 CSV 文件的最快方法是什么。我知道我可以使用数据步骤并输出到文件等。但这是唯一的方法吗?
谢谢, 阿德南。
Can anyone tell me what is the fastest way to programmatically convert a SAS dataset into a CSV file. I know I can use a data step and output to a file etc. But is that the only way?
Thanks,
Adnan.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
沿着这些思路的东西?
something along these lines?
创建 SAS 数据集的 .CSV 文件的 5 种不同方法。
请参阅http://studysas.blogspot。 com/2009/02/how-to-create-comma-separated-file-csv.html
5 different ways to create .CSV file of a SAS dataset.
refer http://studysas.blogspot.com/2009/02/how-to-create-comma-separated-file-csv.html
这里的答案解决了创建 csv 的多种方法,但不一定是这些方法的效率。我创建了一个包含 100,000 行和 5 列的示例数据集 (
temp
),以测试和比较将 sas 数据集导出到名为temp.csv
的 csv 的方法。第一种方法:导出程序。运行时间:0.43秒。
选择其他选项时,运行速度快,并且简单、开箱即用的灵活性。尽管如此,它并不是最强的定制方法。
第二种方法:ODS with Print procedure。运行时间:14.09秒。
尽管有一些特殊的用例,但对于大多数用例来说,此方法是三种方法中最差的选择。这对于以前编写的过程的临时输出很有用,特别是如果您希望输出保留在 lst 文件中(如果它不是太大)。当您想要将另一个过程(例如,复杂的
制表
)转换为文件而不进行进一步操作时,它也可能很有用。如果您不需要 lst 文件中的打印输出,请关闭您的列表 (odslisting close
),否则这将花费非常非常长的时间。第三种方法:文件声明。运行时间:0.06秒。
虽然这种方法的性能还不错,但它不直观并且看起来很混乱。然而,如上所述,您可以更好地控制输出,并且它的运行时间是最快的。
The answers here have addressed many ways to create a csv, but not necessarily the efficiency of these methods. I create a sample dataset (
temp
) with 100,000 rows and five columns to test and compare the methods for export the sas dataset to a csv namedtemp.csv
.The first method: The Export Procedure. Run time: 0.43 seconds.
Quick run time and simple, in-the-box flexibility when choosing other options. With that said, it is not the strongest for customization
The second method: ODS with Print Procedure. Run time: 14.09 seconds.
This method is the worst option out of the three for most use cases, although there a few special use cases. It is nice for temporary output of previously written procedures, especially if you want the output to stay in the lst file (if it is not too big). It also may be useful when you want to convert another procedure (for instance, a complicated
tabulate
) to file without further manipulation. If you do not need the print out in the lst file, close your listing (ods listing close
) or this will take much, much longer.The third method: File Statement. Run time: 0.06 seconds.
While the performance of this method is not bad, it is not intuitive and looks confusing. As mentioned above however, you would have more control over the output, and it has the quickest run time of them all.
修改的
data _NULL_
方法这是对
data _NULL_
方法的修改,该方法在第一个数据步骤中将标题行写入文件,然后继续将数据行写入同一文件在第二个数据步骤中。它可以与
keep
和drop
数据集选项一起使用(令人惊讶),但不幸的是,当使用open(),
nvars
属性遵循keep
或drop
之后的新变量数量,但varlabel
和varname
函数仍然使用其varnum
查找变量。在以下示例中,仅从
SASHELP.CLASS
中提取Name
(varnum=1) 和Height
(varnum=4),因为只有两个变量被保留,nvars
为 2,但是如果我们以 num 作为上限迭代循环,我们会错过Height
,因为它的varnum
> 为 4:有两个选项:
nvars
值num
,尽管这只是导致日志中出现大量'invalid'
注释这是第一种方法,需要两次调用
open
函数:将所有这些写出来后,它可能是有意义的在大多数情况下使用
PROC EXPORT
,或者在变量不多的情况下显式列出变量。这只是表明可以做什么。Modified
data _NULL_
approachHere's a modification to the
data _NULL_
approach that writes out header rows to the file in the first datastep and then continues to write the same file with the data rows in the second datastep.It can be made to work with
keep
anddrop
data set options (surprisingly), although unfortunately when a dataset is 'opened' usingopen()
, thenvars
attribute respects the new number of variables afterkeep
ordrop
, but thevarlabel
andvarname
functions still look for variables using theirvarnum
.In the following example, just
Name
(varnum=1) andHeight
(varnum=4) are extracted fromSASHELP.CLASS
, because only two variables are kept,nvars
is 2, but if we iterate over a loop with num as the upper bound, we miss outHeight
, because itsvarnum
is 4:There are two options:
nvars
value by stripping off the dataset options for the first passnum
, although this just leads to lots of'invalid'
notes in the logThis is the first approach, requiring two calls to the
open
function:After writing all of that out, it probably makes sense to go with
PROC EXPORT
in most cases, or explicitly list the variables if there aren't many. This just shows what can be done.