How do I import big data into SAS?
I have a rather large dataset in a text file, with about 25 million rows and 200 columns (all of them numeric). I would like to run some summary statistics and data analysis (survival analysis) on it.
What is the fastest way to import the data into SAS? How much memory does my PC need in order to work with such a large dataset?
Comments (1)
I'm not sure that anything is going to be much faster than simply reading your dataset in with PROC IMPORT. Specifying your informats and formats in advance might speed things up a bit, but PROC IMPORT infers them from only the first 20 records by default, so it does not read your whole dataset to work out which data types to use. The fact that your columns are all numeric will probably help. The most important thing is to save the result to a permanent dataset (i.e., specify a library for it): if you only have to import the data once, it doesn't really matter if it takes a long time.
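As a rough sketch of the two approaches mentioned above, the following shows a PROC IMPORT call and a DATA step that declares the layout up front, both writing to a permanent library. The folder, file path, comma delimiter, header row, and the names mylib, bigdata_imp, bigdata, and x1-x200 are all assumptions made for illustration; none of them come from the question.

```sas
/* Hypothetical locations and names: adjust to your own setup. */
libname mylib "/path/to/sasdata";   /* permanent library, so the import is done once */

/* Option 1: PROC IMPORT guesses the column types from the first rows. */
proc import datafile="/path/to/bigdata.txt"
            out=mylib.bigdata_imp
            dbms=dlm
            replace;
    delimiter=',';      /* assumed delimiter; use '09'x for tab-delimited files */
    getnames=yes;       /* assumes the first line holds column names */
    guessingrows=20;    /* the default scan depth, shown here explicitly */
run;

/* Option 2: DATA step with the layout declared in advance, so nothing is guessed. */
data mylib.bigdata;
    infile "/path/to/bigdata.txt"
           dlm=',' dsd          /* assumed comma-delimited */
           firstobs=2           /* skip the assumed header line */
           lrecl=32767          /* allow long lines: 200 fields per record */
           truncover;
    informat x1-x200 best32.;   /* numeric informat for every column */
    input x1-x200;              /* hypothetical variable names */
run;
```

Either step only needs to run once. Later sessions can reissue the libname statement and work with mylib.bigdata directly.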
One of the nice things about SAS is that it keeps data on disk rather than in memory by default, so the size of your RAM doesn't really limit the size of your dataset. It might limit what you can do with that dataset, but I don't know enough about SAS's internal operations to be able to predict what you'll have trouble with.
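Since the data lives on disk, the more useful figure to estimate is probably disk space rather than RAM. A back-of-the-envelope sketch, assuming every column is stored as a default 8-byte SAS numeric and ignoring dataset overhead and compression:

```sas
/* Rough size estimate for the dataset described in the question. */
data _null_;
    rows  = 25e6;               /* about 25 million rows */
    cols  = 200;                /* 200 numeric columns */
    bytes = rows * cols * 8;    /* 8 bytes per numeric value by default */
    gb    = bytes / 1e9;        /* decimal gigabytes */
    put gb=;                    /* writes the estimate to the SAS log */
run;
```

That works out to roughly 40 GB for the uncompressed dataset, so free disk space in the library's folder matters more than installed memory.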