Which has a bigger impact on SAS data set performance: the number of observations or the number of variables?
After working with different data sets in SAS for a month or two, it seems to me that the more variables a data set has, the more time it takes to run PROCs and other operations on the data set. Yet if I have for example 5 variables, but 1 million observations, performance is not impacted too much.
While I'm interested in whether observations or variables affect performance, I was also wondering if there are other factors I'm missing when looking at SAS performance.
Thanks!
2 Answers
For the same size data set (rows × columns), the one with more variables will usually be slower, I believe. I tried creating two data sets, one with 1 row and 10,000 columns and one with 10,000 rows and 1 column: the one with more variables took a lot more memory and time, which you can see in the timing notes the log prints after each step.

You should also check out the BUFNO= and BUFSIZE= options. If you have to access a data set many times, you might also consider using SASFILE to hold the entire data set in memory.
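The wide-versus-tall experiment and the buffer options above can be sketched roughly as follows. This is an illustrative sketch, not the answerer's actual code; the data set names and the specific BUFSIZE=/BUFNO= values are arbitrary choices for demonstration.

```sas
options fullstimer;   /* print detailed real time / memory stats in the log */

/* Tall: 10,000 rows, 1 variable.
   BUFSIZE= only takes effect when a data set is created. */
data tall(bufsize=64k);
    do i = 1 to 10000;
        x = ranuni(1);
        output;
    end;
    drop i;
run;

/* Wide: 1 row, 10,000 variables */
data wide;
    array x{10000};
    do i = 1 to 10000;
        x{i} = ranuni(1);
    end;
    drop i;
run;

/* BUFNO= controls how many page buffers are used when reading */
proc means data=tall(bufno=10);
run;

/* For repeated access, SASFILE pins the whole data set in memory */
sasfile tall load;
proc means data=tall; run;
proc freq data=tall; run;
sasfile tall close;
```

Comparing the FULLSTIMER notes in the log for the two DATA steps shows the memory and time difference the answer describes.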
I can't quite elucidate why (and am making an educated guess), but I imagine it has to do with a combination of factors, including that the whole record is read into the PDV, which means that with many variables more data sits in memory for every observation processed.

It might be worth doing some measurements with compressed data sets, because I/O is often the bottleneck. The relevant SAS data set option is COMPRESS=.
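The compression test suggested above could look something like the following sketch. The data set names and sizes are illustrative assumptions; COMPRESS=BINARY is one of several settings (CHAR and BINARY use different algorithms, and which wins depends on the data).

```sas
options fullstimer;   /* show time and memory stats per step in the log */

/* Create the same wide data twice, once compressed and once not.
   Repetitive values like these compress very well. */
data compressed(compress=binary) uncompressed;
    array x{500};
    do row = 1 to 100000;
        do i = 1 to 500;
            x{i} = 0;
        end;
        output;
    end;
    drop i;
run;

/* Compare step timings: less I/O for the compressed version,
   at the cost of some CPU to decompress each page. */
proc means data=compressed;   run;
proc means data=uncompressed; run;
```

When compression is enabled, the log prints a NOTE: reporting how much the data set size decreased, which makes it easy to see whether compression is worthwhile for a given table.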