SAS memory usage and sorting

Posted on 2024-11-01 13:30:26

I'm curious about SAS's use of memory, sorting, and why it seems to be so inefficient.

I have a quad-core Xeon with 8GB of RAM and a 3GB dataset. Why, at any given time during a standard PROC SORT, is a mere 120MB of RAM in use, with a meager 15-20% CPU utilization? It seems like something horribly inefficient is going on with the procedure.

Since I have the memory available, I would expect it to load the entire dataset and then proceed to obliterate all available CPU cycles. But only 15%? It's a stunning waste of available resources, and it bothers me. It seems to be constantly going back and forth to the disk, which is painfully slow.

Is there some magical setting I'm missing that says "SAS, you can utilize everything to go faster"?

This is a 64-bit OS running 64-bit SAS, btw.

Comments (3)

带刺的爱情 2024-11-08 13:30:26

You might check your MEMSIZE and SORTSIZE settings. More discussion about sort performance is here.
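
To see what the two options are currently set to, something like the following could be run in a SAS session. This is only a minimal sketch: PROC OPTIONS writes the values to the SAS log, and what you see depends on how SAS was started at your site.

```sas
/* Print the current value of each option to the SAS log. */
proc options option=memsize;
run;

proc options option=sortsize;
run;
```

If SORTSIZE turns out to be small compared with the size of the data set, that would be consistent with the low memory use described in the question.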

屋顶上的小猫咪 2024-11-08 13:30:26

The thing with sorting is that it's generally not the sort itself that takes the time; it's reading the data set in and writing it out again. Sorting is, comparatively, quick. So with a 3GB data set, significant time is spent just waiting for the disk to supply all of the data. SAS can overlap sorting parts of the data with reading more of it in, but it's still likely to be I/O bound.
That said, MEMSIZE and SORTSIZE will at least allow you to make maximum use of your available memory. You need to ensure that SAS reads the entire data set in, sorts it in one go, and then writes it out again. With less memory, or if MEMSIZE/SORTSIZE are not suitably configured, it will sort the data set in chunks and then have to merge those chunks. You really want to avoid a "multi-pass sort" if at all possible, as it will double the time it takes (it has to go through the whole data set sorting chunks, then go through all the data again, merging those chunks). I think you get hints from the SAS log as to whether it is multi-pass sorting or not.
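
As a rough illustration of the advice above, the sketch below raises SORTSIZE for the session and turns on FULLSTIMER so that the log reports more detailed time and memory figures for the step. The data set names (work.big, work.big_sorted), the BY variable (id), and the 4G value are assumptions made up for the example; SORTSIZE needs to stay below MEMSIZE, and exactly which notes appear in the log varies by SAS version and host.

```sas
/* Report detailed time and memory statistics for each step in the log. */
options fullstimer;

/* Let the sort use up to 4 GB of memory (assumed value; it must stay
   below the MEMSIZE limit that SAS was started with). */
options sortsize=4G;

/* Hypothetical data set and key names, for illustration only. */
proc sort data=work.big out=work.big_sorted;
  by id;
run;
```

MEMSIZE itself cannot be changed with an OPTIONS statement. It is fixed when SAS starts, for example via -memsize 6G on the command line or in the configuration file, so it has to be raised there before a larger SORTSIZE can take effect.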

南街女流氓 2024-11-08 13:30:26

In general, that's not how SAS works. SAS keeps your data on your disk drives and only reads a small portion of it at a time. To me, that's the advantage of SAS: I use SAS for stuff that can't fit in RAM.

You might be interested in Stata, R, or another package that keeps your data in RAM. It's pretty easy to move back & forth between the programs, even for the same project.
