Which has a bigger impact on SAS data set performance: the number of observations or the number of variables?

Posted on 2024-08-07 12:58:37

After working with different data sets in SAS for a month or two, it seems to me that the more variables a data set has, the longer it takes to run PROCs and other operations on it. Yet if I have, for example, only 5 variables but 1 million observations, performance is not affected much.

While I'm mainly interested in whether observations or variables affect performance more, I was also wondering whether there are other factors I'm missing when looking at SAS performance.

Thanks!

Comments (2)

陌伤浅笑 2024-08-14 12:58:37

For data sets with the same number of cells (rows × columns), I believe the one with more variables will usually be slower. I tried creating two data sets, one with 1 row and 10,000 columns and one with 1 column and 10,000 rows. The one with more variables took far more memory and time.

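options fullstimer;  /* write detailed time and memory statistics to the log */

/* A: 1 observation, 10,000 variables */
data a;
    retain var1-var10000 1;
run;

/* B: 10,000 observations, 1 variable */
data b(drop=i);
    do i=1 to 10000;
        var1=i;
        output;
    end;
run;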

From the log:

31   options fullstimer;
32   data a;
33       retain var1-var10000 1;
34   run;

NOTE: The data set WORK.A has 1 observations and 10000 variables.
NOTE: DATA statement used (Total process time):
      real time           0.23 seconds
      user cpu time       0.20 seconds
      system cpu time     0.03 seconds
      Memory                            5382k
      OS Memory                         14208k
      Timestamp            10/14/2009  2:03:57 PM


35   data b(drop=i);
36       do i=1 to 10000;
37       var1=i;
38       output;
39       end;
40   run;

NOTE: The data set WORK.B has 10000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      Memory                            173k
      OS Memory                         12144k
      Timestamp            10/14/2009  2:03:57 PM

You should also look at the BUFNO= and BUFSIZE= options. If you have to access a data set many times, you might also consider using the SASFILE statement to hold the entire data set in memory.
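
As a rough sketch of how those options might be used (the data set name work.big, its size, and the buffer values are made up for illustration, not recommendations), something like this could be run with FULLSTIMER on to compare the log statistics:

```sas
options fullstimer;

/* Hypothetical test data: 100,000 observations, x1-x5 plus id */
data work.big;
    array x{5};
    do id = 1 to 100000;
        do j = 1 to 5;
            x{j} = ranuni(1);
        end;
        output;
    end;
    drop j;
run;

/* Load the whole data set into memory; the steps below read it from there */
sasfile work.big load;

proc means data=work.big;
    var x1-x5;
run;

proc means data=work.big;
    var x1;
run;

/* Release the memory once the repeated reads are done */
sasfile work.big close;

/* BUFSIZE= and BUFNO= as data set options; the values are illustrative only */
data work.big2(bufsize=64k bufno=10);
    set work.big;
run;
```

Whether any of this helps depends on available memory and on how often the data set is really re-read, so it is worth checking the real time and memory figures in the log before and after.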

枉心 2024-08-14 12:58:37

I can't fully explain it (and am making an educated guess), but I imagine it comes down to a combination of factors. One of them is that a whole record is read into the program data vector (PDV), so a data set with many variables means more data sitting in memory.
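
One way to probe the PDV guess is to read the same wide data set twice, once with every variable and once with the KEEP= data set option limiting what is brought into the PDV. This is only a sketch; work.wide and its dimensions are hypothetical, and no timings are shown because they depend on the system:

```sas
options fullstimer;

/* Hypothetical wide data set: 10,000 observations, v1-v1000 plus id */
data work.wide;
    array v{1000};
    do id = 1 to 10000;
        do j = 1 to 1000;
            v{j} = j;
        end;
        output;
    end;
    drop j;
run;

/* Read every variable into the PDV */
data _null_;
    set work.wide;
run;

/* Read only one variable into the PDV */
data _null_;
    set work.wide(keep=v1);
run;
```

Comparing the CPU time and memory lines for the two DATA _NULL_ steps in the log should show how much of the cost comes from the number of variables handled per observation.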

It might be worth doing some measurements with compressed datasets, because I/O is often the bottleneck.

SAS dataset option:

data foo(compress=yes);
...
run;
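
A possible way to take such measurements, assuming a made-up data set with long, mostly blank character variables (the kind of data where compression tends to pay off), is to write the same rows with and without COMPRESS= and then compare the log and PROC CONTENTS output:

```sas
options fullstimer;

/* Hypothetical data: 50,000 observations, 50 character variables of length 200 */
/* One DATA step writes the same rows to an uncompressed and a compressed copy  */
data work.plain(compress=no) work.packed(compress=yes);
    length c1-c50 $200;
    do id = 1 to 50000;
        c1 = 'abc';
        output;
    end;
run;

/* Read each copy once and compare the timings in the log */
data _null_;
    set work.plain;
run;

data _null_;
    set work.packed;
run;

/* Compare page counts and file sizes of the two copies */
proc contents data=work.plain;
run;

proc contents data=work.packed;
run;
```

Compression trades extra CPU for less I/O, so on narrow or mostly numeric data it may not help at all; the measurements are what decide it.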