What exactly does C++ profiling (Google CPU perf tools) measure?

Posted 2024-07-24 07:09:34

I'm trying to get started with Google Perf Tools to profile some CPU-intensive applications. It's a statistical calculation that dumps each step to a file using `ofstream`. I'm not a C++ expert, so I'm having trouble finding the bottleneck. My first pass gives these results:

Total: 857 samples
     357  41.7%  41.7%      357  41.7% _write$UNIX2003
     134  15.6%  57.3%      134  15.6% _exp$fenv_access_off
     109  12.7%  70.0%      276  32.2% scythe::dnorm
     103  12.0%  82.0%      103  12.0% _log$fenv_access_off
      58   6.8%  88.8%       58   6.8% scythe::const_matrix_forward_iterator::operator*
      37   4.3%  93.1%       37   4.3% scythe::matrix_forward_iterator::operator*
      15   1.8%  94.9%       47   5.5% std::transform
      13   1.5%  96.4%      486  56.7% SliceStep::DoStep
      10   1.2%  97.5%       10   1.2% 0x0002726c
       5   0.6%  98.1%        5   0.6% 0x000271c7
       5   0.6%  98.7%        5   0.6% _write$NOCANCEL$UNIX2003

This is surprising, since all the real calculation occurs in SliceStep::DoStep. The "_write$UNIX2003" entry (where can I find out what this is?) appears to come from writing the output file. What confuses me is this: if I comment out all the outfile << "text" statements and run pprof again, 95% of the samples are in SliceStep::DoStep and "_write$UNIX2003" goes away. However, my application does not speed up as measured by total time. The whole thing speeds up by less than 1 percent.

What am I missing?

Added:
The pprof output without the outfile << statements is:

Total: 790 samples
     205  25.9%  25.9%      205  25.9% _exp$fenv_access_off
     170  21.5%  47.5%      170  21.5% _log$fenv_access_off
     162  20.5%  68.0%      437  55.3% scythe::dnorm
      83  10.5%  78.5%       83  10.5% scythe::const_matrix_forward_iterator::operator*
      70   8.9%  87.3%       70   8.9% scythe::matrix_forward_iterator::operator*
      28   3.5%  90.9%       78   9.9% std::transform
      26   3.3%  94.2%       26   3.3% 0x00027262
      12   1.5%  95.7%       12   1.5% _write$NOCANCEL$UNIX2003
      11   1.4%  97.1%      764  96.7% SliceStep::DoStep
       9   1.1%  98.2%        9   1.1% 0x00027253
       6   0.8%  99.0%        6   0.8% 0x000274a6

This looks like what I'd expect, except that I see no visible increase in performance (0.1 second saved on a 10-second calculation). The code is essentially:

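std::ofstream outfile("out.txt");
for (/* each step */) {
    SliceStep::DoStep();  // all of the real computation happens here
    outfile << result;    // this step's result; these are the lines I commented out
}
outfile.close();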

Update: I'm timing with boost::timer, starting where the profiler starts and stopping where it stops. I don't use threads or anything fancy.

只是我以为 2024-07-31 07:09:35

From my comments:

The numbers from your profiler say that the program should be around 40% faster without the print statements.

The runtime, however, stays nearly the same.

Obviously one of the measurements must be wrong. That means you have to do more and better measurements.

First, I suggest starting with another easy tool: the time command. It should give you a rough idea of where your time is spent.

If the results are still not conclusive, you need a better test case:

- Use a larger problem.
- Do a warm-up before measuring: run some iterations first and start any measurement afterwards (in the same process).

Tiristan: It's all in user time. What I'm doing is pretty simple, I think... Does the fact that the file is open the whole time mean anything?

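If nearly all of the time is user time, the write system calls that the profiler charges with over 40% of the samples can't be where much of the time goes. That means the profiler is wrong.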

Printing 100000 lines to the console with a small Python script, print.py, gives something like this:

for i in range(100000):
    print(i)

To console:

time python print.py
[...]
real    0m2.370s
user    0m0.156s
sys     0m0.232s

Versus:

time python print.py > /dev/null

real    0m0.133s
user    0m0.116s
sys     0m0.008s

My point is: your internal measurements and time both show that you gain nothing from disabling the output, while Google Perf Tools says you should. Who's wrong?
