C++: What is (Google CPU perf tools) profiling actually measuring?

Posted 2024-07-24 07:09:34


I'm trying to get started with Google Perf Tools to profile some CPU-intensive applications. It's a statistical calculation that dumps each step to a file using `ofstream`. I'm not a C++ expert, so I'm having trouble finding the bottleneck. My first pass gives these results:

Total: 857 samples
     357  41.7%  41.7%      357  41.7% _write$UNIX2003
     134  15.6%  57.3%      134  15.6% _exp$fenv_access_off
     109  12.7%  70.0%      276  32.2% scythe::dnorm
     103  12.0%  82.0%      103  12.0% _log$fenv_access_off
      58   6.8%  88.8%       58   6.8% scythe::const_matrix_forward_iterator::operator*
      37   4.3%  93.1%       37   4.3% scythe::matrix_forward_iterator::operator*
      15   1.8%  94.9%       47   5.5% std::transform
      13   1.5%  96.4%      486  56.7% SliceStep::DoStep
      10   1.2%  97.5%       10   1.2% 0x0002726c
       5   0.6%  98.1%        5   0.6% 0x000271c7
       5   0.6%  98.7%        5   0.6% _write$NOCANCEL$UNIX2003

This is surprising, since all the real calculation occurs in SliceStep::DoStep. The "_write$UNIX2003" (where can I find out what this is?) appears to come from writing the output file. Now, what confuses me is that if I comment out all the outfile << "text" statements and run pprof, 95% is in SliceStep::DoStep and `_write$UNIX2003` goes away. However, my application does not speed up, as measured by total time: the whole thing speeds up by less than 1 percent.

What am I missing?

Added:
The pprof output without the outfile << statements is:

Total: 790 samples
     205  25.9%  25.9%      205  25.9% _exp$fenv_access_off
     170  21.5%  47.5%      170  21.5% _log$fenv_access_off
     162  20.5%  68.0%      437  55.3% scythe::dnorm
      83  10.5%  78.5%       83  10.5% scythe::const_matrix_forward_iterator::operator*
      70   8.9%  87.3%       70   8.9% scythe::matrix_forward_iterator::operator*
      28   3.5%  90.9%       78   9.9% std::transform
      26   3.3%  94.2%       26   3.3% 0x00027262
      12   1.5%  95.7%       12   1.5% _write$NOCANCEL$UNIX2003
      11   1.4%  97.1%      764  96.7% SliceStep::DoStep
       9   1.1%  98.2%        9   1.1% 0x00027253
       6   0.8%  99.0%        6   0.8% 0x000274a6

This looks like what I'd expect, except I see no visible increase in performance (0.1 second on a 10-second calculation). The code is essentially:

ofstream outfile("out.txt");
for (...) {
    SliceStep::DoStep();
    outfile << result;
}
outfile.close();

Update: I am timing with boost::timer, starting where the profiler starts and ending where it ends. I do not use threads or anything fancy.

Comments (3)

只是我以为 2024-07-31 07:09:35


From my comments:

The numbers you get from your profiler say that the program should be around 40% faster without the print statements.

The runtime, however, stays nearly the same.

Obviously one of the measurements must be wrong. That means you have to do more and better measurements.

First, I suggest starting with another easy tool: the time command. This should give you a rough idea of where your time is spent.

If the results are still not conclusive, you need a better test case:

  • Use a larger problem
  • Do a warmup before measuring. Do some loops and start any measurement afterwards (in the same process).

Tiristan: It's all in user. What I'm doing is pretty simple, I think... Does the fact that the file is open the whole time mean anything?

That means the profiler is wrong.

Printing 100000 lines to the console using Python results in something like:

for i in xrange(100000):
    print i

Run with output to the console:

time python print.py
[...]
real    0m2.370s
user    0m0.156s
sys     0m0.232s

Versus:

time python print.py > /dev/null

real    0m0.133s
user    0m0.116s
sys     0m0.008s

My point is: your internal measurements and the total runtime show that you do not gain anything from disabling output. Google Perf Tools says you should. Who's wrong?

彼岸花ソ最美的依靠 2024-07-31 07:09:35


_write$UNIX2003 is probably referring to the write POSIX system call, which performs the actual output to the file descriptor. I/O is very slow compared to almost anything else, so it makes sense that your program is spending a lot of time there if you are writing a fair bit of output.

I'm not sure why your program wouldn't speed up when you remove the output, but I can't really make a guess based only on the information you've given. It would help to see some of the code, or even the perftools output with the output statements removed.

生活了然无味 2024-07-31 07:09:35


Google perftools collects samples of the call stack, so what you need is some visibility into those samples.

According to the docs, you can display the call graph at statement or address granularity. That should tell you what you need to know.
