What exactly does C++ profiling (Google CPU Perf Tools) measure?
I'm trying to get started with Google Perf Tools to profile some CPU-intensive applications. It's a statistical calculation that dumps each step to a file using `ofstream'. I'm not a C++ expert, so I'm having trouble finding the bottleneck. My first pass gives these results:
Total: 857 samples
     357  41.7%  41.7%      357  41.7% _write$UNIX2003
     134  15.6%  57.3%      134  15.6% _exp$fenv_access_off
     109  12.7%  70.0%      276  32.2% scythe::dnorm
     103  12.0%  82.0%      103  12.0% _log$fenv_access_off
      58   6.8%  88.8%       58   6.8% scythe::const_matrix_forward_iterator::operator*
      37   4.3%  93.1%       37   4.3% scythe::matrix_forward_iterator::operator*
      15   1.8%  94.9%       47   5.5% std::transform
      13   1.5%  96.4%      486  56.7% SliceStep::DoStep
      10   1.2%  97.5%       10   1.2% 0x0002726c
       5   0.6%  98.1%        5   0.6% 0x000271c7
       5   0.6%  98.7%        5   0.6% _write$NOCANCEL$UNIX2003
This is surprising, since all the real calculation occurs in SliceStep::DoStep. The "_write$UNIX2003" entry (where can I find out what this is?) appears to come from writing the output file. What confuses me is that if I comment out all the outfile << "text" statements and run pprof, 95% is in SliceStep::DoStep and "_write$UNIX2003" goes away. However, my application does not speed up, as measured by total time. The whole thing speeds up by less than 1 percent.
What am I missing?
Added: The pprof output without the outfile << statements is:
Total: 790 samples
     205  25.9%  25.9%      205  25.9% _exp$fenv_access_off
     170  21.5%  47.5%      170  21.5% _log$fenv_access_off
     162  20.5%  68.0%      437  55.3% scythe::dnorm
      83  10.5%  78.5%       83  10.5% scythe::const_matrix_forward_iterator::operator*
      70   8.9%  87.3%       70   8.9% scythe::matrix_forward_iterator::operator*
      28   3.5%  90.9%       78   9.9% std::transform
      26   3.3%  94.2%       26   3.3% 0x00027262
      12   1.5%  95.7%       12   1.5% _write$NOCANCEL$UNIX2003
      11   1.4%  97.1%      764  96.7% SliceStep::DoStep
       9   1.1%  98.2%        9   1.1% 0x00027253
       6   0.8%  99.0%        6   0.8% 0x000274a6
This looks like what I'd expect, except that I see no visible increase in performance (0.1 seconds on a 10-second calculation). The code is essentially:
ofstream outfile("out.txt");
for loop:
    SliceStep::DoStep()
    outfile << 'result'
outfile.close()
Update: I'm timing with boost::timer, starting where the profiler starts and ending where it ends. I do not use threads or anything fancy.
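One thing that may be worth checking is which clock the timer actually reads. As far as I know, the original boost::timer class is built on std::clock(), which on POSIX systems reports processor time rather than elapsed wall-clock time, so it can hide time spent waiting on I/O. Below is a minimal sketch that records both clocks around the same piece of work. The names `Timing` and `time_both` are hypothetical helpers made up for illustration; they are not part of Boost or Google Perf Tools.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>

// Hypothetical helper type: one measurement taken with two different clocks.
struct Timing {
    double wall_seconds;  // elapsed real time (std::chrono::steady_clock)
    double cpu_seconds;   // processor time as reported by std::clock()
};

// Runs `work` once and reports both wall-clock and CPU time for that call.
inline Timing time_both(const std::function<void()>& work) {
    assert(work && "work must be a callable object");

    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    work();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}
```

Wrapping the whole loop in a lambda and passing it to `time_both` gives two numbers to compare against the profile. If they differ a lot, the program is spending time blocked rather than computing.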
From my comments:
The numbers you get from your profiler say that the program should be around 40% faster without the print statements.
The runtime, however, stays nearly the same.
Obviously one of the measurements must be wrong. That means you have to do more and better measurements.
First, I suggest starting with another easy tool: the time command. This should give you a rough idea of where your time is spent.
If the results are still not conclusive, you need a better test case:
That means the profiler is wrong.
Printing 100000 lines to the console using python results in something like:
To console:
Versus:
My point is: your internal measurements and time show that you gain nothing from disabling output. Google Perf Tools says you should. Who's wrong?
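To settle that question outside the real application, a small A/B test case can help. This is only a sketch under assumptions: `fake_step` is a made-up stand-in for SliceStep::DoStep (some exp/log arithmetic), and `RunResult` and `run_steps` are hypothetical names. It runs the same loop with and without the file output and reports the elapsed wall-clock time for each.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <fstream>
#include <stdexcept>
#include <string>

// Made-up stand-in for the real computation: exp/log-heavy arithmetic.
inline double fake_step(int i) {
    double acc = 0.0;
    for (int k = 1; k <= 200; ++k) {
        acc += std::exp(-std::log(k + i + 1.0));
    }
    return acc;
}

struct RunResult {
    double seconds;   // elapsed wall-clock time for the whole loop
    double checksum;  // sum of all step results, so the work is not optimized away
};

// Runs `steps` iterations; writes one line per step to `path` if requested.
inline RunResult run_steps(int steps, bool write_output, const std::string& path) {
    assert(steps >= 0);
    const auto start = std::chrono::steady_clock::now();

    std::ofstream outfile;
    if (write_output) {
        outfile.open(path);
        if (!outfile) throw std::runtime_error("cannot open " + path);
    }

    double checksum = 0.0;
    for (int i = 0; i < steps; ++i) {
        const double result = fake_step(i);
        checksum += result;
        if (write_output) outfile << result << '\n';
    }
    if (write_output) outfile.close();

    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(stop - start).count(), checksum};
}
```

Calling `run_steps` several times in each mode and comparing the `seconds` values shows how much the output costs in elapsed time, independently of what the profiler attributes to write.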
_write$UNIX2003 is probably referring to the write POSIX system call, which writes to a file descriptor (here, presumably the output file). I/O is very slow compared to almost anything else, so it makes sense that your program spends a lot of time there if you are writing a fair bit of output.
I'm not sure why your program wouldn't speed up when you remove the output, but I can't really make a guess based only on the information you've given. It would be nice to see some of the code, or even the perftools output when the cout statement is removed.
Google perftools collects samples of the call stack, so what you need is some visibility into those samples.
According to the documentation, you can display the call graph at statement or address granularity. That should tell you what you need to know.