为什么 gprof 大大低估了程序的运行时间?
我有这个程序需要 2.34 秒才能运行,而 gprof 说它只需要 1.18 秒。我在其他地方读过答案,表明如果程序受 I/O 限制,gprof 可能会出错,但这个程序显然不是。
对于我正在尝试分析的有用程序,也会发生这种情况。它并不特定于这个简单的测试用例。
(另外,在这种情况下,gprof 表示 main() 占用了超过 100% 的程序运行时间,这是一个非常愚蠢的错误,但并没有真正给我带来问题。)
$ cat test.c
int main() {
int i;
for (i=0;i<1000000000;i++);
}
$ gcc test.c -o test
$ time ./test
real 0m2.342s
user 0m2.340s
sys 0m0.000s
$ gcc test.c -o test -pg
$ time ./test
real 0m2.342s
user 0m2.340s
sys 0m0.000s
$ gprof test |head
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
101.33 1.18 1.18 main
% the percentage of the total running time of the
time program used by this function.
I have this program that takes 2.34 seconds to run, and gprof says it only takes 1.18 seconds. I've read answers elsewhere suggesting that gprof can get it wrong if e.g the program is I/O bound, but this program clearly isn't.
This also happens for a useful program I'm trying to profile. It's not specific to this trivial test case.
(Also in this case gprof says that main() takes more than 100% of the program's running time, which is a pretty stupid bug but not really causing problems for me.)
$ cat test.c
int main() {
int i;
for (i=0;i<1000000000;i++);
}
$ gcc test.c -o test
$ time ./test
real 0m2.342s
user 0m2.340s
sys 0m0.000s
$ gcc test.c -o test -pg
$ time ./test
real 0m2.342s
user 0m2.340s
sys 0m0.000s
$ gprof test |head
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
101.33 1.18 1.18 main
% the percentage of the total running time of the
time program used by this function.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我建议放弃
gprof
并切换到oprofile
。任何将检测插入到程序中的分析都会本质上影响性能,从而可能会扭曲结果或使结果无效。使用oprofile
,您不必构建具有分析支持的程序或获取特殊的支持分析的库;它通过统计方法工作,基本上对指令指针进行采样(在内核协助下)并使用样本计数来估计每个函数花费了多少时间。I would recommend dropping
gprof
and switching tooprofile
. Any profiling that inserts instrumentation into your program will inherently affect the performance in ways that might skew or invalidate the results. Withoprofile
you don't have to build your program with profiling support or get special profiling-enabled libraries; it works by statistical methods, basically sampling the instruction pointer (with kernel assistance) and using the sample counts to estimate how much time was spent in each function.首先,对一个在 2.3 秒内完成的程序进行分析有点可笑。你确实需要一个长时间运行的程序来很好地测量程序热点等。但我离题了......
为了回答你的第一个问题,时间(命令行实用程序)乘以整个过程的执行(包括分析仪器本身)。当您在构建中启用分析时,每次运行程序时,程序都会写入一个包含执行时间等的 gmon.out 文件。也就是说,每次运行程序时都会完成分析的艰苦工作。分析工具努力分离出其自身对时间统计的影响,在本例中,分析本身似乎占了 2.34 - 1.18 = 1.16 秒的运行时间(按时间报告)。 gprof 程序本身本质上只是分析并重新格式化 gmon.out 程序中存储的运行时统计信息。需要明确的是,真正的分析发生在程序运行期间,而不是 gprof 运行期间。
最后,gprof 输出直接回答了您的第二个问题。它以 1/100 秒的速度对程序的执行进行采样。间隔并将整个 0.01 秒的时间归功于样本期间发生的任何事情。如果您的程序执行时间不是 0.01 秒的精确倍数,您将得到的数字加起来不等于 100%。再次必须强调的是,对运行速度如此之快的程序进行分析是很容易出错的,而且这种明显的错误肯定可以通过更长的采样间隔(即运行时间)来缓解。此外,gprof 对自身开销的计算并不完美,这可能进一步导致看似可笑的 101.33% 数字。
这是一个统计过程,并不完美。您必须对结果持保留态度。
祝你好运!
First of all, profiling a program that completes in 2.3 seconds is a little bit ludicrous. You really need a long-running program to get a good measurement of program hot spots, etc. But I digress...
To answer your first question, time (the command-line utility) times the execution of the whole process (including the profiling instrumentation itself). When you enable profiling in your build, the program writes a gmon.out file containing execution times etc. each time you run the program. That is, the hard work of profiling is done each time you run the program. The profiling instrumentation tries hard to separate out its own impact on the time accounting, and in this case, it seems that the profiling itself accounted for 2.34 - 1.18 = 1.16s of the runtime (as reported by time). The gprof program itself essentially just analyzes and reformats the runtime statistics stored in the gmon.out program. To be clear about this, the real profiling happens during your program's run, not during the gprof run.
Finally, the gprof output answers your second question directly. It samples your program's execution at 1/100 sec. intervals and gives credit for that whole 0.01 s to whatever happened to be executing during the sample. If your program doesn't take an exact multiple of 0.01 seconds to execute, you'll get numbers that don't add up to 100%. Again, it must be emphasized that profiling a program that runs this quickly is quite error prone, and this apparent bug would certainly be mitigated by a longer sampling interval (i.e. runtime). Also, gprof's accounting for its own overhead is imperfect, which might further contribute to the seemingly ludicrous 101.33% number.
It's a statistical process, and it's not perfect. You have to interpret the results with a grain of salt.
Good luck!
首先,为了回答您的问题,gprof 不计算阻塞时间,因此如果机器中“同时”发生其他任何事情,您将看到这种差异。另外,如果您的程序执行任何 I/O,也不会被计算在内。
gprof
实际上只对非常有限的一类程序有用。以下是问题列表。
First, to answer your question,
gprof
does not count blocked time, so if anything else is going on in the machine at "the same time" you will see that discrepancy. Also, if your program does any I/O, that will not be counted either.gprof
is really only useful for a very restricted class of programs.Here is a list of the issues.