How do I modify a C program so that gprof can profile it?

Posted on 2024-08-12 20:19:37

When I run gprof on my C program it says no time accumulated for my program and shows 0 time for all function calls. However it does count the function calls.

How do I modify my program so that gprof will be able to count how much time something takes to run?

Comments (4)

唠甜嗑 2024-08-19 20:19:37

Did you specify -pg when compiling?

http://sourceware.org/binutils/docs-2.20/gprof/Compiling.html#Compiling

Once it is compiled, run the program (this writes a gmon.out file to the current directory), then run gprof on the binary.

E.g.:

test.c:

#include <stdio.h>

int main ()
{
    int i;
    for (i = 0; i < 10000; i++) {
        printf ("%d\n", i);
    }
    return 0;
}

Compile with cc -pg test.c, run a.out, then run gprof a.out. That gives me:

granularity: each sample hit covers 4 byte(s) for 1.47% of 0.03 seconds

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 45.6       0.02     0.02    10000     0.00     0.00  __sys_write [10]
 45.6       0.03     0.02        0  100.00%           .mcount (26)
  2.9       0.03     0.00    20000     0.00     0.00  __sfvwrite [6]
  1.5       0.03     0.00    20000     0.00     0.00  memchr [11]
  1.5       0.03     0.00    10000     0.00     0.00  __ultoa [12]
  1.5       0.03     0.00    10000     0.00     0.00  _swrite [9]
  1.5       0.03     0.00    10000     0.00     0.00  vfprintf [2]

What are you getting?

猫九 2024-08-19 20:19:37

I tried running Kinopiko's example, except I increased the number of iterations by a factor of 100.

test.c:

#include <stdio.h>

int main ()
{
    int i;
    for (i = 0; i < 1000000; i++) {
        printf ("%d\n", i);
    }
    return 0;
}

Then I took 10 stackshots (under VC, but you can use pstack). Here are the stackshots:

9 copies of this stack:
NTDLL! 7c90e514()
KERNEL32! 7c81cbfe()
KERNEL32! 7c81cc75()
KERNEL32! 7c81cc89()
_write() line 168 + 57 bytes
_flush() line 162 + 23 bytes
_ftbuf() line 171 + 9 bytes
printf() line 62 + 14 bytes
main() line 7 + 14 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()

1 copy of this stack:
KERNEL32! 7c81cb96()
KERNEL32! 7c81cc75()
KERNEL32! 7c81cc89()
_write() line 168 + 57 bytes
_flush() line 162 + 23 bytes
_ftbuf() line 171 + 9 bytes
printf() line 62 + 14 bytes
main() line 7 + 14 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()

In case it isn't obvious, this tells you that:

mainCRTStartup() line 206 + 25 bytes Cost ~100% of the time
main() line 7 + 14 bytes             Cost ~100% of the time
printf() line 62 + 14 bytes          Cost ~100% of the time
_ftbuf() line 171 + 9 bytes          Cost ~100% of the time
_flush() line 162 + 23 bytes         Cost ~100% of the time
_write() line 168 + 57 bytes         Cost ~100% of the time

In a nutshell, the program spends ~100% of its time flushing the output buffer to disk (or console) as part of the printf on line 7.

(By the "cost" of a line I mean the fraction of total time spent at the request of that line, which is roughly the fraction of samples that contain it. If that line could be made to take no time, such as by removing it, skipping over it, or passing its work off to an infinitely fast coprocessor, that fraction is how much the total time would shrink. So if the execution of any of these lines of code could be avoided, time would shrink by somewhere in the range of 95% to 100%. If you were to ask "What about recursion?", the answer is that it makes no difference.)
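The cost estimate described above can be sketched in a few lines of C. This is only an illustration, not part of any tool: struct sample and frame_cost are hypothetical names, and frames are assumed to be identified by plain strings such as the lines of a stackshot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One stack sample: an array of frame names (hypothetical representation). */
struct sample {
    const char *const *frames;
    size_t depth;
};

/* Return 1 if the frame appears anywhere in the sample. A frame that shows
   up several times in one sample (recursion) still counts only once. */
static int sample_contains(const struct sample *s, const char *frame)
{
    for (size_t i = 0; i < s->depth; i++) {
        if (strcmp(s->frames[i], frame) == 0)
            return 1;
    }
    return 0;
}

/* Estimated cost of a frame: the fraction of samples that contain it. */
double frame_cost(const struct sample *samples, size_t n, const char *frame)
{
    size_t hits = 0;

    assert(samples != NULL || n == 0);
    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)sample_contains(&samples[i], frame);
    return (double)hits / (double)n;
}
```

Fed the ten stacks shown earlier, this should give 1.0 for the printf frame (it is in all ten samples) and 0.9 for the NTDLL frame (it is in nine of them).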

Now, maybe you want to know something else, like how much time is spent in the loop, for example. To find that out, remove the printf because it's hogging all the time. Maybe you want to know what % of time is spent purely in CPU time, not in system calls. To get that, just throw away any stackshots that don't end in your code.
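For the CPU-versus-system-call question, a coarser check than filtering stackshots is to ask the operating system directly. The sketch below swaps in a different technique, the POSIX getrusage call, and so assumes a POSIX system; cpu_split and report_cpu_split are hypothetical helper names.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Fill *user_s and *sys_s with the CPU seconds this process has used so far
   in user code and in the kernel. Returns 0 on success, -1 on failure. */
int cpu_split(double *user_s, double *sys_s)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user_s = tv_seconds(ru.ru_utime);
    *sys_s = tv_seconds(ru.ru_stime);
    return 0;
}

/* Print the split to stderr; meant to be called just before exit. */
void report_cpu_split(void)
{
    double user_s, sys_s;

    if (cpu_split(&user_s, &sys_s) == 0)
        fprintf(stderr, "user %.3f s, system %.3f s\n", user_s, sys_s);
}
```

Both figures are CPU time only. Time the process spends blocked, for example waiting on I/O, appears in neither, which is why wall-clock stack samples can tell a different story.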

The point I'm trying to make is that if you're looking for things you can fix to make the code run faster, the data gprof gives you, even if you understand it, is almost useless. By comparison, if some of your code is causing more wall-clock time to be spent than you would like, stackshots will pinpoint it.

江心雾 2024-08-19 20:19:37

One gotcha with gprof: it doesn't work with code in dynamically-linked libraries. For that, you need to use sprof. See this answer: gprof : How to generate call graph for functions in shared library that is linked to main program

千笙结 2024-08-19 20:19:37

First compile your application with -g, and check which CPU counters you are using.
If your application runs very quickly, gprof could miss all events or collect fewer than required (reduce the number of events to read).
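To illustrate the point about short runs: gprof gets its call counts from instrumentation but its timings from periodic sampling (typically on the order of 0.01 seconds per sample), so a run that finishes within a few samples can show calls with no accumulated time. A minimal sketch of a workload that stays on the CPU long enough to be sampled might look like this. The function names, the PROFILE_DEMO_MAIN macro, and the limit of 5000000 are all assumptions made for the example.

```c
#include <stdio.h>

/* Trial division: returns 1 if n is prime, 0 otherwise. */
int is_prime(unsigned long n)
{
    if (n < 2)
        return 0;
    for (unsigned long d = 2; d * d <= n; d++) {
        if (n % d == 0)
            return 0;
    }
    return 1;
}

/* Count the primes in 0..limit; pure CPU work with no I/O in the loop. */
unsigned long count_primes(unsigned long limit)
{
    unsigned long count = 0;

    for (unsigned long n = 0; n <= limit; n++)
        count += (unsigned long)is_prime(n);
    return count;
}

#ifdef PROFILE_DEMO_MAIN
int main(void)
{
    /* The limit is a guess; raise it until the run lasts a second or more. */
    printf("%lu\n", count_primes(5000000UL));
    return 0;
}
#endif
```

Built with something like cc -pg -DPROFILE_DEMO_MAIN, run once, and then passed to gprof, this should attribute most of the time to is_prime, though the exact figures depend on the machine and compiler settings.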

Actually, profiling should show you CPU_CLK_UNHALTED or INST_RETIRED events without any special switches. But with such data you will only be able to say how well your code is performing: INST_RETIRED/CPU_CLK_UNHALTED.

Try the Intel VTune profiler. It's free for 30 days and for education.
