
How do I modify a C program so that gprof can profile it?

Posted 2024-08-12 20:19:37

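When I run gprof on my C program, it says that no time has accumulated for my program, and every function call shows 0 time. It does count the number of function calls, though.

How do I modify my program so that gprof can measure how long something takes to run?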


唠甜嗑 2024-08-19 20:19:37

Did you specify -pg when compiling?

http://sourceware.org/binutils/docs-2.20/gprof/Compiling.html#Compiling

Once it is compiled, run the program (a normal exit writes the profile data to gmon.out in the current directory), then run gprof on the binary.

E.g.:

test.c:

#include <stdio.h>

int main ()
{
    int i;
    for (i = 0; i < 10000; i++) {
        printf ("%d\n", i);
    }
    return 0;
}

Compiling with cc -pg test.c, running ./a.out, and then running gprof a.out gives me:

granularity: each sample hit covers 4 byte(s) for 1.47% of 0.03 seconds

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 45.6       0.02     0.02    10000     0.00     0.00  __sys_write [10]
 45.6       0.03     0.02        0  100.00%           .mcount (26)
  2.9       0.03     0.00    20000     0.00     0.00  __sfvwrite [6]
  1.5       0.03     0.00    20000     0.00     0.00  memchr [11]
  1.5       0.03     0.00    10000     0.00     0.00  __ultoa [12]
  1.5       0.03     0.00    10000     0.00     0.00  _swrite [9]
  1.5       0.03     0.00    10000     0.00     0.00  vfprintf [2]

What are you getting?


猫九 2024-08-19 20:19:37

I tried running Kinopiko's example, except I increased the number of iterations by a factor of 100.

test.c:

#include <stdio.h>

int main ()
{
    int i;
    for (i = 0; i < 1000000; i++) {
        printf ("%d\n", i);
    }
    return 0;
}

Then I took 10 stackshots (under VC, but you can use pstack). Here are the stackshots:

9 copies of this stack:
NTDLL! 7c90e514()
KERNEL32! 7c81cbfe()
KERNEL32! 7c81cc75()
KERNEL32! 7c81cc89()
_write() line 168 + 57 bytes
_flush() line 162 + 23 bytes
_ftbuf() line 171 + 9 bytes
printf() line 62 + 14 bytes
main() line 7 + 14 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()

1 copy of this stack:
KERNEL32! 7c81cb96()
KERNEL32! 7c81cc75()
KERNEL32! 7c81cc89()
_write() line 168 + 57 bytes
_flush() line 162 + 23 bytes
_ftbuf() line 171 + 9 bytes
printf() line 62 + 14 bytes
main() line 7 + 14 bytes
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c817077()

In case it isn't obvious, this tells you that:

mainCRTStartup() line 206 + 25 bytes Cost ~100% of the time
main() line 7 + 14 bytes             Cost ~100% of the time
printf() line 62 + 14 bytes          Cost ~100% of the time
_ftbuf() line 171 + 9 bytes          Cost ~100% of the time
_flush() line 162 + 23 bytes         Cost ~100% of the time
_write() line 168 + 57 bytes         Cost ~100% of the time

In a nutshell, the program spends ~100% of its time flushing the output buffer to disk (or console) as part of the printf on line 7.

(By the "cost" of a line I mean the fraction of total time spent at the request of that line, which is roughly the fraction of samples that contain it. If that line could be made to take no time, such as by removing it, skipping over it, or passing its work off to an infinitely fast coprocessor, that fraction is how much the total time would shrink. So if the execution of any of these lines of code could be avoided, total time would shrink by somewhere between 95% and 100%. If you were to ask "What about recursion?", the answer is that it makes no difference.)

Now, maybe you want to know something else, such as how much time is spent in the loop itself. To find that out, remove the printf, because it is hogging all the time. Maybe you want to know what percentage of time is spent purely in CPU time rather than in system calls. To get that, just throw away any stackshots that don't end in your code.

The point I'm trying to make is that if you're looking for things you can fix to make the code run faster, the data gprof gives you, even if you understand it, is almost useless. By comparison, if some of your code is causing more wall-clock time to be spent than you would like, stackshots will pinpoint it.
