Using callgrind as a sampling profiler?
I've been searching for a Linux sampling profiler, and callgrind has come the closest to showing useful results. However, its overhead is estimated at 20-100x slower than normal execution. Additionally, I'm only interested in the time spent per function (with particular emphasis on blocking calls such as read() and write(), which no other profiler will faithfully display).
- Is there a way to turn off excess options, so that only the minimum data needed to report time spent in the various call stacks is recorded?
- Does callgrind's cachegrind heritage imply that extra work is being done for cache profiling and the like?
- I assume callgrind operates like a debugger. Can it be adjusted to sample the process at intervals, rather than at every single instruction?
Comments (3)
3) Callgrind works as a dynamic translator: it instruments the original code with counting code. Instrumentation is added for each memory access instruction (for cache simulation) and, I suspect, for each jmp-like instruction, to track the execution count of every basic block.
I have a small sampling profiler that acts just like a debugger. It injects a setitimer() profiling timer into the application, then intercepts every SIGALRM and prints the current $eip value.
There were sampling profilers using the setitimer approach earlier, and there is also profil() for something similar. It is used by glibc/gmon/gmon.c and gprof -p (to be exact, by gcc -pg). The profil() function can profile a single contiguous code fragment by sampling virtual CPU time every 1 or 10 milliseconds. There is also the sprofil() function.
Check also LD_PRELOAD=/lib/libpcprofile.so PCPROFILE_OUTPUT=output.file, but I don't know whether it works or how.
For the numbered questions:
2) "Callgrind is an extension to Cachegrind. It provides all the information that Cachegrind does, plus extra information about callgraphs." So it can provide anything Cachegrind does, but it also lets the user turn off cache simulation with --simulate-cache=no (which is the default value; newer Valgrind releases spell the option --cache-sim=no).
For speed: according to http://www.valgrind.org/docs/manual/nl-manual.html, the manual for the Nul valgrind tool (aka nulgrind), which does no additional instrumentation, the slowdown is 5 times. This is because the program is dynamically translated by valgrind itself. So no valgrind tool can run faster than nulgrind.
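The setitimer approach described in this answer can be sketched roughly as follows. This is only an illustration in Python, not the debugger-style tool mentioned above: it samples Python frames rather than $eip, and the names sample, blocked, busy and workload are hypothetical. It assumes a Unix system and must run in the main thread. The choice of timer matters for the original question: ITIMER_REAL delivers SIGALRM on wall-clock time, so time spent blocked is sampled too, while ITIMER_PROF only ticks while the process is using CPU.

```python
import collections
import signal
import time

# Each interval timer delivers its own signal when it expires.
TIMER_SIGNALS = {
    signal.ITIMER_REAL: signal.SIGALRM,       # wall-clock time, includes blocking
    signal.ITIMER_VIRTUAL: signal.SIGVTALRM,  # user-mode CPU time only
    signal.ITIMER_PROF: signal.SIGPROF,       # user + system CPU time
}


def sample(func, interval=0.001, which=signal.ITIMER_REAL):
    """Run func() and count which function was executing at each timer tick."""
    counts = collections.Counter()

    def on_tick(signum, frame):
        # frame is the Python frame that was interrupted by the signal.
        if frame is not None:
            counts[frame.f_code.co_name] += 1

    signum = TIMER_SIGNALS[which]
    previous = signal.signal(signum, on_tick)
    signal.setitimer(which, interval, interval)  # first tick, then repeat
    try:
        func()
    finally:
        signal.setitimer(which, 0)       # disarm the timer
        signal.signal(signum, previous)  # restore the earlier handler
    return counts


def blocked():
    time.sleep(0.2)  # stands in for a blocking read() or write()


def busy():
    # Spin until roughly 0.2 s of CPU time has been consumed.
    deadline = time.process_time() + 0.2
    while time.process_time() < deadline:
        pass


def workload():
    blocked()
    busy()


if __name__ == "__main__":
    print("wall clock:", sample(workload, which=signal.ITIMER_REAL))
    print("cpu only:  ", sample(workload, which=signal.ITIMER_PROF))
```

With the wall-clock timer, both blocked and busy should collect samples; with the CPU timer, blocked should collect almost none. The exact counts depend on the kernel's timer resolution and on system load.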
Have you tried gprof? It does not have nearly as much overhead as valgrind does.
Try using Zoom from RotateRight. It has a "Thread Time" configuration that samples all threads in a single process whether they are running or blocked.