Time-based sampling profiler for Linux
short version:
Is there a good time based sampling profiler for Linux?
long version:
I generally use OProfile to optimize my applications. I recently found a shortcoming that has me wondering.
The problem was a tight loop that spawned c++filt to demangle a C++ name. I only stumbled upon the code by accident while chasing down another bottleneck. OProfile didn't show anything unusual about the code, so I almost ignored it, but my code sense told me to optimize the call and see what happened. I changed the popen of c++filt to abi::__cxa_demangle. The runtime went from more than a minute to a little over a second, about a 60x speedup.
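For readers who want to see what that replacement can look like, here is a minimal sketch of in-process demangling. It assumes GCC or Clang (the Itanium C++ ABI header `<cxxabi.h>`); the helper name `demangle` is hypothetical and is not taken from the original code.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <memory>
#include <string>

// Demangle a C++ symbol without spawning c++filt.
// `demangle` is an illustrative name; it returns the input unchanged
// when the string is not a valid mangled name.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so it has to be released with free.
    std::unique_ptr<char, decltype(&std::free)> result(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), &std::free);
    if (status == 0 && result) {
        return std::string(result.get());
    }
    return std::string(mangled);
}
```

Calling `demangle("_Z3foov")` yields `foo()`. Because no child process is created, the cost per name is a library call rather than a fork, exec, and pipe round trip.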
Is there a way I could have configured OProfile to flag the popen call? As the profile data sits now, OProfile thinks the bottleneck was the heap and std::string calls (which, BTW, once optimized dropped the runtime to less than a second, more than a 2x speedup).
Here is my OProfile configuration:
$ sudo opcontrol --status
Daemon not running
Event 0: CPU_CLK_UNHALTED:90000:0:1:1
Separate options: library
vmlinux file: none
Image filter: /path/to/executable
Call-graph depth: 7
Buffer size: 65536
Is there another profiler for Linux that could have found the bottleneck?
I suspect the issue is that OProfile only logs its samples to the currently running process. I'd like it to always log its samples to the process I'm profiling. So if the process is currently switched out (blocking on IO or a popen call), OProfile would just place its sample at the blocked call.
If I can't fix this, OProfile will only be useful when the executable is pushing near 100% CPU. It can't help with executables that have inefficient blocking calls.
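The gap described above can be made visible by timing the same call with a wall clock and with the process CPU clock. This is a rough sketch, not a profiler: `measure` and `blocked_call` are hypothetical names, and the 200 ms sleep only stands in for a blocking call such as popen or disk IO.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

struct Elapsed {
    double wall_seconds;  // real time that passed
    double cpu_seconds;   // CPU time charged to this process
};

// Run `work` once and report both clocks (hypothetical helper).
template <typename F>
Elapsed measure(F&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(wall_end - wall_start).count(),
            static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC};
}

// Stand-in for a blocking call: the process is switched out while it waits.
void blocked_call() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}
```

`measure(blocked_call)` reports roughly 0.2 s of wall time but almost no CPU time. A sampler driven by a CPU-cycle event sees only the second number, which is consistent with the popen loop being nearly invisible in the profile.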
Comments (5)
Glad you asked. I believe OProfile can be made to do what I consider the right thing, which is to take stack samples on wall-clock time when the program is being slow and, if it won't let you examine individual stack samples, at least summarize for each line of code that appears on samples, the percent of samples the line appears on. That is a direct measure of what would be saved if that line were not there. Here's one discussion. Here's another, and another. And, as Paul said, Zoom should do it.
If your time went from 60 sec to 1 sec, that implies every single stack sample would have had a 59/60 probability of showing you the problem.
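The arithmetic behind that 59/60 figure can be spelled out as follows. It assumes the samples are taken at independent, uniformly random wall-clock instants, which is an idealization.

```latex
% Fraction of wall-clock time spent in the removable work
f = \frac{T_{\text{before}} - T_{\text{after}}}{T_{\text{before}}}
  = \frac{60\,\mathrm{s} - 1\,\mathrm{s}}{60\,\mathrm{s}}
  = \frac{59}{60} \approx 0.983

% Probability that n independent samples all miss it
P(\text{all } n \text{ samples miss}) = (1 - f)^n = \left(\frac{1}{60}\right)^n,
\qquad n = 2:\ \frac{1}{3600} \approx 2.8 \times 10^{-4}
```

Under that assumption, even two wall-clock stack samples would almost certainly both have landed inside the popen call.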
Try Zoom - I believe it will let you profile all processes - it would be interesting to know if it highlights your problem in this case.
After trying everything suggested here (except for the now-defunct Zoom, which is still available as a huge file from Dropbox), I found that NOTHING does what Mr. Dunlavey recommends. The "quick hacks" listed in some of the other answers either wouldn't build for me or didn't work for me. I spent all day trying stuff, and nothing could find fseek as a hotspot in an otherwise simple test program that was I/O bound.
So I coded up yet another profiler, this time with no build dependencies, based on GDB, so it should "just work" for almost any debuggable code. A single CPP file.
https://github.com/jasonrohrer/wallClockProfiler
It automates the manual process suggested by Mr. Dunlavey, interrupting the target process with GDB periodically and harvesting a stack trace, and then printing a report at the end about which stack traces are the most common. Those are your true wall-clock hotspots. And it actually works.
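The reporting step described above (counting which stack traces are most common) can be sketched in a few lines. This is not code from wallClockProfiler; `rank_stacks` and the `a>b>c` stack string format are assumptions made for illustration, and the part that interrupts the target with GDB is left out.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Tally = std::vector<std::pair<std::string, int>>;

// Count identical stack traces and return them most-frequent first.
// Each sample is one stack flattened into a string (hypothetical format).
Tally rank_stacks(const std::vector<std::string>& samples) {
    std::map<std::string, int> counts;
    for (const auto& stack : samples) {
        ++counts[stack];
    }
    Tally ranked(counts.begin(), counts.end());
    // stable_sort keeps the alphabetical order among equal counts.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    return ranked;
}
```

Given the samples `main>read_index>fseek`, `main>parse`, `main>read_index>fseek`, the first entry returned is the `fseek` stack with a count of 2. Dividing each count by the number of samples gives the share of wall-clock time attributed to that stack.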
I wrote this a long time ago, only because I couldn't find anything better: https://github.com/dicej/profile
I just found this, too, though I haven't tried it: https://github.com/oliver/ptrace-sampler
Quickly hacked-up trivial sampling profiler for Linux: http://vi-server.org/vi/simple_sampling_profiler.html
It appends backtrace(3) to a file on SIGUSR1, and then converts it to annotated source.