Linux time-sample-based profiler

Posted 2024-08-25 09:13:15


short version:

Is there a good time based sampling profiler for Linux?

long version:

I generally use OProfile to optimize my applications. I recently found a shortcoming that has me wondering.

The problem was a tight loop that spawned c++filt to demangle a C++ name. I only stumbled upon the code by accident while chasing down another bottleneck. The OProfile output didn't show anything unusual about the code, so I almost ignored it, but my code sense told me to optimize the call and see what happened. I replaced the popen of c++filt with abi::__cxa_demangle. The runtime went from more than a minute to a little over a second, about a 60x speedup.

Is there a way I could have configured OProfile to flag the popen call? As the profile data stands now, OProfile thinks the bottleneck was the heap and std::string calls (which, BTW, once optimized, dropped the runtime to less than a second, a more than 2x speedup).
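For readers who have not used the in-process demangler, here is a minimal sketch of the kind of replacement described above. The helper name `demangle` is hypothetical and not taken from the original code; it assumes GCC or Clang, where `abi::__cxa_demangle` is declared in `<cxxabi.h>`.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so it has to be released with free.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Calling `demangle("_Z3foov")` stays inside the process, whereas the popen approach pays for a fork, an exec, and a pipe round trip on every loop iteration.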

Here is my OProfile configuration:

$ sudo opcontrol --status
Daemon not running
Event 0: CPU_CLK_UNHALTED:90000:0:1:1
Separate options: library
vmlinux file: none
Image filter: /path/to/executable
Call-graph depth: 7
Buffer size: 65536

Is there another profiler for Linux that could have found the bottleneck?

I suspect the issue is that OProfile only logs its samples to the currently running process. I'd like it to always log its samples to the process I'm profiling. So if the process is currently switched out (blocking on I/O or a popen call), OProfile would just place its sample at the blocked call.

If I can't fix this, OProfile will only be useful when the executable is pushing near 100% CPU. It can't help with executables that have inefficient blocking calls.
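The gap between CPU time and wall-clock time can be made visible with a small experiment. The sketch below is only an illustration, not part of the original code: `measure` and `blockingWork` are hypothetical names, and a sleep stands in for a blocking call such as popen. It assumes Linux, where `std::clock` reports CPU time used by the process itself.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

struct Timing {
    double wallSeconds;  // elapsed real time
    double cpuSeconds;   // CPU time charged to this process
};

// Runs `work` once and reports both clocks.
template <typename Work>
Timing measure(Work work) {
    const auto wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();
    work();
    const std::clock_t cpuEnd = std::clock();
    const auto wallEnd = std::chrono::steady_clock::now();
    return Timing{
        std::chrono::duration<double>(wallEnd - wallStart).count(),
        static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC};
}

// Stand-in for a blocking call: the process waits without using the CPU.
void blockingWork() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}
```

With `measure(blockingWork)`, the wall-clock figure is at least 0.2 seconds while the CPU figure stays near zero. A profiler that samples on a CPU-cycle event such as CPU_CLK_UNHALTED sees only the second number for the profiled process.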


挖鼻大婶 2024-09-01 09:13:15

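Glad you asked. I believe OProfile can be made to do what I consider the right thing, which is to take stack samples on wall-clock time while the program is being slow and, if it won't let you examine individual stack samples, at least report, for each line of code that appears in the samples, the percentage of samples that line appears on. That is a direct measure of what would be saved if that line were not there. Here's one discussion. Here's another, and another. And, as Paul said, Zoom should do it.

If your time went from 60 seconds to 1 second, that implies every single stack sample would have had a 59/60 probability of showing you the problem.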
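To put numbers on that, here is a small sketch of the arithmetic. The function name `chanceOfSeeing` is made up for illustration, and the calculation assumes the stack samples are independent and taken at uniformly random wall-clock times.

```cpp
#include <cmath>

// Probability that at least one of `samples` independent stack samples
// lands in code responsible for `fraction` of the wall-clock time.
double chanceOfSeeing(double fraction, int samples) {
    return 1.0 - std::pow(1.0 - fraction, samples);
}
```

For `fraction = 59.0 / 60.0`, a single sample already catches the popen call about 98% of the time, and the chance that three samples in a row all miss it is (1/60)^3, roughly 5 in a million.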


眼泪淡了忧伤 2024-09-01 09:13:15

Try Zoom. I believe it lets you profile all processes, and it would be interesting to know whether it highlights your problem in this case.


还给你自由 2024-09-01 09:13:15

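After trying everything suggested here (except for the now-defunct Zoom, which is still available as a huge file from Dropbox), I found that nothing does what Mr. Dunlavey recommends. The "quick hacks" listed in some of the answers above either wouldn't build for me or didn't work for me. I spent a whole day trying things, and nothing could find fseek as a hotspot in an otherwise simple test program that was I/O bound.

So I coded up yet another profiler, this time based on GDB and with no build dependencies, so it should "just work" for almost any debuggable code. It is a single CPP file.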

https://github.com/jasonrohrer/wallClockProfiler

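It automates the manual process suggested by Mr. Dunlavey: it periodically interrupts the target process with GDB and collects a stack trace, then prints a report at the end showing which stack traces are the most common. Those are your true wall-clock hotspots. It actually works.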
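The reporting step of that approach can be sketched in a few lines. This is not wallClockProfiler's actual code, only an illustration of the idea. `rankStacks` and `StackCount` are hypothetical names, and the samples are assumed to be stack traces already captured as strings, for example from periodic runs of `gdb -p <pid> -batch -ex bt`.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

using StackCount = std::pair<std::string, int>;

// Counts identical stack traces and returns them with the most
// frequently sampled stack first.
std::vector<StackCount> rankStacks(const std::vector<std::string>& samples) {
    std::map<std::string, int> counts;
    for (const std::string& stack : samples) {
        ++counts[stack];
    }
    std::vector<StackCount> ranked(counts.begin(), counts.end());
    // stable_sort keeps the alphabetical order among equal counts.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const StackCount& a, const StackCount& b) {
                         return a.second > b.second;
                     });
    return ranked;
}
```

Because the samples are taken at wall-clock intervals whether or not the process is running, a stack blocked in fseek or popen shows up in proportion to the real time spent there.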
