Time-based sampling profiler for Linux

Posted 2024-08-25 09:13:15

Short version:

Is there a good time-based sampling profiler for Linux?

Long version:

I generally use OProfile to optimize my applications. I recently found a shortcoming that has me wondering.

The problem was a tight loop that spawned c++filt to demangle a C++ name. I only stumbled upon the code by accident while chasing down another bottleneck. The OProfile output didn't show anything unusual about the code, so I almost ignored it, but my code sense told me to optimize the call and see what happened. I replaced the popen of c++filt with abi::__cxa_demangle. The runtime went from more than a minute to a little over a second, about a 60x speedup.
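For readers who want to see the shape of that change, here is a minimal sketch of in-process demangling with abi::__cxa_demangle. The helper name `demangle` is made up for illustration and is not the code from the original program; it assumes GCC or Clang with the Itanium C++ ABI.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: demangle one symbol name without spawning c++filt.
// Returns the input unchanged when demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer, so it is released with free().
    std::unique_ptr<char, void (*)(void*)> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        std::free);
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

Each call stays inside the process, so there is no fork, exec, or pipe per symbol, which is presumably where the popen version spent its wall-clock time.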

Is there a way I could have configured OProfile to flag the popen call? As the profile data stands now, OProfile thinks the bottleneck was the heap and std::string calls (which, by the way, once optimized dropped the runtime to less than a second, a more than 2x speedup).

Here is my OProfile configuration:

$ sudo opcontrol --status
Daemon not running
Event 0: CPU_CLK_UNHALTED:90000:0:1:1
Separate options: library
vmlinux file: none
Image filter: /path/to/executable
Call-graph depth: 7
Buffer size: 65536

Is there another profiler for Linux that could have found the bottleneck?

I suspect the issue is that OProfile only logs its samples to the currently running process. I'd like it to always log its samples to the process I'm profiling. So if the process is currently switched out (blocking on IO or a popen call) OProfile would just place its sample at the blocked call.

If I can't fix this, OProfile will only be useful when the executable is pushing near 100% CPU. It can't help with executables that have inefficient blocking calls.
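To make the CPU-time versus wall-clock gap concrete, here is a small sketch that times the same piece of work both ways. The names `Timing`, `measure`, and `blocked_work` are invented for this example, and the sleep is only a stand-in for a blocking call such as popen or a slow read.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Elapsed time as seen by the wall clock and as charged to the process's CPU.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Run `work` once and report both clocks.
template <typename F>
Timing measure(F&& work) {
    auto wall_start = std::chrono::steady_clock::now();
    std::clock_t cpu_start = std::clock();
    work();
    std::clock_t cpu_end = std::clock();
    auto wall_end = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(wall_end - wall_start).count(),
            static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC};
}

// Stand-in for a blocking call: the process is switched out while it waits.
inline void blocked_work() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}
```

Calling `measure(blocked_work)` should report roughly 0.2 s of wall time and almost no CPU time. A profiler that only samples while the process is on a CPU sees the second number, which would explain why the blocking call never showed up.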

Comments (5)

挖鼻大婶 2024-09-01 09:13:15

Glad you asked. I believe OProfile can be made to do what I consider the right thing, which is to take stack samples on wall-clock time while the program is being slow and, if it won't let you examine individual stack samples, to at least summarize, for each line of code that appears on samples, the percentage of samples the line appears on. That is a direct measure of what would be saved if that line were not there. Here's one discussion. Here's another, and another. And, as Paul said, Zoom should do it.

If your time went from 60 sec to 1 sec, that implies every single stack sample would have had a 59/60 probability of showing you the problem.
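The arithmetic behind that figure can be written out as follows. This is a sketch that assumes samples are taken at independent, uniformly random moments of wall-clock time; the symbols $f$ and $n$ are introduced here and do not come from the answer.

```latex
% f = fraction of wall-clock time spent in the problem code
% n = number of independent stack samples
f = \frac{60 - 1}{60} = \frac{59}{60} \approx 0.983

P(\text{at least one of } n \text{ samples shows the problem})
  = 1 - (1 - f)^n

n = 1:\quad 1 - \tfrac{1}{60} \approx 0.983
\qquad
n = 3:\quad 1 - \tfrac{1}{60^3} \approx 0.999995
```

Under the same assumption, the later 2x speedup corresponds to $f = 1/2$, where ten samples would show the problem at least once with probability $1 - 2^{-10} \approx 0.999$.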

眼泪淡了忧伤 2024-09-01 09:13:15

Try Zoom - I believe it will let you profile all processes - it would be interesting to know if it highlights your problem in this case.

还给你自由 2024-09-01 09:13:15

After trying everything suggested here (except for the now-defunct Zoom, which is still available as a huge file from Dropbox), I found that NOTHING does what Mr. Dunlavey recommends. The "quick hacks" listed in some of the other answers either wouldn't build for me or didn't work for me. I spent all day trying stuff, and nothing could find fseek as a hotspot in an otherwise simple test program that was I/O bound.

So I coded up yet another profiler, this time with no build dependencies, based on GDB, so it should "just work" for almost any debuggable code. A single CPP file.

https://github.com/jasonrohrer/wallClockProfiler

It automates the manual process suggested by Mr. Dunlavey, interrupting the target process with GDB periodically and harvesting a stack trace, and then printing a report at the end about which stack traces are the most common. Those are your true wall-clock hotspots. And it actually works.
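The reporting step described above can be sketched roughly like this. It is not wallClockProfiler's actual code; `rank_stacks` is a hypothetical helper that assumes each harvested stack trace has already been flattened into one string.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using RankedStack = std::pair<std::string, double>;  // (stack trace, percent of samples)

// Hypothetical helper: count identical stack traces and return them with
// the percentage of samples each one accounts for, most frequent first.
std::vector<RankedStack> rank_stacks(const std::vector<std::string>& samples) {
    std::map<std::string, std::size_t> counts;
    for (const std::string& stack : samples) {
        ++counts[stack];
    }

    std::vector<RankedStack> ranked;
    for (const auto& entry : counts) {
        ranked.emplace_back(entry.first,
                            100.0 * entry.second / samples.size());
    }

    // Highest percentage first; ties keep the map's alphabetical order.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const RankedStack& a, const RankedStack& b) {
                         return a.second > b.second;
                     });
    return ranked;
}
```

Because the samples are taken on wall-clock time, a stack that ends in a blocking call such as fseek is counted just like one that is burning CPU.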

终止放荡 2024-09-01 09:13:15

I wrote this a long time ago, only because I couldn't find anything better: https://github.com/dicej/profile

I just found this, too, though I haven't tried it: https://github.com/oliver/ptrace-sampler

一口甜 2024-09-01 09:13:15

Quickly hacked-up trivial sampling profiler for Linux: http://vi-server.org/vi/simple_sampling_profiler.html

It appends backtrace(3) to a file on SIGUSR1, and then converts it to annotated source.
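A rough sketch of the signal-driven part of that idea, assuming glibc on Linux. This is not the linked tool's source; `install_sampler`, `on_sigusr1`, and `g_sample_fd` are names invented here. Note that backtrace(3) is not formally async-signal-safe, so treat this as a debugging aid only.

```cpp
#include <execinfo.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

// File descriptor the handler appends samples to (hypothetical global).
static int g_sample_fd = -1;

// On SIGUSR1, append the current call stack to the sample file.
extern "C" void on_sigusr1(int) {
    void* frames[64];
    int depth = backtrace(frames, 64);
    // backtrace_symbols_fd writes straight to the fd and does not call malloc.
    backtrace_symbols_fd(frames, depth, g_sample_fd);
    ssize_t ignored = write(g_sample_fd, "\n", 1);  // blank line between samples
    (void)ignored;
}

// Open the sample file in append mode and install the handler.
bool install_sampler(const char* path) {
    g_sample_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (g_sample_fd < 0) {
        return false;
    }
    // Call backtrace once up front so any lazy library loading it triggers
    // happens here rather than inside the signal handler.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa = {};
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}
```

After calling `install_sampler` early in the program, something outside the process can send `kill -USR1 <pid>` at regular wall-clock intervals. The signal is delivered even while the main thread is blocked in a system call, so blocking calls show up in the samples. Linking with `-rdynamic` makes the symbol names in the output more useful.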
