Is there a statistical profiler for Python? If not, how could I go about writing one?

Posted 2024-10-31 06:00:24

I would need to run a python script for some random amount of time, pause it, get a stack traceback, and unpause it. I've googled around for a way to do this, but I see no obvious solution.

Comments (8)

绅刃 2024-11-07 06:00:25

There's the statprof module

pip install statprof (or easy_install statprof), then to use:

import statprof

statprof.start()
try:
    my_questionable_function()
finally:
    statprof.stop()
    statprof.display()
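
If you wrap many sections of code this way, the start/stop/display calls above fold naturally into a context manager. This is just a convenience sketch built on the calls shown here, not part of statprof's own API:

import contextlib
import statprof

@contextlib.contextmanager
def profiled():
    # Start sampling on entry, stop and print the report on exit,
    # even if the profiled block raises.
    statprof.start()
    try:
        yield
    finally:
        statprof.stop()
        statprof.display()

# Usage:
# with profiled():
#     my_questionable_function()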

There's a bit of background on the module from this blog post:

Why would this matter, though? Python already has two built-in profilers: lsprof and the long-deprecated hotshot. The trouble with lsprof is that it only tracks function calls. If you have a few hot loops within a function, lsprof is nearly worthless for figuring out which ones are actually important.

A few days ago, I found myself in exactly the situation in which lsprof fails: it was telling me that I had a hot function, but the function was unfamiliar to me, and long enough that it wasn’t immediately obvious where the problem was.

After a bit of begging on Twitter and Google+, someone pointed me at statprof. But there was a problem: although it was doing statistical sampling (yay!), it was only tracking the first line of a function when sampling (wtf!?). So I fixed that, spiffed up the documentation, and now it’s both usable and not misleading. Here’s an example of its output, locating the offending line in that hot function more accurately:

  %   cumulative      self          
 time    seconds   seconds  name    
 68.75      0.14      0.14  scmutil.py:546:revrange
  6.25      0.01      0.01  cmdutil.py:1006:walkchangerevs
  6.25      0.01      0.01  revlog.py:241:__init__
  [...blah blah blah...]
  0.00      0.01      0.00  util.py:237:__get__
---
Sample count: 16
Total time: 0.200000 seconds

I have uploaded statprof to the Python package index, so it’s almost trivial to install: "easy_install statprof" and you’re up and running.

Since the code is up on github, please feel welcome to contribute bug reports and improvements. Enjoy!

倾其所爱 2024-11-07 06:00:25

I can think of a few ways to do this:

  • Rather than trying to get a stack trace while the program is running, just fire an interrupt at it, and parse the output (there is a minimal sketch of this after the list). You could do this with a shell script or with another python script that invokes your app as a subprocess. The basic idea is explained and rather thoroughly defended in this answer to a C++-specific question.

    • Actually, rather than having to parse the output, you could register a postmortem routine (using sys.excepthook) that logs the stack trace. Unfortunately, Python doesn't have any way to continue from the point at which an exception occurred, so you can't resume execution after logging.
  • In order to actually get a stack trace from a running program, you may have to hack the implementation. So if you really want to do that, it may be worth your time to check out pypy, a Python implementation written mostly in Python. I've no idea how convenient it would be to do this in pypy. I'm guessing that it wouldn't be particularly convenient, since it would involve introducing a hook into basically every instruction, which I think would be prohibitively inefficient. Also, I don't think there will be much advantage over the first option, unless it takes a very long time to reach the state where you want to start doing stack traces.

  • There exists a set of macros for the gdb debugger intended to facilitate debugging Python itself. gdb can attach to an external process (in this case the instance of python which is executing your application) and do, well, pretty much anything with it. It seems that the macro pystack will get you a backtrace of the Python stack at the current point of execution. I think it would be pretty easy to automate this procedure, since you can (at worst) just feed text into gdb using expect or whatever.
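
Here is a minimal sketch of the first option, assuming a POSIX system; "target.py" is a placeholder for your script. The driver lets the child run for a random interval, sends SIGINT, and reads the KeyboardInterrupt traceback from the child's stderr. As noted above, the child does not resume afterwards, so this yields one sample per run:

import random
import signal
import subprocess
import time

# Run the target script with stderr captured.
proc = subprocess.Popen(
    ["python", "target.py"],
    stderr=subprocess.PIPE,
    text=True,
)
# Let the program run for a random amount of time.
time.sleep(random.uniform(1.0, 5.0))
# The interrupt makes Python print the current stack as a KeyboardInterrupt traceback.
proc.send_signal(signal.SIGINT)
_, stderr = proc.communicate()
# Parse the "File ..., line ..." frames out of this text.
print(stderr)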

如梦初醒的夏天 2024-11-07 06:00:25

Python already contains everything you need to do what you described, no need to hack the interpreter.

You just have to use the traceback module in conjunction with the sys._current_frames() function. All you need is a way to dump the tracebacks you need at the frequency you want, for example using UNIX signals, or another thread.

To jump-start your code, you can do exactly what is done in this commit:

  1. Copy the threads.py module from that commit, or at least the stack trace dumping function (ZPL license, very liberal).

  2. Hook it up to a signal handler, say, SIGUSR1

Then you just need to run your code and "kill" it with SIGUSR1 as frequently as you need.
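
A minimal sketch along those lines, using only the standard-library pieces mentioned above (this is not the threads.py module from that commit, just the same idea): register a SIGUSR1 handler that dumps every thread's current stack and returns, so the program keeps running:

import signal
import sys
import threading
import traceback

def dump_stacks(signum, frame):
    # Map thread ids to names so the output is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    for thread_id, stack in sys._current_frames().items():
        print(f"--- Thread {names.get(thread_id, '?')} ({thread_id}) ---", file=sys.stderr)
        traceback.print_stack(stack, file=sys.stderr)

# Dump stacks whenever the process receives SIGUSR1 (e.g. "kill -USR1 <pid>").
signal.signal(signal.SIGUSR1, dump_stacks)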

For the case where a single function of a single thread is "sampled" from time to time with the same technique, using another thread for timing, I suggest dissecting the code of Products.LongRequestLogger and its tests (developed by yours truly, while under the employ of Nexedi).

Whether or not this is proper "statistical" profiling, the answer by Mike Dunlavey referenced by intuited makes a compelling argument that this is a very powerful "performance debugging" technique, and I have personal experience that it really helps zoom in quickly on the real causes of performance issues.

烈酒灼喉 2024-11-07 06:00:25

To implement an external statistical profiler for Python, you're going to need some general debugging tools that let you interrogate another process, as well as some Python specific tools to get a hold of the interpreter state.

That's not an easy problem in general, but you may want to try starting with GDB 7 and the associated CPython analysis tools.
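
To give a rough idea of what driving those tools can look like, here is a hedged sketch that attaches gdb to a running interpreter and asks for a Python-level backtrace via the py-bt command from CPython's gdb extensions (python-gdb.py). It assumes gdb, those extensions, and debug symbols for the interpreter are all available:

import subprocess
import sys

def sample_python_stack(pid):
    # Attach gdb to the running interpreter, run "py-bt" once, then detach.
    result = subprocess.run(
        ["gdb", "-p", str(pid), "-batch", "-ex", "py-bt"],
        capture_output=True,
        text=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Usage: python sample_stack.py <pid of the target Python process>
    print(sample_python_stack(int(sys.argv[1])))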

左秋 2024-11-07 06:00:25

There is a cross-platform sampling (statistical) Python profiler written in C called vmprof-python.
Developed by members of the PyPy team, it supports PyPy as well as CPython.
It works on Linux, Mac OS X, and Windows. It is written in C and thus has very small overhead.
It profiles Python code as well as native calls made from Python code.
Also, it has a very useful option to collect statistics about execution lines inside functions in addition to function names.
It can also profile memory usage (by tracing the heap size).

It can be called from the Python code via API or from the console.
There is a Web UI to view the profile dumps: vmprof.com, which is also open sourced.
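
A hedged sketch of the API usage (the enable/disable calls and the period argument are taken from vmprof's documentation; check the current docs for the exact signatures, and my_questionable_function is just a placeholder workload):

import vmprof

def my_questionable_function():
    # Placeholder workload standing in for the code under test.
    return sum(i * i for i in range(10**6))

with open("profile.dat", "w+b") as f:
    vmprof.enable(f.fileno(), period=0.001)  # sample roughly every millisecond
    try:
        my_questionable_function()
    finally:
        vmprof.disable()

# From the console, the equivalent is roughly:
#   python -m vmprof -o profile.dat myscript.py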

Also, some Python IDEs (for example PyCharm) integrate with it, allowing you to run the profiler and view the results in the editor.

冷情妓 2024-11-07 06:00:25

Seven years after the question was asked, there are now several good statistical profilers available for Python. In addition to vmprof, already mentioned by Dmitry Trofimov in this answer, there are also vprof and pyflame. All of them support flame graphs one way or another, giving you a nice overview of where the time was spent.

虚拟世界 2024-11-07 06:00:25

Austin is a frame stack sampler for CPython that can be used to make statistical profilers for Python that require no instrumentation and introduce minimal overhead. The simplest thing to do is to pipe Austin's output into FlameGraph. However, you can also grab Austin's output with a custom application to make your very own profiler targeted at precisely your needs.

This is a screenshot of Austin TUI, a terminal application that provides a top-like view of everything that is happening inside a running Python application.

This is Web Austin, a web application that shows you a live flame graph of the collected samples. You can configure the address the application is served on, which allows you to do remote profiling.

鹿港小镇 2024-11-07 06:00:25

For Python there is py-spy to dump the stack traces. The dumps can be analyzed with speedscope.
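
For example, roughly like this (the flags are taken from py-spy's documented CLI; verify them with py-spy record --help before relying on them):

import subprocess

def record_speedscope(pid, seconds=10, output="profile.speedscope.json"):
    # Sample the target process for a while and write a speedscope-format profile.
    subprocess.run(
        ["py-spy", "record",
         "--pid", str(pid),
         "--duration", str(seconds),
         "--format", "speedscope",
         "--output", output],
        check=True,
    )

# For a one-off dump of the current stacks instead of a profile:
#   py-spy dump --pid <pid>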

Source: Guidelines
