Does Python have a statistical profiler? If not, how would I go about writing one?
I need to run a python script for a random amount of time, pause it, get a stack traceback, and then unpause it. I've googled for a way to do this, but I don't see an obvious solution.
8 Answers
There's the statprof module: pip install statprof (or easy_install statprof), then use it as in the sketch below. There's a bit of background on the module from this blog post.
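A minimal usage sketch, following statprof's start/stop/display pattern (my_questionable_function is a stand-in for whatever you want to profile):

```python
import statprof

statprof.start()           # begin sampling
try:
    my_questionable_function()
finally:
    statprof.stop()        # stop sampling
    statprof.display()     # print the line-by-line sampling report
```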
中。I can think of a
couple offew ways to do this:Rather than trying to get a stack trace while the program is running, just fire an interrupt at it, and parse the output. You could do this with a shell script or with another python script that invokes your app as a subprocess. The basic idea is explained and rather thoroughly defended in this answer to a C++-specific question.
sys.excepthook
) that logs the stack trace. Unfortunately, Python doesn't have any way to continue from the point at which an exception occurred, so you can't resume execution after logging.In order to actually get a stack trace from a running program, you
willmay have to hack the implementation. So if you really want to do that, it may be worth your time to check out pypy, a Python implementation written mostly in Python. I've no idea how convenient it would be to do this in pypy. I'm guessing that it wouldn't be particularly convenient, since it would involve introducing a hook into basically every instruction, which would I think be prohibitively inefficient. Also, I don't think there will be much advantage over the first option, unless it takes a very long time to reach the state where you want to start doing stack traces.There exists a set of macros for the
gdb
debugger intended to facilitate debugging Python itself. gdb can attach to an external process (in this case the instance of python which is executing your application) and do, well, pretty much anything with it. It seems that the macropystack
will get you a backtrace of the Python stack at the current point of execution. I think it would be pretty easy to automate this procedure, since you can (at worst) just feed text intogdb
usingexpect
or whatever.Python 已经包含了执行您所描述的操作所需的一切,无需破解解释器。
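A minimal sketch of the interrupt-and-parse idea (the target script name and delay bounds are made up for illustration; SIGINT raises KeyboardInterrupt in the child, and the unhandled traceback lands on stderr):

```python
import random
import signal
import subprocess
import time

# Launch the app under inspection as a subprocess (hypothetical script name).
proc = subprocess.Popen(
    ["python", "my_app.py"],
    stderr=subprocess.PIPE,
    text=True,
)

time.sleep(random.uniform(1, 10))   # let it run for a random amount of time
proc.send_signal(signal.SIGINT)     # raises KeyboardInterrupt in the child
_, stderr = proc.communicate()
print(stderr)                       # the traceback shows where the child was
```

Note that, as the answer says, the child cannot resume after this, so each sample costs you one run of the program.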
Python already contains everything you need to do what you described, no need to hack the interpreter.
You just have to use the traceback module in conjunction with the sys._current_frames() function. All you need is a way to dump the tracebacks you need at the frequency you want, for example using UNIX signals, or another thread.

To jump-start your code, you can do exactly what is done in this commit: copy the threads.py module from that commit, or at least the stack trace dumping function (ZPL license, very liberal), and hook it up to a signal handler, say SIGUSR1.

Then you just need to run your code and "kill" it with SIGUSR1 as frequently as you need; a minimal sketch of such a handler follows.
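A minimal sketch of that hookup (this is not the actual threads.py module from the commit, just the standard-library pattern it relies on):

```python
import signal
import sys
import threading
import traceback

def dump_stacks(signum, frame):
    # Print the current stack of every thread to stderr.
    names = {t.ident: t.name for t in threading.enumerate()}
    for thread_id, stack in sys._current_frames().items():
        print(f"\n--- {names.get(thread_id, thread_id)} ---", file=sys.stderr)
        traceback.print_stack(stack, file=sys.stderr)

signal.signal(signal.SIGUSR1, dump_stacks)
# Run your workload, then sample it from a shell with: kill -USR1 <pid>
```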
For the case where a single function of a single thread is "sampled" from time to time with the same technique, using another thread for timing, I suggest dissecting the code of Products.LongRequestLogger and its tests (developed by yours truly, while under the employ of Nexedi):
Whether or not this is proper "statistical" profiling, the answer by Mike Dunlavey referenced by intuited makes a compelling argument that this is a very powerful "performance debugging" technique, and I have personal experience that it really helps zoom in quickly on the real causes of performance issues.
To implement an external statistical profiler for Python, you're going to need some general debugging tools that let you interrogate another process, as well as some Python-specific tools to get hold of the interpreter state.
That's not an easy problem in general, but you may want to try starting with GDB 7 and the associated CPython analysis tools.
There is a cross-platform sampling (statistical) Python profiler written in C called vmprof-python.

Developed by members of the PyPy team, it supports PyPy as well as CPython. It works on Linux, Mac OS X, and Windows. Being written in C, it has very small overhead.

It profiles Python code as well as native calls made from Python code, and it has a very useful option to collect statistics about lines executed inside functions in addition to function names. It can also profile memory usage (by tracing the heap size).

It can be called from Python code via its API or from the console. There is a web UI to view the profile dumps, vmprof.com, which is also open source. Also, some Python IDEs (for example PyCharm) integrate with it, allowing you to run the profiler and see the results in the editor.
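A hedged sketch of the in-process API (assumes `pip install vmprof`; per vmprof's docs, enable() takes a file descriptor to write samples to, and the bundled vmprofshow tool reads the dump):

```python
import vmprof

def workload():
    # Hypothetical stand-in for the code you want to profile.
    sum(i * i for i in range(10**6))

with open("profile.dat", "w+b") as fp:
    vmprof.enable(fp.fileno())   # start sampling into the file
    try:
        workload()
    finally:
        vmprof.disable()         # stop sampling and flush

# Inspect the dump with `vmprofshow profile.dat`, or upload it to vmprof.com.
```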
Seven years after the question was asked, there are now several good statistical profilers available for Python. In addition to vmprof, already mentioned by Dmitry Trofimov in this answer, there are also vprof and pyflame. All of them support flame graphs one way or another, giving you a nice overview of where the time was spent.
Austin is a frame stack sampler for CPython that can be used to make statistical profilers for Python that require no instrumentation and introduce minimal overhead. The simplest thing to do is to pipe Austin's output into FlameGraph. However, you can also grab Austin's output with a custom application to make your very own profiler, targeted at precisely your needs; a sketch of such a consumer is below.
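A hedged sketch of a custom consumer (it assumes Austin's FlameGraph-compatible collapsed output, i.e. semicolon-separated frames followed by a numeric metric on each line, and credits each sample to its leaf frame):

```python
import sys
from collections import Counter

totals = Counter()
for line in sys.stdin:
    parts = line.strip().rsplit(" ", 1)
    if len(parts) != 2 or not parts[1].isdigit():
        continue  # skip malformed or non-sample lines
    stack, metric = parts
    totals[stack.split(";")[-1]] += int(metric)  # credit the leaf frame

for frame, total in totals.most_common(20):
    print(f"{total:>12}  {frame}")
```

Usage would be something like `austin python3 my_script.py | python3 topframes.py` (script names are made up for illustration).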
There is also Austin TUI, a terminal application that provides a top-like view of everything that is happening inside a running Python application, and Web Austin, a web application that shows you a live flame graph of the collected samples. You can configure the address where the application is served, which allows you to do remote profiling.
For Python there is py-spy to dump the stack traces. The dumps can be analyzed with speedscope.
Source: Guidelines
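A hedged sketch of driving py-spy from Python (assumes the py-spy binary is on PATH and its documented dump/record subcommands; the PID is made up):

```python
import subprocess

pid = 12345  # hypothetical PID of the running Python process to sample

# One-shot stack dump: the "pause, get a traceback, unpause" of the question.
subprocess.run(["py-spy", "dump", "--pid", str(pid)], check=True)

# Record a 10-second profile in speedscope format for https://speedscope.app.
subprocess.run(
    ["py-spy", "record", "--pid", str(pid), "--format", "speedscope",
     "--output", "profile.json", "--duration", "10"],
    check=True,
)
```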