Is there a simple way to benchmark a Python script?
Usually I use the shell command time. My goal is to measure how much time and memory a script needs when the dataset is small, medium, large or very large.
Are there any tools for Linux, or just for Python, to do this?
There are several ways to benchmark Python scripts. A simple one is the timeit module, which measures the execution time of small code snippets. If you are looking for a more comprehensive benchmark that also covers memory, you can use the memory_profiler package to measure memory usage.
To visualize your benchmarks, you can use the plotly library, which allows you to create interactive plots. You can create a line chart to display the execution time and memory usage for different input sizes.
Here's an example code snippet to benchmark two different implementations of a function that takes a matrix, row and column as inputs:
The graph:
Looking at the graph, it seems like both functions have similar memory usage, which is good to know. In terms of runtime, func_impl_2 seems to be generally faster than func_impl_1, which is also a positive finding. However, the difference in performance between the two functions is quite small, and there is a point where the performance of func_impl_1 surpasses that of func_impl_2 for very small input sizes. This may indicate that the simpler implementation of func_impl_1 is still a viable option for smaller inputs, even though func_impl_2 is generally faster. Overall, the graphs provide valuable insights into the performance of these functions and can help with decision-making when choosing which implementation to use in different scenarios.
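The answer's original snippet is not reproduced here, so the following is only a minimal sketch of the same idea. It assumes two hypothetical implementations named func_impl_1 and func_impl_2 (their bodies are invented for illustration), and it swaps in the standard-library tracemalloc module for memory_profiler so that it runs without extra packages. Plotting is left out; the collected rows could be passed to any charting library.

```python
import timeit
import tracemalloc


# Hypothetical stand-ins for the two implementations being compared.
def func_impl_1(matrix, row, col):
    """Sum one row and one column with explicit loops."""
    total = 0
    for value in matrix[row]:
        total += value
    for line in matrix:
        total += line[col]
    return total


def func_impl_2(matrix, row, col):
    """Same result using the built-in sum()."""
    return sum(matrix[row]) + sum(line[col] for line in matrix)


def benchmark(func, size, repeats=5, number=10):
    """Return (best seconds per call, peak traced bytes) for a size x size matrix."""
    matrix = [[i * size + j for j in range(size)] for i in range(size)]
    # The best of several runs is less sensitive to background noise.
    runs = timeit.repeat(lambda: func(matrix, 0, 0), number=number, repeat=repeats)
    seconds = min(runs) / number
    # Peak memory allocated by Python objects during a single call.
    tracemalloc.start()
    func(matrix, 0, 0)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return seconds, peak


results = []
for size in (10, 100, 500):
    for func in (func_impl_1, func_impl_2):
        seconds, peak = benchmark(func, size)
        results.append((func.__name__, size, seconds, peak))
        print(f"{func.__name__:12} n={size:4d}  {seconds * 1e6:10.2f} us  peak={peak} B")
```

Note that tracemalloc only sees allocations made through Python's allocator, so its numbers are not directly comparable with the process-level figures that memory_profiler reports.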
The easy way to quickly time any function is to use this syntax (the %timeit magic available in IPython and Jupyter):
%timeit my_code
For instance:
Be careful: timeit is very slow. On my mid-range processor it takes 12 seconds just to initialize (or maybe to run the function). You can test this with the accepted answer. For simple things I use time instead; on my PC it returns the result 0.0.
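A likely explanation for both observations, offered as a sketch rather than a diagnosis of that particular machine: timeit.timeit() runs the statement 1,000,000 times unless a smaller number is passed, and time.time() can have too coarse a resolution on some platforms to register a fast call. The function work below is a hypothetical placeholder.

```python
import time
import timeit


def work():
    """Hypothetical function under test."""
    return sum(range(1000))


# timeit.timeit() defaults to number=1000000, so pass a smaller
# number for anything but tiny snippets.
total = timeit.timeit(work, number=1000)
per_call = total / 1000
print(f"timeit: {per_call * 1e6:.2f} us per call")

# For a single measurement, time.perf_counter() uses the highest
# resolution clock available, unlike time.time().
start = time.perf_counter()
work()
elapsed = time.perf_counter() - start
print(f"perf_counter: {elapsed * 1e6:.2f} us")
```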
Based on Danyun Liu's answer, with some convenience features added. Perhaps it is useful to someone.
Some tests:
Result:
I wrote a tool to run a concurrency stress test on a given function; its output is similar to Apache AB. Maybe this is what you want:
The output will be:
Have a look at timeit, the Python profiler and pycallgraph. Also make sure to have a look at the comment below by nikicc mentioning "SnakeViz". It gives you yet another visualisation of profiling data which can be helpful.
timeit
Essentially, you can pass it Python code as a string parameter, and it will run that code the specified number of times and print the execution time. The important bits from the docs:
... and:
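The quoted documentation excerpts are not reproduced here. As a small illustration of the string-based interface described above, the following sketch times a statement with a separate setup string; the statement itself is an arbitrary example.

```python
import timeit

# Both the statement and its setup are passed as strings. The setup
# runs once and is not included in the measured time.
seconds = timeit.timeit(
    stmt="'-'.join(str(n) for n in data)",
    setup="data = range(100)",
    number=10_000,
)
print(f"{seconds:.4f} s for 10,000 runs")

# The same measurement from the command line (shown as a comment):
#   python -m timeit -s "data = range(100)" "'-'.join(str(n) for n in data)"
```

The returned value is the total time for all runs in seconds, so divide by number to get a per-call figure.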
Profiling
Profiling will give you a much more detailed idea about what's going on. Here's the "instant example" from the official docs:
Which will give you:
Both of these modules should give you an idea about where to look for bottlenecks.
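The documentation example and its output are missing from this copy. A minimal sketch of the same workflow, using cProfile and pstats from the standard library on two hypothetical functions, could look like this:

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Hypothetical function to profile."""
    return sum(i * i for i in range(n))


def main():
    return [slow_square_sum(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and show the ten most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

For a whole script, running python -m cProfile -s cumulative followed by the script name gives a comparable table without changing the code.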
Also, to get to grips with the output of profile, have a look at this post
pycallgraph
NOTE pycallgraph has been officially abandoned since Feb. 2018. As of Dec. 2020 it was still working on Python 3.6 though. As long as there are no core changes in how python exposes the profiling API it should remain a helpful tool though.
This module uses graphviz to create callgraphs like the following:
You can easily see which paths used up the most time by colour. You can either create them using the pycallgraph API, or using a packaged script:
The overhead is quite considerable though. So for already long-running processes, creating the graph can take some time.
I use a simple decorator to time the function.
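The decorator from this answer is not included in this copy, so the following is only one possible version. The names timed and last_elapsed are chosen for illustration.

```python
import functools
import time


def timed(func):
    """Print how long each call to func takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            wrapper.last_elapsed = elapsed
            print(f"{func.__name__} took {elapsed:.6f} s")
    wrapper.last_elapsed = None
    return wrapper


@timed
def build_list(n):
    return [i * 2 for i in range(n)]


result = build_list(100_000)
```

A single timed call like this is noisy; it is fine for a quick look, but repeated runs give a more trustworthy number.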
The timeit module was slow and weird, so I wrote this:
Example:
For me, it says:
This is a primitive sort of benchmarking, but it's good enough.
I usually do a quick
time ./script.py
to see how long it takes. That does not show you the memory though, at least not by default. You can use
/usr/bin/time -v ./script.py
to get a lot of information, including memory usage.
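Some of the figures that /usr/bin/time -v reports can also be read from inside the script. This is a sketch that assumes a Unix-like system, since the resource module is not available on Windows; the list comprehension is a hypothetical workload.

```python
import resource
import time

start = time.perf_counter()
data = [str(i) for i in range(200_000)]  # hypothetical workload
elapsed = time.perf_counter() - start

usage = resource.getrusage(resource.RUSAGE_SELF)
# ru_maxrss is the peak resident set size: kilobytes on Linux, bytes on macOS.
print(f"wall time: {elapsed:.3f} s")
print(f"user CPU:  {usage.ru_utime:.3f} s")
print(f"peak RSS:  {usage.ru_maxrss} (kB on Linux)")
```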
Memory Profiler for all your memory needs.
https://pypi.python.org/pypi/memory_profiler
Run a pip install:
Import the library:
Add a decorator to the item you wish to profile:
Execute the code:
Receive the output:
Examples are from the docs, linked above.
line_profiler (execution time line by line)
Installation
Usage
Add the @profile decorator before the function. For example:
Run kernprof -l <file_name> to create an instance of line_profiler. For example:
kernprof will print Wrote profile results to <file_name>.lprof on success. For example:
Run python -m line_profiler <file_name>.lprof to print benchmark results. For example:
You will see detailed info about each line of code:
memory_profiler (memory usage line by line)
Installation
Usage
Add the @profile decorator before the function. For example:
Run python -m memory_profiler <file_name> to print benchmark results. For example:
You will see detailed info about each line of code:
Good Practice
Call a function many times to minimize environment impact.
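One way to follow that last piece of advice, sketched with the standard-library timeit.repeat rather than with line_profiler or memory_profiler, is to take several independent measurements and report the best and the median. The function target is a hypothetical placeholder.

```python
import statistics
import timeit


def target():
    """Hypothetical function under test."""
    return sorted(range(1000, 0, -1))


# Five independent measurements of 200 calls each.
runs = timeit.repeat(target, number=200, repeat=5)
per_call = [t / 200 for t in runs]

print(f"best:   {min(per_call) * 1e6:.2f} us")
print(f"median: {statistics.median(per_call) * 1e6:.2f} us")
```

The minimum is usually the most stable figure, because slower runs tend to reflect interference from other processes rather than the code itself.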
snakeviz
Interactive viewer for cProfile: https://github.com/jiffyclub/snakeviz/
cProfile was mentioned at https://stackoverflow.com/a/1593034/895245 and snakeviz was mentioned in a comment, but I wanted to highlight it further.
It is very hard to debug program performance just by looking at cProfile / pstats output, because out of the box they can only show total times per function.
However, what we really need in general is a nested view containing the stack traces of each call, so that the main bottlenecks are easy to find.
And this is exactly what snakeviz provides via its default "icicle" view.
First you have to dump the cProfile data to a binary file, and then you can run snakeviz on that file:
This prints a URL to stdout which you can open in your browser, and which contains the desired output that looks like this:
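The commands from this answer are missing in this copy. As a sketch of the dump step, the snippet below writes cProfile data to a binary file from within Python; the workload and the file names results.prof and myscript.py are hypothetical. Opening the file with snakeviz requires that package to be installed separately.

```python
import cProfile
import os
import pstats
import tempfile


def work():
    """Hypothetical workload."""
    return sorted(str(i) for i in range(50_000))


profile_path = os.path.join(tempfile.mkdtemp(), "results.prof")

profiler = cProfile.Profile()
profiler.runcall(work)
profiler.dump_stats(profile_path)  # binary file that snakeviz can open

# Command-line equivalent for a whole script (shown as comments):
#   python -m cProfile -o results.prof myscript.py
#   snakeviz results.prof

# The same file can also be inspected without a browser.
stats = pstats.Stats(profile_path)
stats.sort_stats("cumulative").print_stats(5)
```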
and you can then:
More profile oriented question: How can you profile a Python script?
If you don't want to write boilerplate code for timeit and want results that are easy to analyze, take a look at benchmarkit. It also saves the history of previous runs, so it is easy to compare the same function over the course of development.
It prints to the terminal and returns a list of dictionaries with data for the last run. Command-line entry points are also available.
If you change
N=1000000
and rerun
Have a look at nose and at one of its plugins, this one in particular.
Once installed, nose is a script in your path that you can call in a directory which contains some Python scripts:
This will look in all the python files in the current directory and will execute any function that it recognizes as a test: for example, it recognizes any function with the word test_ in its name as a test.
So you can just create a python script called test_yourfunction.py and write something like this in it:
Then you have to run
and to read the profile file, use this python line: