Tutorials on optimizing non-trivial Python applications with C extensions or Cython
The Python community has published helpful reference material showing how to profile Python code, and covering the technical details of writing Python extensions in C or in Cython. However, I am still searching for tutorials that show, for non-trivial Python programs, the following:
- How to identify the hotspots which will benefit from optimization by conversion to a C extension
- Just as importantly, how to identify the hotspots which will not benefit from conversion to a C extension
- Finally, how to make the appropriate conversion from Python to C, either using the Python C-API or (perhaps even preferably) using Cython.
A good tutorial would provide the reader with a methodology on how to reason through the problem of optimization by working through a complete example. I have had no success finding such a resource.
Do you know of (or have you written) such a tutorial?
For clarification, I'm not interested in tutorials that cover only the following:
- Using (c)Profile to profile Python code to measure running times
- Using tools to examine profiles (I recommend RunSnakeRun)
- Optimizing by selecting more appropriate algorithms or Python constructs (e.g., sets instead of lists for membership tests); the tutorial should assume the algorithm and Python code are already optimal, and that we are at a point where a C extension is the next logical step
- Recapitulating the Python documentation on writing C extensions, which is already excellent as a reference but not useful as a resource for showing when and how to move from Python to C.
Points 1 and 2 are just basic optimization rules of thumb. I would be very astonished if the kind of tutorial you are looking for existed anywhere. Maybe that's why you haven't found one. My short list:
...
Just start by profiling your Python code with the usual Python tools. Find where your code needs to be optimized. Then try to optimize it while sticking with Python. If it is still too slow, try to understand why. If it is I/O bound, a C program is unlikely to do better. If the problem comes from the algorithm, C is also unlikely to perform better. The "good" cases where C can help are quite rare: the runtime should not be too far from what you want (say, a 2x or 3x speedup away), the data structures should be simple and benefit from a low-level representation, and you really, really need that speedup. In most other cases, using C instead of Python will be an unrewarding job.
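One rough way to check the "is it I/O bound?" question is to compare wall-clock time with CPU time for the same call. The sketch below is only an illustration of that idea: `cpu_fraction`, `waits_on_io` and `burns_cpu` are hypothetical names, and the sleep and the arithmetic loop are stand-ins for real I/O and real computation. A CPU/wall ratio near 0 suggests the code is mostly waiting, so moving it to C is unlikely to help; a ratio near 1 suggests the time really goes into computation.

```python
import time


def cpu_fraction(func, *args, **kwargs):
    """Return (wall_seconds, cpu_seconds, cpu_seconds / wall_seconds) for one call."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu, (cpu / wall if wall > 0 else 0.0)


def waits_on_io():
    # Stand-in for I/O: the process sleeps instead of computing.
    time.sleep(0.2)


def burns_cpu():
    # Stand-in for a tight numeric loop written in pure Python.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    for f in (waits_on_io, burns_cpu):
        wall, cpu, frac = cpu_fraction(f)
        print(f"{f.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s cpu/wall={frac:.2f}")
```

`time.process_time()` counts CPU time for the whole process, so in a multi-threaded program the ratio is only a coarse hint.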
It is actually quite rare for C code to be called from Python with performance as the primary goal. More often the goal is to interface Python with some existing C code.
As another poster said, you would probably be better advised to use Cython.
If you still want to write a C module for Python, everything necessary is in the official documentation.
O'Reilly has a tutorial (freely available as far as I can tell; I was able to read the whole thing) that illustrates how to profile a real project (they use an EDI parsing project as the subject) and identify hotspots. The O'Reilly article does not go into much detail on writing the C extension that fixes the bottleneck. It does, however, cover the first two things that you want with a non-trivial example.
The process of writing C extensions is fairly well documented here. The hard part is coming up with ways to replicate what Python code is doing in C, and that takes something that would be hard to teach in a tutorial: ingenuity, knowledge of algorithms, hardware, and efficiency, and considerable C skill.
Hope this helps.
For points 1 and 2, I would use a Python profiler, for example cProfile. See here for a quick tutorial.
If you've got an already existing python program, for point 3 you might want to consider using Cython. Of course, rather than re-writing in C, you may be able to think up an algorithmic improvement that will increase execution speed.
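As a minimal sketch of the profiling step, the snippet below runs a function under `cProfile` and returns the top entries sorted by cumulative time. `profile_call`, `inner_loop` and `workload` are hypothetical names made up for the illustration; only the `cProfile`, `pstats` and `io` calls come from the standard library.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5):
    """Run func under cProfile and return the top entries by cumulative time as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


def inner_loop(n):
    # Hypothetical hot function: pure-Python arithmetic in a loop.
    return sum(i * i for i in range(n))


def workload():
    # Hypothetical driver that spends nearly all its time in inner_loop.
    return [inner_loop(20_000) for _ in range(20)]


if __name__ == "__main__":
    print(profile_call(workload))
```

A function that dominates the cumulative column and consists mostly of plain loops and arithmetic is the kind of candidate worth trying in Cython; one dominated by calls into built-ins or I/O probably is not.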
I will try to address your points 1 and 2, and your first 3 bullet points, but not in order.
The third bullet point says "assume the algorithm and python code is already optimal". When code is in that state, if one takes stack samples (as outlined here), the samples show exactly what the program is doing, from a time perspective, and there seems to be nothing that could be improved without language change. However, since you know how it is spending its time, you know which low-level algorithm (which could consist of more than one function, not just a hotspot) could benefit by being made to take less time, i.e. by being converted to C.
Regarding point 1, this method shows which parts of the code will benefit by conversion to C, and they may or may not be hotspots. (The first thing that comes to mind is any sort of recursive function or set of functions. Or, a small group of functions that together accomplish some purpose, such as a hill-climber.)
Regarding point 2, the code that will not benefit is any code which does not appear on a healthy percent of stack samples, or which does appear but clearly will not benefit by being converted to C, such as I/O.
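To make the stack-sampling idea concrete, here is a rough sketch, not the method from the linked description: a helper thread periodically reads the main thread's stack via `sys._current_frames()` and counts how often each function appears anywhere on it. `sample_stacks`, `score` and `search` are hypothetical names, and the 5 ms interval is an arbitrary assumption.

```python
import collections
import sys
import threading


def sample_stacks(func, interval=0.005):
    """Run func() in the calling thread while a helper thread samples its stack.

    Returns (number_of_samples, Counter mapping function name -> number of
    samples in which that function was somewhere on the stack).
    """
    target_id = threading.get_ident()
    on_stack = collections.Counter()
    n_samples = 0
    done = threading.Event()

    def sampler():
        nonlocal n_samples
        # wait() returns False on timeout, True once done is set.
        while not done.wait(interval):
            frame = sys._current_frames().get(target_id)
            if frame is None:
                continue
            names = set()
            while frame is not None:  # walk from innermost to outermost frame
                names.add(frame.f_code.co_name)
                frame = frame.f_back
            on_stack.update(names)
            n_samples += 1

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        func()
    finally:
        done.set()
        thread.join()
    return n_samples, on_stack


def score(x):
    # Hypothetical low-level routine: plain arithmetic in a loop.
    return sum((x - i) ** 2 for i in range(200))


def search():
    # Hypothetical caller that spends most of its time inside score().
    return min(score(x) for x in range(10_000))


if __name__ == "__main__":
    n, counts = sample_stacks(search)
    for name, hits in counts.most_common(5):
        print(f"{name}: on stack in {100 * hits / n:.0f}% of {n} samples")
```

Because a function is counted whenever it is anywhere on the stack, the percentages are inclusive: a high figure for `search` and `score` together points at the group of functions that might be converted, not at a single hot line.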
Regarding the first and second bullet points, I would agree that measuring is not the primary objective, but a by-product of the process of finding the code to optimize. Presenting such measurements also is beside the point.
I have been in similar situations, except not between python and C, but between C and hardware.**
Just to give an example, if the total run time is 10 seconds, and the algorithm is on the stack roughly 50% of the time, then it is responsible for roughly 5 of the 10 seconds. If converting the algorithm to C would give a 10x speedup, then that 5 seconds would shrink to 0.5 seconds, so the overall time would shrink to 5.5 seconds. (Roughly - it's more important to achieve the time reduction than to know in advance precisely how big it will be.) Notice, at this point, the whole process could be repeated, and it might make sense to convert something else to C also.
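The arithmetic in this example can be written as a small helper. This is just a sketch of the estimate described above (essentially Amdahl's law); `time_after_speedup` is a hypothetical name.

```python
def time_after_speedup(total_seconds, fraction_on_stack, speedup):
    """Estimate the new total run time if the sampled part runs `speedup` times faster."""
    affected = total_seconds * fraction_on_stack  # time attributed to the algorithm
    unaffected = total_seconds - affected         # everything else stays the same
    return unaffected + affected / speedup


if __name__ == "__main__":
    # 10 s total, algorithm on the stack 50% of the time, 10x faster in C.
    print(time_after_speedup(10.0, 0.5, 10.0))  # 5.5
```

Even an unlimited speedup of that 50% could not bring the total below 5 seconds, which is why the process is repeated on whatever dominates the samples next.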
You can stop this process when samples show that the python code is doing what it's good at, and the C code is doing what it's good at.
** e.g. floating-point math (library vs. chip), or graphics (drawing text and polygons).