"If the C variant needs x hours less, then I'd invest that time in letting the algorithms run longer/again"
"invest" isn't the right word here.
Build a working implementation in Python. You'll finish this long before you'd finish a C version.
Measure performance with the Python profiler. Fix any problems you find. Change data structures and algorithms as necessary to really do this properly. You'll finish this long before you finish the first version in C.
If it's still too slow, manually translate the well-designed and carefully constructed Python into C.
Because of the way hindsight works, doing the second version from existing Python (with existing unit tests, and with existing profiling data) will still be faster than trying to do the C code from scratch.
This quote is important.
Thompson's Rule for First-Time Telescope Makers: It is faster to make a four-inch mirror and then a six-inch mirror than to make a six-inch mirror.
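As a rough sketch of the "measure with the Python profiler" step, the standard-library cProfile and pstats modules can be used along these lines. The function slow_sum_of_squares and the helper profile_call are made-up names for illustration only; they are not from the original answer.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical workload: deliberately naive, builds a full list first."""
    squares = [i * i for i in range(n)]
    return sum(squares)


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum_of_squares, 100_000)
    print(result)
    print(report)
```

The report shows where the time actually goes, which is the data you would want in hand before deciding whether any part needs to be rewritten in C at all.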
From the Shed Skin documentation:
Shed Skin is an experimental compiler that can translate pure, but implicitly statically typed, Python (2.4-2.6) programs into optimized C++. It can generate stand-alone programs or extension modules that can be imported and used in larger Python programs.
Besides the typing restriction, programs cannot freely use the Python standard library (although about 25 common modules, such as random and re, are currently supported). Also, not all Python features, such as nested functions and variable numbers of arguments, are supported.
For a set of 75 non-trivial programs (over 25,000 lines in total, measured with sloccount), measurements show a typical speedup of 2-200 times over CPython.
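To make "implicitly statically typed" a bit more concrete, here is a small sketch of how I read the restriction: every name keeps one type for its whole life, even though no types are written down. Both function names are made up for illustration, and I have not run either through Shed Skin itself.

```python
def mean(values):
    # 'values' is only ever used as a list of floats and 'total' stays a
    # float, so one static type can be inferred for every name.
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def not_statically_typed(flag):
    # Outside the restriction (as I understand it): 'x' is an int on one
    # path and a str on the other, so it has no single static type.
    x = 1
    if flag:
        x = "one"
    return x
```

Both functions are valid in CPython; only the first is the kind of code a static type inferencer can pin down.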
Another option for converting to C++, besides Shed Skin, is Pythran.
To quote High Performance Python by Micha Gorelick and Ian Ozsvald:
Pythran is a Python-to-C++ compiler for a subset of Python that includes partial numpy support. It acts a little like Numba and Cython—you annotate a function’s arguments, and then it takes over with further type annotation and code specialization. It takes advantage of vectorization possibilities and of OpenMP-based parallelization possibilities. It runs using Python 2.7 only.
One very interesting feature of Pythran is that it will attempt to automatically spot parallelization opportunities (e.g., if you’re using a map), and turn this into parallel code without requiring extra effort from you. You can also specify parallel sections using pragma omp directives; in this respect, it feels very similar to Cython’s OpenMP support.
Behind the scenes, Pythran will take both normal Python and numpy code and attempt to aggressively compile them into very fast C++—even faster than the results of Cython.
You should note that this project is young, and you may encounter bugs; you should also note that the development team are very friendly and tend to fix bugs in a matter of hours.
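A minimal sketch of what "annotate a function's arguments" looks like in practice, assuming the export-comment syntax shown in Pythran's own README. The file name dprod.py and the function are illustrative; the module also runs unchanged under plain CPython, because the annotation is just a comment.

```python
# Hypothetical file name for this sketch: dprod.py
# The next line is an ordinary comment to CPython; Pythran reads it as the
# exported signature (two lists of ints).
#pythran export dprod(int list, int list)
def dprod(left, right):
    """Dot product of two equal-length lists of ints."""
    return sum(x * y for x, y in zip(left, right))


if __name__ == "__main__":
    print(dprod([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

With Pythran installed, running pythran dprod.py should build a native extension module that is imported under the same name, so calling code does not have to change.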
Yes. Look at Cython. It does just that: it converts Python to C for speedups.
Shed Skin is "a (restricted) Python-to-C++ compiler", as its documentation puts it.
Just came across this new tool on Hacker News.
From their page - "Nuitka is a good replacement for the Python interpreter and compiles every construct that CPython 2.6, 2.7, 3.2 and 3.3 offer. It translates the Python into a C++ program that then uses "libpython" to execute in the same way as CPython does, in a very compatible way."
I know this is an older thread, but I wanted to share what I think is helpful information.
I personally use PyPy, which is really easy to install. I switch between the Python and PyPy interpreters freely; you don't need to change your code at all, and I've found PyPy to be roughly 40x faster than the standard Python interpreter (either Python 2.x or 3.x) for my code. I use PyCharm Community Edition to manage my code and I love it.
I like writing code in Python because it lets you focus more on the task than on the language, which is a huge plus for me. And if you need it to be even faster, you can always compile to a binary for Windows, Linux, or Mac (not straightforward, but possible with other tools). In my experience, I get about a 3.5x speedup over PyPy when compiling, meaning about 140x faster than standard Python. PyPy is available for both Python 3.x and 2.x code, and if you use an IDE like PyCharm you can switch between, say, PyPy, Cython, and Python very easily (it takes a little initial learning and setup, though).
Some people may argue with me on this one, but I find PyPy to be faster than Cython. They're both great choices, though.
Edit: I'd like to make another quick note about compiling. When you compile, the resulting binary is much bigger than your Python script because it builds all the dependencies into it. But you get a few distinct benefits: speed; the app will run on any machine with the OS you compiled for, without Python or libraries installed; it also obfuscates your code; and it is technically 'production' ready (to a degree). Some compilers also generate C code, though I haven't really looked at whether it's useful or just gibberish. Good luck.
Hope that helps.
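Speedup figures like the ones above depend heavily on the workload, so it is worth measuring your own code. Here is a minimal sketch of a script that can be run unchanged under both interpreters. It assumes Python 3.6 or later, and count_primes is just a made-up, loop-heavy stand-in for real code.

```python
import sys
import time


def count_primes(limit):
    """Count primes below limit by trial division (pure Python, loop-heavy)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    result, seconds = timed(count_primes, 200_000)
    # sys.implementation.name reports which interpreter ran the script.
    print(sys.implementation.name, result, f"{seconds:.3f}s")
```

Run the same file once with each interpreter and compare the printed times. A tight numeric loop like this is where a JIT tends to help most, so treat the ratio as an upper-end estimate rather than a general rule.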
http://code.google.com/p/py2c/ looks like a possibility. On their site they also mention Cython, Shedskin and RPython, and confirm that they are converting Python code to pure C/C++, which is much faster than C/C++ riddled with Python API calls. Note: I haven't tried it yet, but I am going to.
I realize that an answer about a fairly new solution is missing. If Numpy is used in the code, I would advise trying Pythran:
http://pythran.readthedocs.io/
For the functions I tried, Pythran gives extremely good results. The resulting functions are as fast as well written Fortran code (or only slightly slower) and a little bit faster than the (quite optimized) Cython solution.
The advantage compared to Cython is that you just have to use Pythran on the Python function optimized for Numpy, meaning that you do not have to expand the loops and add types for all variables in the loop. Pythran takes its time to analyse the code so it understands the operations on numpy.ndarray.
It is also a huge advantage compared to Numba or other projects based on just-in-time compilation for which (to my knowledge), you have to expand the loops to be really efficient. And then the code with the loops becomes very very inefficient using only CPython and Numpy...
A drawback of Pythran: no classes! But since only the functions that really need to be optimized have to be compiled, it is not very annoying.
Another point: Pythran supports OpenMP parallelism well (and very easily). But I don't think mpi4py is supported...