Why is Python so slow for a simple for loop?
We are making some kNN and SVD implementations in Python. Others picked Java. Our execution times are very different. I used cProfile to see where I make mistakes, but everything is actually quite fine. Yes, I use numpy too. But I would like to ask a simple question.
    total = 0.0
    for i in range(9999):  # xrange is slower according to my test but more memory-friendly.
        for j in range(1, 9999):
            total += (i / j)  # Python 2: int / int is integer division
    print total
This snippet takes 31.40s on my computer.
The Java version of this code takes 1 second or less on the same computer. I suppose type checking is the main problem for this code. But I need to do lots of operations like this for my project, and I don't think 9999*9999 is such a big number.
I think I must be making a mistake, because I know Python is used by lots of scientific projects. But why is this code so slow, and how can I handle problems bigger than this?
Should I use a JIT compiler such as Psyco?
EDIT
I should also say that this loop is only an example. The real code is not as simple as this, and it may be tough to put your improvements/code samples into practice.
Another question: can I implement lots of data mining and machine learning algorithms with numpy and scipy if I use them correctly?
Why is Java faster than Python in this example loop?
Novice explanation: think of a program as a freight train that lays its own track as it moves forward. Track must be laid before the train can move. The Java freight train can send thousands of track-layers ahead of the train, all working in parallel and laying track many miles in advance, whereas Python can only send one laborer at a time, who can only lay track 10 feet in front of where the train is.
Java has static types, and they let its JIT compiler (https://en.wikipedia.org/wiki/Just-in-time_compilation) turn the loop into native machine code. The CPU can pipeline that code, fetching memory and executing upcoming instructions in parallel before they are strictly needed. So Java can "sort of" run the instructions in your for loop in parallel with itself. Python has no static types, so the nature of the work to be done has to be decided at every instruction. The interpreter checks the operand types, dispatches to the matching operation, and allocates a new object for the result, on every single iteration. Both loops still take time linear in the number of iterations; the difference is the cost per iteration, which is far larger in Python than in JIT-compiled Java.
As for why lots of scientific projects use Python anyway: they rely heavily on SciPy (NumPy being the most prominent component, but I've heard the ecosystem that developed around NumPy's API is even more important), which vastly speeds up all kinds of operations these projects need. Here's what you are doing wrong: you aren't writing your critical code in C. Python is great for development in general, but well-placed extension modules are a vital optimization in their own right (at least when you're crunching numbers). Python is a really crappy language to implement tight inner loops in.
The default (and, for the time being, most popular and widely supported) implementation is a simple bytecode interpreter. Even the simplest operations, like an integer division, can take hundreds of CPU cycles, multiple memory accesses (type checks being a popular example), several C function calls, etc., instead of a few instructions (or even a single one, in the case of integer division). Moreover, the language is designed with many abstractions which add overhead. Your loop allocates 9999 objects on the heap if you use xrange, and far more if you use range (9999*9999 integers minus around 256*256 for small integers, which are cached). Also, the xrange version calls a method on each iteration to advance; the range version would too if iteration over sequences hadn't been optimized specifically. Each iteration still takes a whole bytecode dispatch, though, which is itself vastly complex (compared to an integer division, of course).
It would be interesting to see what a JIT does with this code. I'd recommend PyPy over Psyco; the latter isn't actively developed anymore and is very limited in scope anyway, though it might work well for this simple example. After a tiny fraction of the iterations, a JIT should produce a nigh-optimal machine code loop augmented with a few guards (simple integer comparisons, jumping if they fail) to maintain correctness in case you got a string in that list. Java can do the same thing, only sooner (it doesn't have to trace first) and with fewer guards (at least if you use ints). That's why it's so much faster.
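To see the per-instruction bytecode dispatch described above, the standard-library `dis` module can disassemble the inner-loop statement. This is a minimal sketch assuming a modern CPython 3. Opcode names differ between versions, and Python 2's bytecode differs again. `inner_step` is a hypothetical wrapper around the question's loop body.

```python
import dis


def inner_step(total, i, j):
    """Hypothetical wrapper around the question's loop body."""
    total += i / j
    return total


# Print the bytecode CPython runs for that body. The interpreter loop fetches and
# dispatches each instruction separately, and the division and the += check
# their operand types at run time.
dis.dis(inner_step)

instructions = list(dis.get_instructions(inner_step))
print(len(instructions), "bytecode instructions for one loop body")
```

Every one of those instructions runs once per iteration, which is where the hundreds of cycles go.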
Because you mention scientific code, have a look at numpy. What you're doing has probably already been done (or rather, it uses LAPACK for things like SVD). When you hear about Python being used for scientific code, people probably aren't referring to using it the way you do in your example.
As a quick example:
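The original snippet is not reproduced here. This is a minimal numpy sketch of the same idea: build the whole table of quotients with the ufunc methods `np.floor_divide.outer` / `np.divide.outer`, then sum it. The name `vectorized_total` and its `integer_division` flag are hypothetical.

```python
import numpy as np


def vectorized_total(n, integer_division=True):
    """Sum i / j for 0 <= i < n and 1 <= j < n without a Python-level loop."""
    i = np.arange(n)       # 0, 1, ..., n - 1
    j = np.arange(1, n)    # 1, 2, ..., n - 1
    if integer_division:
        # Python 2 semantics from the question: floor division of non-negative ints.
        table = np.floor_divide.outer(i, j)
    else:
        # Python 3 semantics: true (floating-point) division.
        table = np.divide.outer(i, j)
    # The n x (n - 1) table is built and summed entirely in compiled code.
    return table.sum()


if __name__ == "__main__":
    # n = 9999 would need a table of roughly 800 MB, so a smaller n is used here.
    print(vectorized_total(2000))
    print(vectorized_total(2000, integer_division=False))
```

The intermediate table has n*(n - 1) elements. If memory is a concern, summing blocks of rows keeps it bounded while still avoiding a per-element Python loop.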
(If you're using Python 3, your example would use float division. My example assumes you're using Python 2.x, and therefore integer division. If not, specify i = np.arange(9999, dtype=np.float64), etc.)
To give some idea of timing... (I'll use floating-point division here, instead of integer division as in your example):
If we compare timings:
I think NumPy can be faster than CPython for loops (I didn't test in PyPy).
I want to start from Joe Kington's code because this answer used NumPy.
My own version:
In addition, high-school mathematics can simplify the problem for the computer.
Therefore,
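The identity behind this simplification is sketched here in LaTeX. It assumes true division, as in the floating-point version; floor division does not factor this way. $H_m$ denotes the $m$-th harmonic number.

```latex
\sum_{i=0}^{9998} \sum_{j=1}^{9998} \frac{i}{j}
  = \left( \sum_{i=0}^{9998} i \right) \left( \sum_{j=1}^{9998} \frac{1}{j} \right)
  = \frac{9998 \cdot 9999}{2} \, H_{9998}
  = 49985001 \, H_{9998},
\qquad H_m = \sum_{j=1}^{m} \frac{1}{j}
```

This turns about 10^8 divisions into about 10^4.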
In addition, university mathematics can simplify the problem even further.
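This is a hedged reconstruction of that step: replace the harmonic number by its asymptotic expansion, where $\gamma$ is the Euler-Mascheroni constant. Keeping terms up to $1/(2m)$ is an assumption about the original code. It is consistent with the size of the error quoted below, since the dropped $1/(12m^2)$ term times 49985001 is about 0.042.

```latex
H_m = \ln m + \gamma + \frac{1}{2m} - \frac{1}{12 m^2} + \cdots
\quad\Longrightarrow\quad
\sum_{i=0}^{9998} \sum_{j=1}^{9998} \frac{i}{j}
  \approx 49985001 \left( \ln 9998 + \gamma + \frac{1}{2 \cdot 9998} \right)
```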
np.euler_gamma: Euler-Mascheroni constant (0.57721566...)
Because this formula is only an asymptotic approximation of the harmonic number, you lose a little accuracy, e.g.
489223499.9991845 -> 489223500.0408554.
If you can ignore a 0.0000000085% error, you can save even more time.
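This standard-library sketch compares the three approaches: the brute-force loop, the factored closed form, and the asymptotic formula. The function names are hypothetical. `EULER_GAMMA` is hard-coded to the same double value as `np.euler_gamma`, so NumPy isn't needed.

```python
import math

# Same double value as np.euler_gamma, hard-coded so only the stdlib is needed.
EULER_GAMMA = 0.5772156649015329


def loop_total(n):
    """Brute force: the question's double loop with true division, O(n^2)."""
    total = 0.0
    for i in range(n):
        for j in range(1, n):
            total += i / j
    return total


def closed_form_total(n):
    """(0 + 1 + ... + (n - 1)) * H_(n-1): one O(n) harmonic sum."""
    harmonic = math.fsum(1 / j for j in range(1, n))
    return n * (n - 1) / 2 * harmonic


def asymptotic_total(n):
    """Replace H_m by ln m + gamma + 1/(2m), with m = n - 1: O(1) work."""
    m = n - 1
    return n * (n - 1) / 2 * (math.log(m) + EULER_GAMMA + 1 / (2 * m))


if __name__ == "__main__":
    n = 9999
    print(closed_form_total(n))  # exact up to floating-point rounding
    print(asymptotic_total(n))   # constant time, tiny truncation error
```

The brute-force loop is left out of the `__main__` run at n = 9999 because it is the slow version being replaced.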
The benefit of NumPy grows with the size of the input.
In special cases, you can use numba (unfortunately, not always).
So, I recommend using NumPy, mathematics and numba together.
Python for loops are dynamically typed and interpreted, not compiled. Java is faster because it has extra JIT acceleration features that Python does not have.
http://en.wikipedia.org/wiki/Just-in-time_compilation
To illustrate just how massive a difference the Java JIT makes, look at this Python program that takes about 5 minutes:
While this fundamentally equivalent Java program takes about 23 milliseconds:
In terms of doing anything in a for loop, Java cleans Python's clock, being anywhere from several times to several orders of magnitude faster.
Moral of the story: basic Python for loops should be avoided at all costs if speedy performance is required. This could be because Guido van Rossum wants to encourage people to use multiprocessor-friendly constructs like array slicing, which operate faster than Java.
The benefit of Python is that there is a lot more flexibility (e.g. classes are objects) compared to Java (where you only have the reflection mechanism).
What's not mentioned here is Cython. It lets you introduce typed variables and translate your example to C/C++. Then it's much faster. I've also changed the bounds in the loop...
Followed by
gives
This is a known phenomenon: Python code is dynamic and interpreted, Java code is statically typed and compiled. No surprises there.
The reasons people give for preferring python are often:
However, if you use a library written in C (from Python), the performance may be much better (compare pickle to cPickle).
You will find that list comprehensions or generator expressions are significantly faster. For example:
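The original snippet isn't reproduced here. This is a minimal Python 3 sketch of the generator-expression version, using `//` to keep the question's Python 2 integer division. The function names and the smaller `N` are hypothetical. The ~11 s vs. ~26 s figures were measured on Python 2, and relative timings on other versions may differ.

```python
import timeit

N = 2000  # hypothetical size, smaller than 9999 so the demo finishes quickly


def original_loop(n=N):
    """The question's nested loop, with // standing in for Python 2's int division."""
    total = 0
    for i in range(n):
        for j in range(1, n):
            total += i // j
    return total


def generator_total(n=N):
    """Generator expression: the additions happen inside the C-implemented sum()."""
    return sum(i // j for i in range(n) for j in range(1, n))


if __name__ == "__main__":
    for fn in (original_loop, generator_total):
        print(f"{fn.__name__}: {timeit.timeit(fn, number=1):.3f} s")
```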
This executes in ~11 seconds on my machine vs. ~26 for your original code. Still an order of magnitude slower than the Java, but that's more in line with what you'd expect.
Your original code can, by the way, be sped up slightly by initializing total to 0 rather than 0.0, to use integer rather than floating-point addition. Your divisions all have integer results (in Python 2, / between two ints is integer division), so there is no point in summing the results to a float.
On my machine, Psyco actually slows down the generator expressions to about the same speed as your original loop (which it does not accelerate at all).
Using kindall's list comprehension takes 10.2 seconds, and with PyPy 1.7 it takes 2.5 seconds. Funnily enough, PyPy speeds up the original version to 2.5 seconds as well. So for PyPy, list comprehensions would be premature optimization ;). Good job, PyPy!
Not sure if the recommendation has been made, but I like replacing for loops with list comprehensions. They're faster, cleaner, and more Pythonic.
http://www.pythonforbeginners.com/basics/list-comprehensions-in-python
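Here is a minimal Python 3 sketch of that replacement for building a list. The function names are hypothetical. The comprehension's `for` clauses nest in the same order as the explicit loops.

```python
import timeit


def quotients_loop(n):
    """Build the list of i / j with explicit loops and append()."""
    result = []
    for i in range(n):
        for j in range(1, n):
            result.append(i / j)
    return result


def quotients_comprehension(n):
    """Same list as a comprehension; no attribute lookup of .append per element."""
    return [i / j for i in range(n) for j in range(1, n)]


if __name__ == "__main__":
    for fn in (quotients_loop, quotients_comprehension):
        seconds = timeit.timeit(lambda: fn(1000), number=3) / 3
        print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```

When only the sum is needed, the generator form `sum(i / j for ...)` avoids building the list at all.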
Doing scientific calculations with Python often means using calculation software written in C/C++ for the most crucial parts, with Python as an internal scripting language, e.g. Sage (which also contains a lot of Python code).
I think that this may be useful:
http://blog.dhananjaynene.com/2008/07/performance-comparison-c-java-python-ruby-jython-jruby-groovy/
As you can see, Psyco/PyPy can bring a certain improvement, but it would probably still be much slower than C++ or Java.
If you use while loops instead of for loops, the execution will be much, much faster (tested in Python 3). It will run as fast as a compiled C program that does the same thing.
Try the following examples (the MIPS calculation is only indicative, because it does not consider the processor's architecture, etc.):
Python 3 Program
C Program