Python: why are * and ** faster than / and sqrt()?
While optimising my code I realised the following:
>>> from timeit import Timer as T
>>> T(lambda : 1234567890 / 4.0).repeat()
[0.22256922721862793, 0.20560789108276367, 0.20530295372009277]
>>> from __future__ import division
>>> T(lambda : 1234567890 / 4).repeat()
[0.14969301223754883, 0.14155197143554688, 0.14141488075256348]
>>> T(lambda : 1234567890 * 0.25).repeat()
[0.13619112968444824, 0.1281130313873291, 0.12830305099487305]
and also:
>>> from math import sqrt
>>> T(lambda : sqrt(1234567890)).repeat()
[0.2597470283508301, 0.2498021125793457, 0.24994492530822754]
>>> T(lambda : 1234567890 ** 0.5).repeat()
[0.15409398078918457, 0.14059877395629883, 0.14049601554870605]
I assume it has to do with the way Python is implemented in C, but I wonder if anybody would care to explain why this is so?
Comments (1)
The (somewhat unexpected) reason for your results is that Python seems to fold constant expressions involving floating-point multiplication and exponentiation, but not division. math.sqrt() is a different beast altogether since there's no bytecode for it and it involves a function call.
On Python 2.6.5, the following code:
compiles to the following bytecodes:
As you can see, multiplication and exponentiation take no time at all since they're done when the code is compiled. Division takes longer since it happens at runtime. Square root is not only the most computationally expensive operation of the four, it also incurs various overheads that the others do not (attribute lookup, function call etc).
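The listings referred to above are not reproduced in this copy. As a stand-in (a rough sketch, not the original code), you can ask your own interpreter which of these expressions it folds by disassembling them with the standard dis module. The helper name has_runtime_binary_op is made up for illustration, and the sketch assumes Python 3.4 or later for dis.get_instructions. Results depend on the interpreter: the answer describes Python 2.6.5, while Python 3 releases may fold the constant division as well, since / always means true division there.

```python
import dis


def has_runtime_binary_op(expr):
    """Return True if the compiled expression still contains a BINARY_* opcode.

    Hypothetical helper: a folded constant expression compiles to a single
    constant load, so no BINARY_* instruction is left in the bytecode.
    (BINARY_SUBSCR would also match, but none of these expressions index.)
    """
    code = compile(expr, "<expr>", "eval")
    return any(ins.opname.startswith("BINARY_")
               for ins in dis.get_instructions(code))


if __name__ == "__main__":
    expressions = (
        "1234567890 / 4.0",
        "1234567890 * 0.25",
        "1234567890 ** 0.5",
        "x * 0.25",  # x is a name, so this can never be folded
    )
    for expr in expressions:
        if has_runtime_binary_op(expr):
            status = "computed at runtime"
        else:
            status = "folded at compile time"
        print("%-20s %s" % (expr, status))
```

Calling dis.dis on the same code object prints the full bytecode if you want to see the individual instructions.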
If you eliminate the effect of constant folding, there's little to separate multiplication and division:
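The timing listing is missing from this copy. A minimal sketch of such a measurement, not the original code: binding the operand to a variable in the timeit setup string means the compiler has nothing to fold, so every operation runs at runtime. The helper name time_expr and the repeat counts are assumptions, and the loop also times the two square-root forms discussed next.

```python
import timeit


def time_expr(expr, number=1000000, repeat=3):
    """Return the best-of-`repeat` time in seconds for `number` runs of expr.

    x is defined in the setup code, so expr operates on a variable
    rather than on a literal the compiler could fold.
    """
    timer = timeit.Timer(expr, setup="from math import sqrt; x = 1234567890.0")
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    for expr in ("x * 0.25", "x / 4.0", "x ** 0.5", "sqrt(x)"):
        print("%-10s %.4f s" % (expr, time_expr(expr)))
```

Taking the minimum of the repeats is the usual way to reduce noise from other processes. Absolute numbers will differ between machines and Python versions.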
math.sqrt(x) is actually a little bit faster than x ** 0.5, presumably because it's a special case of the latter and can therefore be done more efficiently, in spite of the overheads:
Edit 2011-11-16: Constant expression folding is done by Python's peephole optimizer. The source code (peephole.c) contains the following comment that explains why constant division isn't folded:
The -Qnew flag enables "true division" defined in PEP 238.
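For context on why the result of / could not be known at compile time in Python 2: under classic division, 7 / 2 between two ints gave 3, while with -Qnew or from __future__ import division it gave 3.5. In Python 3 the two behaviours are separate operators, which the small illustration below shows (written for Python 3, with variable names chosen only for this example).

```python
# Python 3: "/" is always true division, "//" is floor division.
true_quotient = 7 / 2    # float result, what -Qnew selected in Python 2
floor_quotient = 7 // 2  # int result, same as classic int / int in Python 2

print(true_quotient, floor_quotient)
```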