Slightly different execution times between python2 and python3

Posted 2024-10-09 23:47:37


Recently I wrote a simple permutation generator in Python (an implementation of the "plain changes" algorithm described by Knuth in "The Art... 4").
I was curious how its execution time differs between Python 2 and Python 3.
Here is my function:

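def perms(s):
    s = tuple(s)
    N = len(s)
    if N <= 1:
        yield s[:]
        # `return` ends the generator; `raise StopIteration()` here
        # becomes a RuntimeError on Python 3.7+ (PEP 479)
        return
    for x in perms(s[1:]):
        for i in range(0, N):
            # insert s[0] at position i of each permutation of s[1:]
            yield x[:i] + (s[0],) + x[i:]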
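As written, the function always sweeps s[0] through the positions in the same direction (left to right), so consecutive results are not always one adjacent swap apart, which is the defining property of Knuth's plain changes ordering. For example, for "123" it yields ('2', '3', '1') followed by ('1', '3', '2'), which differ in positions 0 and 2. A minimal sketch of a variant that alternates the sweep direction, so that every step is an adjacent swap; plain_changes and differs_by_adjacent_swap are hypothetical names, not from the question, and the starting direction is just one of the possible conventions:

```python
def plain_changes(s):
    """Yield all permutations of s so that consecutive results differ by one
    swap of adjacent elements (the "plain changes" property)."""
    s = tuple(s)
    n = len(s)
    if n <= 1:
        yield s
        return  # plain return ends the generator (no PEP 479 problem)
    sweep_left = True  # direction in which s[0] moves during the next sweep
    for x in plain_changes(s[1:]):
        # Insert s[0] at every position, right to left or left to right.
        positions = range(n - 1, -1, -1) if sweep_left else range(n)
        for i in positions:
            yield x[:i] + (s[0],) + x[i:]
        # Reverse direction, so s[0] stays at the same end when x changes.
        sweep_left = not sweep_left


def differs_by_adjacent_swap(a, b):
    """True if b equals a with exactly one pair of neighbouring items swapped."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (len(diff) == 2 and diff[1] == diff[0] + 1
            and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])


if __name__ == "__main__":
    seq = list(plain_changes("1234"))
    print(len(seq), len(set(seq)))  # 24 24
    print(all(differs_by_adjacent_swap(a, b) for a, b in zip(seq, seq[1:])))  # True
```

The only change from the question's function is the alternating direction; the slicing and concatenation per yield, and therefore the cost being timed, stay the same.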

In the Python 3 version I only changed print x to print(x), since print is a function in Python 3.
I timed both versions with the timeit module.

My tests:

$ echo "python2.6:" && ./testing.py && echo "python3:" && ./testing3.py

python2.6:

args time[ms]
1 0.003811
2 0.008268
3 0.015907
4 0.042646
5 0.166755
6 0.908796
7 6.117996
8 48.346996
9 433.928967
10 4379.904032

python3:

args time[ms]
1   0.00246778964996
2   0.00656183719635
3   0.01419159912
4   0.0406293644678
5   0.165960511097
6   0.923101452814
7   6.24257639835
8   53.0099868774
9   454.540967941
10  4585.83498001

As you can see, with fewer than 6 arguments Python 3 is faster, but from 6 arguments on the roles are reversed and Python 2.6 does better.
As I am a novice in Python programming, I wonder why that is. Or is my script perhaps better optimized for Python 2?
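The crossover can be read off the two tables by computing the Python 3 / Python 2.6 time ratio for each argument count. A small sketch, with the numbers copied from the output above (times in ms):

```python
# Timings (ms) copied from the two tables above, indexed by number of arguments.
py26 = [0.003811, 0.008268, 0.015907, 0.042646, 0.166755,
        0.908796, 6.117996, 48.346996, 433.928967, 4379.904032]
py3 = [0.00246778964996, 0.00656183719635, 0.01419159912, 0.0406293644678,
       0.165960511097, 0.923101452814, 6.24257639835, 53.0099868774,
       454.540967941, 4585.83498001]

for n, (t2, t3) in enumerate(zip(py26, py3), start=1):
    ratio = t3 / t2  # > 1 means Python 3 was slower
    print("%2d  py3/py2.6 = %.3f  (%s faster)"
          % (n, ratio, "py3" if ratio < 1 else "py2.6"))

# First argument count at which Python 3 becomes the slower one.
crossover = next(n for n, (t2, t3) in enumerate(zip(py26, py3), start=1)
                 if t3 > t2)
print("crossover at", crossover)  # crossover at 6
```

Around the crossover the two interpreters differ by less than 2% (a ratio of about 0.995 at 5 arguments and about 1.016 at 6), which may well be within timing noise; the gap only becomes clear for larger inputs, reaching about 1.047 at 10 arguments.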

Thanks in advance for any answers :)


我还不会笑 2024-10-16 23:47:37

Quoting What's New In Python 3.0:

The net result of the 3.0 generalizations is that Python 3.0 runs the pystone benchmark around 10% slower than Python 2.5. Most likely the biggest cause is the removal of special-casing for small integers. There's room for improvement, but it will happen after 3.0 is released!

>>> 4585.83498001/4379.904032
1.0470172283468882

So you're seeing about a 5% slowdown. The quoted text says to expect a 10% slowdown, so I'd accept that as a reasonable slowdown.

However, it has been improving, as can be seen here and here. So give 3.1 or 3.2 a try if you're concerned about the 5% slowdown.


小…楫夜泊 2024-10-16 23:47:37

This is actually a very interesting question.

I used the following script which runs on Python 2.6, 2.7, 3.0, 3.1, and 3.2.

from __future__ import print_function
from timeit import Timer
from math import factorial

try:
    range = xrange  # Python 2: use the lazy xrange
except NameError:
    pass  # Python 3: range is already lazy

def perms(s):
    s = tuple(s)
    N = len(s)
    if N <= 1:
        yield s[:]
        return  # not `raise StopIteration()`, which fails on Python 3.7+ (PEP 479)
    for x in perms(s[1:]):
        for i in range(0,N):
            yield x[:i] + (s[0],) + x[i:]

def testcase(s):
    for x in perms(s):
        pass

def test():
    for i in range(1,11):
        s = "".join(["%d" % x for x in range(i)])
        s = "testcase(\"%s\")" % s
        t = Timer(s,"from __main__ import testcase")
        # aim for roughly 100000 permutations per timing run, with at least one call
        factor = 100000
        factor = int(factor/factorial(i))
        factor = (factor>0) and factor or 1
        # best of 5 runs, reported in ms per call
        yield (i,(1000*min(t.repeat(5,factor))/factor))

if __name__=="__main__":
    print("args\ttime[ms]")
    for x in test():
        print("%i\t%f" % x)
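The same measurement can also be written by passing a callable to timeit, which avoids assembling a source string and the "from __main__ import testcase" setup. A minimal Python 3 sketch, assuming the same loop-count scaling as test() above; perms is repeated so the block runs on its own, and consume and bench are hypothetical helper names:

```python
import timeit
from math import factorial


def perms(s):
    # Same generator as in the script above (plain `return` ends it).
    s = tuple(s)
    n = len(s)
    if n <= 1:
        yield s[:]
        return
    for x in perms(s[1:]):
        for i in range(n):
            yield x[:i] + (s[0],) + x[i:]


def consume(s):
    """Exhaust the generator, like testcase() in the script above."""
    for _ in perms(s):
        pass


def bench(max_args=10, budget=100000, repeat=5):
    """Yield (number of args, best time per call in ms), like test() above."""
    for i in range(1, max_args + 1):
        s = "".join(str(d) for d in range(i))
        # About `budget` permutations per timing run, but at least one call.
        number = max(budget // factorial(i), 1)
        # The lambda is called only within this iteration, so it sees this s.
        best = min(timeit.repeat(lambda: consume(s), repeat=repeat, number=number))
        yield i, 1000 * best / number


if __name__ == "__main__":
    print("args\ttime[ms]")
    for row in bench():
        print("%i\t%f" % row)
```

Calling through a lambda adds a small constant overhead per call, which matters only for the smallest argument counts.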

The platform is Ubuntu 10.10, 64 bit, and all versions of Python were compiled from source. I get the following results:

case@quad:~$ py27 perms.py
args    time[ms]
1   0.002221
2   0.005072
3   0.010352
4   0.027648
5   0.111339
6   0.618658
7   4.207046
8   33.213019
9   294.044971
10  2976.780891

case@quad:~$ py32 perms.py
args    time[ms]
1   0.001725
2   0.004997
3   0.011208
4   0.032815
5   0.139474
6   0.761153
7   5.068729
8   39.760470
9   356.358051
10  3566.874027

After some more experimentation, I tracked the performance difference down to the fragment x[:i] + (s[0],) + x[i:]. If I instead compute a single tuple at the beginning of the loop and yield it for every yield statement, both versions of Python run at the same speed. (The permutations are then wrong, but that's not the point.)

If I time that fragment by itself, it is significantly slower on Python 3.2 (about 25% slower than on 2.7):

case@quad:~$ py27 -m timeit -s "s=(1,2,3,4,5);x=(1,2,3,4,5,6,7,8)" "x[:3] + (s[0],) + x[3:]"
1000000 loops, best of 3: 0.549 usec per loop
case@quad:~$ py32 -m timeit -s "s=(1,2,3,4,5);x=(1,2,3,4,5,6,7,8)" "x[:3] + (s[0],) + x[3:]"
1000000 loops, best of 3: 0.687 usec per loop
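A rough reconstruction of the control experiment described above, under the assumption that "the beginning of the loop" means the outer loop (the modified code itself is not shown in the answer). perms_control and best_time are hypothetical names. The control keeps the same recursion and the same number of yields as perms, but builds only one tuple per outer iteration, so the per-yield slicing and concatenation drop out:

```python
import timeit


def perms(s):
    """The generator from the question (with `return` instead of raise)."""
    s = tuple(s)
    n = len(s)
    if n <= 1:
        yield s[:]
        return
    for x in perms(s[1:]):
        for i in range(n):
            yield x[:i] + (s[0],) + x[i:]  # two slices + two concatenations per yield


def perms_control(s):
    """Hypothetical control: same recursion and number of yields as perms(),
    but one tuple per outer iteration, reused by every yield. The results are
    NOT the permutations of s."""
    s = tuple(s)
    n = len(s)
    if n <= 1:
        yield s
        return
    for x in perms_control(s[1:]):
        t = (s[0],) + x  # built once per outer iteration
        for i in range(n):
            yield t


def best_time(gen, s, repeat=3, number=3):
    """Best time in seconds to exhaust gen(s) `number` times, over `repeat` runs."""
    return min(timeit.repeat(lambda: sum(1 for _ in gen(s)),
                             repeat=repeat, number=number))


if __name__ == "__main__":
    for gen in (perms, perms_control):
        print(gen.__name__, best_time(gen, "12345678"))
```

Running the script under each interpreter shows how much of the gap comes from the tuple-building fragment; no particular timings are claimed here.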

I next used dis.dis() to look at the bytecode generated by both versions.

case@quad:~/src/Python-3.0.1$ py32
Python 3.2b2 (r32b2:87398, Dec 21 2010, 21:39:59) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> s=(1,2,3,4,5)
>>> x=(1,2,3,4,5,6,7,8)
>>> def f(s,x):
...   return x[:3] + (s[0],) + x[3:]
... 
>>> dis.dis(f)
  2           0 LOAD_FAST                1 (x) 
              3 LOAD_CONST               0 (None) 
              6 LOAD_CONST               1 (3) 
              9 BUILD_SLICE              2 
             12 BINARY_SUBSCR        
             13 LOAD_FAST                0 (s) 
             16 LOAD_CONST               2 (0) 
             19 BINARY_SUBSCR        
             20 BUILD_TUPLE              1 
             23 BINARY_ADD           
             24 LOAD_FAST                1 (x) 
             27 LOAD_CONST               1 (3) 
             30 LOAD_CONST               0 (None) 
             33 BUILD_SLICE              2 
             36 BINARY_SUBSCR        
             37 BINARY_ADD           
             38 RETURN_VALUE         
>>> exit()
case@quad:~/src/Python-3.0.1$ py26
Python 2.6.6 (r266:84292, Oct 24 2010, 15:27:46) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> s=(1,2,3,4,5)
>>> x=(1,2,3,4,5,6,7,8)
>>> def f(s,x):
...   return x[:3] + (s[0],) + x[3:]
... 
>>> dis.dis(f)
  2           0 LOAD_FAST                1 (x)
              3 LOAD_CONST               1 (3)
              6 SLICE+2             
              7 LOAD_FAST                0 (s)
             10 LOAD_CONST               2 (0)
             13 BINARY_SUBSCR       
             14 BUILD_TUPLE              1
             17 BINARY_ADD          
             18 LOAD_FAST                1 (x)
             21 LOAD_CONST               1 (3)
             24 SLICE+1             
             25 BINARY_ADD          
             26 RETURN_VALUE        
>>>
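Listings like these are specific to the interpreter that produced them. dis.get_instructions (available since Python 3.4) returns the same information as a sequence of Instruction objects, which makes it easy to list the opcodes your own interpreter generates for the fragment. The names keep changing across CPython 3.x releases (3.12, for example, added a BINARY_SLICE opcode), so this sketch assumes nothing about particular names; f and opcode_names are illustrative:

```python
import dis
import sys


def f(s, x):
    # The fragment from the answer, wrapped in a function as in the listings.
    return x[:3] + (s[0],) + x[3:]


def opcode_names(func):
    """Return the opcode names this interpreter generated for func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    print(sys.version)
    print(opcode_names(f))
    # Which slicing-related opcodes (if any) does this interpreter use?
    print([name for name in opcode_names(f) if "SLICE" in name])
```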

The generated bytecode is very different between the two versions. Unfortunately, I don't know why it differs, so I haven't really answered the question. But there really is a significant difference in performance for slicing and building tuples.
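One likely contributor, visible in the two listings but not established by the measurements above: Python 3 dropped the dedicated SLICE+n opcodes together with the __getslice__ hook, so x[:3] is compiled as an ordinary subscript whose key is a slice object (the BUILD_SLICE step in the 3.2 listing), meaning an extra object is built for every slice. A small sketch (the Probe class is purely illustrative) showing what __getitem__ receives in Python 3:

```python
class Probe:
    """Return the key passed to __getitem__ (illustrative helper only)."""

    def __getitem__(self, key):
        return key


p = Probe()
print(p[:3])                 # slice(None, 3, None)
print(p[3:])                 # slice(3, None, None)
print(type(p[:3]).__name__)  # slice

# Tuples treat an explicit slice object exactly like the [:3] syntax.
x = (1, 2, 3, 4, 5, 6, 7, 8)
assert x[:3] == x[slice(None, 3)] == (1, 2, 3)
```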
