Optimizing module imports in Python

Posted on 2024-11-05 13:36:56


I am reading David Beazley's Python Essential Reference, and he makes this point:

For example, if you were performing a
lot of square root operations, it is
faster to use 'from math import sqrt'
and 'sqrt(x)' rather than typing
'math.sqrt(x)'.

and:

For calculations involving heavy use
of methods or module lookups, it is
almost always better to eliminate the
attribute lookup by putting the
operation you want to perform into a
local variable first.

I decided to try it out:

first()

def first():
    from collections import defaultdict
    x = defaultdict(list)

second()

def second():
    import collections
    x = collections.defaultdict(list)

The results were:

2.15461492538
1.39850616455

Optimizations such as these probably don't matter to me. But I am curious as to why the opposite of what Beazley has written comes out to be true. Note that the difference is about 0.75 seconds, which is significant given that the task is trivial.

Why is this happening?

UPDATE:

I am getting the timings like:

from timeit import timeit

print(timeit('first()', 'from __main__ import first'))
print(timeit('second()', 'from __main__ import second'))


Spring初心 2024-11-12 13:36:56


The from collections import defaultdict and import collections statements should be outside the timed loops, since in real code you would not execute them repeatedly.

I guess that the from syntax has to do more work than the import syntax.

Using this test code:

#!/usr/bin/env python

import timeit

from collections import defaultdict
import collections

def first():
    from collections import defaultdict
    x = defaultdict(list)

def firstwithout():
    x = defaultdict(list)

def second():
    import collections
    x = collections.defaultdict(list)

def secondwithout():
    x = collections.defaultdict(list)

print("first with import", timeit.timeit('first()', 'from __main__ import first'))
print("second with import", timeit.timeit('second()', 'from __main__ import second'))

print("first without import", timeit.timeit('firstwithout()', 'from __main__ import firstwithout'))
print("second without import", timeit.timeit('secondwithout()', 'from __main__ import secondwithout'))

I get results:

first with import 1.61359190941
second with import 1.02904295921
first without import 0.344709157944
second without import 0.449721097946

Which shows how much the repeated imports cost.
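One way to check the guess that the from syntax does more work is to compare the bytecode of the two functions. This is only a minimal sketch, assuming CPython 3; opcode names are an implementation detail and can change between versions, and the helper opnames is a hypothetical name introduced here for illustration.

```python
import dis


def first():
    from collections import defaultdict
    return defaultdict(list)


def second():
    import collections
    return collections.defaultdict(list)


def opnames(func):
    """Return the opcode names found in func's bytecode (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]


first_ops = opnames(first)
second_ops = opnames(second)

# Both forms execute IMPORT_NAME on every call; once the module is loaded
# this is mostly a lookup in sys.modules, but it is not free.
print("IMPORT_NAME in first: ", "IMPORT_NAME" in first_ops)
print("IMPORT_NAME in second:", "IMPORT_NAME" in second_ops)

# Only the from-form needs the extra IMPORT_FROM step to fetch the name
# from the module object.
print("IMPORT_FROM in first: ", "IMPORT_FROM" in first_ops)
print("IMPORT_FROM in second:", "IMPORT_FROM" in second_ops)
```

If the extra IMPORT_FROM step shows up only in first(), that is consistent with first() being slower when the import is repeated on every call.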


独闯女儿国 2024-11-12 13:36:56


I also get similar ratios between first() and second(); the only difference is that my timings are at the microsecond level.

I don't think that your timings measure anything useful. Try to come up with better test cases!

Update:
FWIW, here are some tests that support David Beazley's point.

import math
from math import sqrt

def first(n=1000):
    for k in range(n):
        x = math.sqrt(9)  # module attribute lookup on every iteration

def second(n=1000):
    for k in range(n):
        x = sqrt(9)  # plain name lookup, no attribute access

In []: %timeit first()
1000 loops, best of 3: 266 us per loop
In []: %timeit second()
1000 loops, best of 3: 221 us per loop
In []: 266./ 221
Out[]: 1.2036199095022624

So first() is some 20% slower than second().
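The same comparison can be run without IPython, and extended with the local-variable binding that Beazley's second quote describes. This is a rough sketch, not a rigorous benchmark: the function names attr_lookup, global_name and local_name are made up for this example, and the absolute numbers depend on the machine and the Python version.

```python
import math
import timeit
from math import sqrt


def attr_lookup(n=1000):
    total = 0.0
    for k in range(n):
        total += math.sqrt(k)  # global lookup of math, then attribute lookup of sqrt
    return total


def global_name(n=1000):
    total = 0.0
    for k in range(n):
        total += sqrt(k)  # global lookup only
    return total


def local_name(n=1000):
    local_sqrt = math.sqrt  # bind the function to a local variable once
    total = 0.0
    for k in range(n):
        total += local_sqrt(k)  # local variable lookup
    return total


timings = {}
for func in (attr_lookup, global_name, local_name):
    # Take the best of several repeats to reduce noise from other processes.
    timings[func.__name__] = min(timeit.repeat(func, number=200, repeat=3))
    print(func.__name__, timings[func.__name__])
```

The imports here sit at module level, outside the timed code, so only the lookup cost inside the loop is being compared.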


最单纯的乌龟 2024-11-12 13:36:56

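first() doesn't save anything, since the module must still be accessed in order to import the name.

Also, you don't give your timing method, but given the function names it seems that first() performs the initial import, which always takes longer than subsequent imports because the module has to be compiled and executed.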
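The difference between an initial import and a later one can be observed directly. The following is a small sketch under a few assumptions: it uses the standard-library module colorsys only because it is small and safe to re-import, it removes that module from sys.modules to force a fresh import, and timed_import is a hypothetical helper written for this example.

```python
import importlib
import sys
import time


def timed_import(name):
    """Import a module by name and return (module, elapsed seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


MODULE = "colorsys"  # assumption: any small, side-effect-free module would do

# Make sure the next import really loads and executes the module.
sys.modules.pop(MODULE, None)

first_mod, first_time = timed_import(MODULE)    # loads and executes the module
second_mod, second_time = timed_import(MODULE)  # served from the sys.modules cache

print("initial import:", first_time)
print("cached import: ", second_time)
print("same module object:", first_mod is second_mod)
```

A single measurement like this is noisy, so it only illustrates the mechanism; it is not a substitute for a repeated timing run.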


Hello爱情风 2024-11-12 13:36:56

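My guess is that your test is biased: the second implementation benefits from the first one having already loaded the module, or simply from the module having been loaded recently.

How many times did you try it? Did you switch the order, and so on?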
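A simple way to check for this kind of ordering bias is to time both functions in both orders and take the best of several repeats. This is a sketch only; best_time is a hypothetical helper, and the number and repeat values are arbitrary choices made to keep the run short.

```python
import timeit


def first():
    from collections import defaultdict
    x = defaultdict(list)


def second():
    import collections
    x = collections.defaultdict(list)


def best_time(func, number=20_000, repeat=5):
    """Return the fastest of several timing runs of func."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


results = {}
for order in ((first, second), (second, first)):
    label = " then ".join(f.__name__ for f in order)
    results[label] = {f.__name__: best_time(f) for f in order}
    print(label, results[label])
```

If the ratio between first() and second() stays about the same in both orders, the ordering is not what explains the difference.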


[旋木] 2024-11-12 13:36:56


There is also the question of how efficiently the source code can be read and understood. Here's a real-life example (code from a Stack Overflow question):

Original:

import math

def midpoint(p1, p2):
   lat1, lat2 = math.radians(p1[0]), math.radians(p2[0])
   lon1, lon2 = math.radians(p1[1]), math.radians(p2[1])
   dlon = lon2 - lon1
   dx = math.cos(lat2) * math.cos(dlon)
   dy = math.cos(lat2) * math.sin(dlon)
   lat3 = math.atan2(math.sin(lat1) + math.sin(lat2), math.sqrt((math.cos(lat1) + dx) * (math.cos(lat1) + dx) + dy * dy))
   lon3 = lon1 + math.atan2(dy, math.cos(lat1) + dx)
   return(math.degrees(lat3), math.degrees(lon3))

Alternative:

from math import radians, degrees, sin, cos, atan2, sqrt

def midpoint(p1, p2):
   lat1, lat2 = radians(p1[0]), radians(p2[0])
   lon1, lon2 = radians(p1[1]), radians(p2[1])
   dlon = lon2 - lon1
   dx = cos(lat2) * cos(dlon)
   dy = cos(lat2) * sin(dlon)
   lat3 = atan2(sin(lat1) + sin(lat2), sqrt((cos(lat1) + dx) * (cos(lat1) + dx) + dy * dy))
   lon3 = lon1 + atan2(dy, cos(lat1) + dx)
   return(degrees(lat3), degrees(lon3))


雪化雨蝶 2024-11-12 13:36:56


Write your code as usual, importing a module and referencing its modules and constants as module.attribute. Then, either prefix your functions with the decorator for binding constants or bind all modules in your program by using the bind_all_modules function below:

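# Python 2 code: iteritems() and ClassType do not exist in Python 3.
# _make_constants is not shown in this answer; it is assumed to be defined elsewhere.
from types import FunctionType, ClassType

def bind_all_modules():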
    from sys import modules
    from types import ModuleType
    for name, module in modules.iteritems():
        if isinstance(module, ModuleType):
            bind_all(module)

def bind_all(mc, builtin_only=False, stoplist=[],  verbose=False):
    """Recursively apply constant binding to functions in a module or class.

    Use as the last line of the module (after everything is defined, but
    before test code).  In modules that need modifiable globals, set
    builtin_only to True.

    """
    try:
        d = vars(mc)
    except TypeError:
        return
    for k, v in d.items():
        if type(v) is FunctionType:
            newv = _make_constants(v, builtin_only, stoplist,  verbose)
            try: setattr(mc, k, newv)
            except AttributeError: pass
        elif type(v) in (type, ClassType):
            bind_all(v, builtin_only, stoplist, verbose)
