Should import statements always be at the top of a module?

Posted on 2024-07-06 13:40:48

PEP 8 states:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?

Isn't this:

class SomeClass(object):

    def not_often_called(self):
        from datetime import datetime
        self.datetime = datetime.now()

more efficient than this?

from datetime import datetime

class SomeClass(object):

    def not_often_called(self):
        self.datetime = datetime.now()

┾廆蒐ゝ 2024-07-13 13:40:48

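Module importing is quite fast, but not instant. This means that:

Putting the imports at the top of the module is fine, because it is a trivial cost that is paid only once.
Putting the imports inside a function makes calls to that function take longer.

So if you care about efficiency, put the imports at the top. Move them into a function only if profiling shows that it helps (you did profile to see where performance is best improved, right?).

The best reasons I have seen for lazy imports are:

Optional library support. If your code has multiple paths that use different libraries, it should not break when an optional library is not installed.
In the __init__.py of a plugin, which might be imported but never actually used. Bazaar plugins, which use bzrlib's lazy-loading framework, are an example.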
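
The optional-library case can be sketched roughly as follows. This is only an illustration, not code from the answer: numpy stands in for any optional dependency, and the helper name `mean` is hypothetical.

```python
# Optional dependency: use numpy when it is installed, fall back otherwise.
# numpy is only a stand-in here; the function name `mean` is hypothetical.
try:
    import numpy as np
except ImportError:
    np = None  # the rest of the module must cope with np being None


def mean(values):
    """Average a non-empty sequence, with or without the optional library."""
    if np is not None:
        return float(np.mean(values))
    return sum(values) / len(values)
```

Either way the module imports cleanly, and callers of `mean` do not need to know which path was taken.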

安人多梦 2024-07-13 13:40:48

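Putting the import statement inside a function can prevent circular dependencies.
For example, if you have two modules, X.py and Y.py, and each needs to import the other, importing either one creates a circular dependency. In practice this usually surfaces as an ImportError or AttributeError raised on a partially initialized module rather than as a literal infinite loop. If you move the import statement in one of the modules into a function, the other module is not imported until that function is called, and by then it has already been fully imported, so the cycle causes no problem. Read here for more: effbot.org/zone/import-confusion.htm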
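
A small experiment along these lines, assuming Python 3. The module names circ_x and circ_y and their contents are hypothetical; they are written to a temporary directory only so that the sketch is self-contained.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module names and contents, used only for this demonstration.
X_SRC = (
    "from circ_y import y_name\n"
    "\n"
    "def x_name():\n"
    "    return 'x'\n"
)
# Variant 1: circ_y imports from circ_x at module level (circular).
Y_TOP = (
    "from circ_x import x_name\n"
    "\n"
    "def y_name():\n"
    "    return 'y sees ' + x_name()\n"
)
# Variant 2: the import is deferred until y_name() is called.
Y_DEFERRED = (
    "def y_name():\n"
    "    from circ_x import x_name\n"
    "    return 'y sees ' + x_name()\n"
)


def import_circ_x(y_source):
    """Write circ_x.py and circ_y.py to a temp dir and import circ_x."""
    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp, "circ_x.py").write_text(X_SRC)
        pathlib.Path(tmp, "circ_y.py").write_text(y_source)
        for name in ("circ_x", "circ_y"):
            sys.modules.pop(name, None)  # forget any earlier variant
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            return importlib.import_module("circ_x")
        finally:
            sys.path.remove(tmp)


def top_level_variant_fails():
    """Return True if the module-level circular import raises ImportError."""
    try:
        import_circ_x(Y_TOP)
    except ImportError:
        return True
    return False
```

With Y_TOP, circ_y asks for x_name while circ_x is still half executed, so the import fails. With Y_DEFERRED, the lookup happens at call time, when circ_x is complete.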

御守 2024-07-13 13:40:48

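I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.

The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If my imports are at the top of the module, then when I move a function I end up spending a lot of time getting the new module's imports complete and minimal. A refactoring IDE might make this irrelevant.

There is a speed penalty, as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.

It is also nice to be able to see all module dependencies up front without resorting to a search (e.g. grep). However, the reason I care about module dependencies is generally that I am installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case I am going to perform a global search anyway, to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.

I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.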
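
The last point might look like the sketch below. The greeting logic is invented purely for illustration; only the shape (an explicit argument list, and sys imported under the guard) reflects what the answer describes.

```python
def main(args):
    """Build a greeting from an explicit argument list; never touches sys.argv."""
    names = list(args) or ["world"]
    return "hello, " + ", ".join(names)


if __name__ == "__main__":
    # sys is needed only when the file is run as a script.
    import sys

    print(main(sys.argv[1:]))
```

Because main() takes its arguments as a parameter, tests and other modules can call it directly without sys being involved at all.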

眼中杀气 2024-07-13 13:40:48

Most of the time this would be useful for clarity and sensible to do but it's not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.

Firstly, you could have a module with a unit test of the form:

if __name__ == '__main__':
    import foo
    aa = foo.xyz()         # initiate something for the test

Secondly, you might have a requirement to conditionally import some different module at runtime.

if condition:                    # placeholder for any runtime check
    import foo as plugin_api
else:
    import bar as plugin_api
xx = plugin_api.Plugin()
# ...

There are probably other situations where you might place imports in other parts in the code.
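
A runnable variation on the conditional-import idea, using two standard-library modules that happen to share a dumps/loads interface. The function name `load_serializer` and its flag are assumptions made for this sketch.

```python
def load_serializer(human_readable):
    """Pick a serialization module at run time and return it."""
    if human_readable:
        import json as serializer
    else:
        import pickle as serializer
    return serializer


serializer = load_serializer(human_readable=True)
payload = serializer.dumps({"answer": 42})
```

Only the module that is actually selected gets imported; the other branch never runs.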

青丝拂面 2024-07-13 13:40:48

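The first variant is indeed more efficient than the second when the function is called zero times or once. From the second call onward, however, the "import on every call" approach is actually less efficient. The original answer links to a lazy-loading technique (a "lazy import") that combines the best of both approaches.

But there are reasons other than efficiency to prefer one over the other. Importing at the top makes the module's dependencies much clearer to anyone reading the code. The two variants also have very different failure characteristics: if there is no datetime module, the top-level import (the second variant) fails at load time, while the in-function import (the first variant) does not fail until the method is called.

Added note: in IronPython, imports can be considerably more expensive than in CPython, because the code is basically compiled as it is imported.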
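
One way to get a lazy import in Python 3 is sketched below. It follows the lazy-import recipe from the standard library's importlib documentation and is not necessarily the technique the answer linked to; colorsys is just a small module chosen for the demonstration.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up lazy loading; the module body has not run yet
    return module


colorsys = lazy_import("colorsys")        # cheap: module body not executed
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # first attribute access triggers the load
```

The import statement stays at the top of the file for readers, while the loading cost is paid only if the module is actually used.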

韶华倾负 2024-07-13 13:40:48

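Here is an updated summary of the answers to this and related questions.

PEP 8 recommends putting imports at the top.
It is often more convenient to get an ImportError when you first run your program rather than when your program first calls your function.
Putting imports in the function scope can help avoid problems with circular imports.
Putting imports in the function scope helps keep the module namespace clean, so that the imported names do not show up among tab-completion suggestions.
Start-up time: imports in a function do not run until (and unless) that function is called. This can become significant with heavyweight libraries.
Even though import statements are very fast on subsequent runs, they still incur a speed penalty, which can matter if the function is trivial but called frequently.
Imports under the __name__ == "__main__" guard seem very reasonable.
Refactoring might be easier if the imports are located in the function where they are used (this makes it easier to move the function to another module). It can also be argued that this is good for readability. However, most would argue the contrary; see the next item.
Imports at the top enhance readability, since you can see all your dependencies at a glance.
It seems unclear whether dynamic (possibly conditional) imports favour one style over the other.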
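
The namespace point in the summary above can be checked with a small experiment. The two module sources and the helper `build_module` are hypothetical; the modules are built in memory only to keep the sketch self-contained.

```python
import types

TOP_STYLE = (
    "import json\n"
    "\n"
    "def dump(obj):\n"
    "    return json.dumps(obj)\n"
)
LOCAL_STYLE = (
    "def dump(obj):\n"
    "    import json\n"
    "    return json.dumps(obj)\n"
)


def build_module(name, source):
    """Create an in-memory module from source text (demo helper)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


top_style = build_module("top_style", TOP_STYLE)
local_style = build_module("local_style", LOCAL_STYLE)

# Public names a user would see when tab-completing on each module.
top_names = [n for n in dir(top_style) if not n.startswith("_")]
local_names = [n for n in dir(local_style) if not n.startswith("_")]
```

The top-level style exposes json as an attribute of the module, while the in-function style exposes only dump.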

笑梦风尘 2024-07-13 13:40:48

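Curt makes a good point: the second version is clearer, and it fails at load time rather than later and unexpectedly.

Normally I don't worry about the efficiency of loading modules, since it is (a) pretty fast and (b) mostly happens only at startup.

If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, making sure to catch ImportError exceptions and handle them in a reasonable manner.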
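
A minimal sketch of dynamic loading with error handling. It uses importlib.import_module, which the Python documentation recommends over calling __import__ directly; the helper name `load_optional` and the choice to return None on failure are assumptions.

```python
import importlib


def load_optional(name):
    """Import a module by name at run time; return None if it is unavailable."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


json_module = load_optional("json")
missing = load_optional("surely_not_an_installed_module_xyz")
```

What counts as handling the failure "in a reasonable manner" depends on the application: returning None, logging, or disabling a feature are all plausible choices.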

爱人如己 2024-07-13 13:40:48

I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.

If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.

If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as @aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).

Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).

 0 foo:   14429.0924 µs
 1 foo:      63.8962 µs
 2 foo:      10.0136 µs
 3 foo:       7.1526 µs
 4 foo:       7.8678 µs
 0 bar:       9.0599 µs
 1 bar:       6.9141 µs
 2 bar:       7.1526 µs
 3 bar:       7.8678 µs
 4 bar:       7.1526 µs

The code:

from __future__ import print_function  # no-op on Python 3
from time import time

def foo():
    # The first call pays the real import cost for any module not yet loaded.
    import collections
    import re
    import string
    import math
    import subprocess
    return

def bar():
    # Same imports as foo(); by now every module is already in sys.modules.
    import collections
    import re
    import string
    import math
    import subprocess
    return

t0 = time()
for i in range(5):  # was xrange; range works on both Python 2 and 3
    foo()
    t1 = time()
    print(u"    %2d foo: %12.4f \u00b5s" % (i, (t1-t0)*1E6))
    t0 = t1
for i in range(5):
    bar()
    t1 = time()
    print(u"    %2d bar: %12.4f \u00b5s" % (i, (t1-t0)*1E6))
    t0 = t1
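
For steadier numbers, the repeated-check cost can also be measured with timeit. This is a sketch under the assumption of Python 3; the function names are hypothetical, and no timings are quoted because they vary by machine and interpreter.

```python
import math
import timeit


def local_import():
    import math  # after the first call: a sys.modules lookup plus a local binding
    return math.sqrt(2.0)


def global_import():
    return math.sqrt(2.0)  # uses the module-level name


def nanoseconds_per_call(func, number=100_000):
    """Best-of-three average cost of one call, in nanoseconds."""
    best = min(timeit.repeat(func, number=number, repeat=3))
    return best / number * 1e9


def report():
    """Print the per-call cost of both styles."""
    for func in (local_import, global_import):
        print("%-14s %8.1f ns/call" % (func.__name__, nanoseconds_per_call(func)))
```

Calling report() prints one line per style; the difference between the two lines is the per-call overhead of re-executing the import statement.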

迟到的我 2024-07-13 13:40:48

I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.

In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.

One good reason to import a module elsewhere in the code is if it's used in a debugging statement.

For example:

do_something_with_x(x)

I could debug this with:

from pprint import pprint
pprint(x)
do_something_with_x(x)

Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don't have any choice.

萤火眠眠 2024-07-13 13:40:48

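It's a tradeoff that only the programmer can decide to make.

Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it requires) until it is needed. Note that doing the import "only when called" also means doing it "every time it is called", so each call after the first still incurs the extra overhead of executing the import statement.

Case 2 saves some execution time and latency by importing datetime beforehand, so that not_often_called() returns more quickly when it is called and no import overhead is incurred on each call.

Besides efficiency, it is easier to see module dependencies up front if the import statements are ... up front. Hiding them down in the code can make it harder to find out which modules something depends on.

Personally, I generally follow the PEP, except for things like unit tests that I don't want loaded all the time because I know they will not be used outside test code.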

花辞树 2024-07-13 13:40:48

Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.

import os
# ...
try:
    kill = os.kill  # raised AttributeError on Windows before Python 2.7 / 3.2
    from signal import SIGTERM
    def terminate(process):
        kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
    try:
        from win32api import TerminateProcess  # use win32api if available
        def terminate(process):
            TerminateProcess(int(process._handle), -1)
    except ImportError:
        def terminate(process):
            raise NotImplementedError  # define a dummy function

(On review: what John Millikin said.)

梦太阳 2024-07-13 13:40:48

This is like many other optimizations: you sacrifice some readability for speed. As John mentioned, if you have done your profiling homework, found that this change is significantly useful, and need the extra speed, then go for it. It would probably be good to put a note up with all the other imports:

from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below

紫轩蝶泪 2024-07-13 13:40:48

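Module initialization occurs only once, on the first import. If the module in question is from the standard library, you are likely to import it from other modules in your program as well. A module as prevalent as datetime is also likely to be a dependency of many other standard-library modules. The import statement then costs very little, since module initialization has already happened. All it does at that point is bind the existing module object to the local scope.

Couple that information with the readability argument, and I would say it is best to have the import statement at module scope.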
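
The "initialized once, then only rebound" behaviour is easy to observe. In this sketch the function names are hypothetical and json is an arbitrary standard-library module.

```python
import sys


def first_call():
    import json  # loads json if needed, then binds it to a local name
    return json


def second_call():
    import json  # found in sys.modules; only the local binding happens
    return json


# Both functions receive the very same cached module object.
same_object = first_call() is second_call() is sys.modules["json"]
```

Here same_object is True: repeated import statements hand back the cached entry from sys.modules rather than re-running the module's code.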

野の 2024-07-13 13:40:48

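Just to complete Moe's answer and the original question:

When we have to deal with circular dependencies we can use some "tricks". Suppose we are working with modules a.py and b.py, which contain x() and y(), respectively. Then:

We can move one of the from imports to the bottom of the module.
We can move one of the from imports inside the function or method that actually needs it (this is not always possible, as you may use it from several places).
We can change one of the two from imports into a plain import, such as: import a

So, to conclude: if you are not dealing with circular dependencies and using some kind of trick to avoid them, it is better to put all your imports at the top, for the reasons already explained in the other answers to this question. And please, when you use one of these "tricks", include a comment. It is always welcome! :)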
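
The third trick (a plain import instead of a from import) can be tried out as below, assuming Python 3. The module names mod_a and mod_b are hypothetical stand-ins for a.py and b.py, written to a temporary directory so the sketch runs on its own.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical stand-ins for a.py and b.py; each uses a plain `import`.
MOD_A = (
    "import mod_b              # binds the module object only\n"
    "\n"
    "def x():\n"
    "    return 'x'\n"
    "\n"
    "def call_y():\n"
    "    return mod_b.y()      # attribute resolved at call time\n"
)
MOD_B = (
    "import mod_a              # mod_a may still be half-initialised here\n"
    "\n"
    "def y():\n"
    "    return 'y via ' + mod_a.x()\n"
)

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "mod_a.py").write_text(MOD_A)
    pathlib.Path(tmp, "mod_b.py").write_text(MOD_B)
    importlib.invalidate_caches()
    sys.path.insert(0, tmp)
    try:
        mod_a = importlib.import_module("mod_a")
    finally:
        sys.path.remove(tmp)

result = mod_a.call_y()
```

The cycle is harmless here because neither module touches the other's attributes while it is being imported; mod_a.x and mod_b.y are looked up only when the functions run.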

￠好甜 2024-07-13 13:40:48

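In addition to the excellent answers already given, it is worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that must be imported or initialized first, and a top-level import can violate the required order of execution.

This issue comes up often in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It is best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.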

深爱不及久伴 2024-07-13 13:40:48

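I do not aspire to give a complete answer, because others have already done that very well. I just want to mention one use case where I find it especially useful to import modules inside functions. My application uses Python packages and modules stored in a certain location as plugins. During startup, the application walks through all the modules in that location and imports them, then looks inside each module and, if it finds mounting points for plugins (in my case, subclasses of a certain base class with a unique ID), registers them. The number of plugins is large (dozens now, but possibly hundreds in the future), and each of them is used quite rarely. Importing third-party libraries at the top of my plugin modules carried a penalty during application startup. Some third-party libraries are especially heavy to import (for example, importing plotly even tries to connect to the internet and download something, which added about one second to startup). By optimizing the imports in the plugins (performing them only in the functions where they are used), I managed to shrink the startup time from 10 seconds to about 2 seconds. That is a big difference for my users.

So my answer is no, do not always put the imports at the top of your modules.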
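
A rough sketch of that plugin arrangement, assuming Python 3.6 or later. The names Plugin, StatsPlugin, plugin_id and run are hypothetical, and the standard-library statistics module stands in for a heavy third-party dependency.

```python
class Plugin:
    """Hypothetical plugin base class; subclasses with an ID register themselves."""

    registry = {}
    plugin_id = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.plugin_id is not None:
            Plugin.registry[cls.plugin_id] = cls


class StatsPlugin(Plugin):
    plugin_id = "stats"

    def run(self, values):
        # Deferred import: the cost is paid only when the plugin is used.
        # `statistics` stands in for a heavy third-party library.
        import statistics

        return statistics.mean(values)
```

Defining the class is enough to register it, so application startup pays nothing for the dependency. The import runs only when run() is first called, for example via `Plugin.registry["stats"]().run([1, 2, 3])`.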

蒲公英的约定 2024-07-13 13:40:48

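It is interesting that no answer so far has mentioned parallel processing, where the imports may be required to be inside the function because the serialized function code is what gets pushed to other cores, as is the case with ipyparallel.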

我三岁 2024-07-13 13:40:48

There can be a performance gain by importing variables/local scoping inside of a function. This depends on the usage of the imported thing inside the function. If you are looping many times and accessing a module global object, importing it as local can help.

test.py

X=10
Y=11
Z=12
def add(i):
  i = i + 10

runlocal.py

from test import add, X, Y, Z

def callme():
  # Copy the module globals into locals once; locals are faster to look up in the loop.
  x=X
  y=Y
  z=Z
  ladd=add
  for i in range(100000000):
    ladd(i)
    x+y+z

callme()

run.py

from test import add, X, Y, Z

def callme():
  for i in range(100000000):
    add(i)
    X+Y+Z

callme()

Timing both scripts on Linux shows a small gain:

/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py 
    0:17.80 real,   17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py 
    0:14.23 real,   14.22 user, 0.01 sys

real is wall clock. user is time in program. sys is time for system calls.

https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names

唐婉 2024-07-13 13:40:48

Readability

In addition to startup performance, there is a readability argument to be made for localizing import statements. For example take python line numbers 1283 through 1296 in my current first python project:

listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
                 str(Gtk.get_minor_version())+"."+
                 str(Gtk.get_micro_version())])

import xml.etree.ElementTree as ET

xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
    result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
                 result[2]+" "+result[3]])

If the import statement was at the top of file I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading code.

Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.

Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.

櫻之舞 2024-07-13 13:40:48

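I would like to mention a use case of mine, very similar to those mentioned by @John Millikin and @VK:

Optional Imports

I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. On some occasions I need to import Tensorflow to do some quick model runs, but sometimes I work in places where Tensorflow is not set up or is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import Tensorflow inside that function, and bind it to a button.

This way, I can do "restart-and-run-all" without having to wait for the import, or having to resume the rest of the cells when it fails.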
