Local import statements in Python

Posted 2024-08-11 06:07:55

I think putting the import statement as close to the fragment that uses it helps readability by making its dependencies more clear. Will Python cache this? Should I care? Is this a bad idea?

def Process():
    import StringIO
    file_handle=StringIO.StringIO('hello world')
    #do more stuff

for i in xrange(10): Process()

A little more justification: it's for methods which use arcane bits of the library, but when I refactor the method into another file, I don't realize I missed the external dependency until I get a runtime error.

冷情妓 2024-08-18 06:07:55

The other answers evince a mild confusion as to how import really works.

This statement:

import foo

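is roughly equivalent to this statement (in Python 2; in Python 3 the final level argument must be 0 or greater, and 0 means an absolute import):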

foo = __import__('foo', globals(), locals(), [], -1)

That is, it creates a variable in the current scope with the same name as the requested module, and assigns it the result of calling __import__() with that module name and a boatload of default arguments.
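For comparison, here is a minimal Python 3 sketch of that equivalence. Two assumptions to flag: `json` is just an arbitrary standard-library module picked for illustration, and the level argument is 0 because Python 3 rejects the Python 2 value -1.

```python
import sys

# Roughly what "import json" does: call __import__ and bind the result to a name.
# Python 3 requires level >= 0; level=0 means an absolute import.
json_via_dunder = __import__('json', globals(), locals(), [], 0)

import json  # the ordinary statement

# Both names refer to the one module object cached in sys.modules.
same_object = json_via_dunder is json is sys.modules['json']
print(same_object)  # True
```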

The __import__() function conceptually converts a string ('foo') into a module object. Modules are cached in sys.modules, and that's the first place __import__() looks: if sys.modules has an entry for 'foo', that's what __import__('foo') will return, whatever it is. It really doesn't care about the type. You can see this in action yourself; try running the following code:

import sys
sys.modules['boop'] = (1, 2, 3)
import boop
print(boop)

Leaving aside stylistic concerns for the moment, an import statement inside a function works the way you'd want. If the module has never been imported before, it gets imported and cached in sys.modules. The statement then assigns the module to a local variable with that name. It does not modify any module-level state. It may, however, modify some global state (by adding a new entry to sys.modules).
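A small Python 3 sketch of that behaviour, using the question's Process() pattern with io.StringIO (the Python 3 home of StringIO). It assumes nothing beyond the standard library; the function name process is only illustrative.

```python
import sys

def process():
    # Function-local import: "io" becomes a local variable of process().
    # The first execution loads io if nothing has yet and caches it in sys.modules;
    # later executions just find it there.
    import io
    file_handle = io.StringIO('hello world')
    return file_handle.read(), io

# Call it repeatedly, like the loop in the question.
results = [process() for _ in range(10)]

texts = {text for text, _ in results}
same_module = all(mod is sys.modules['io'] for _, mod in results)

print(texts)                                 # {'hello world'}
print(same_module)                           # True: the same cached module every call
print('io' in process.__code__.co_varnames)  # True: the name is local to process()
```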

That said, I almost never use import inside a function. If importing the module causes a noticeable slowdown in your program (say it performs a long computation in its static initialization, or it's simply a massive module) and your program rarely actually needs the module, it's perfectly fine to import it only inside the functions that use it. (If this were distasteful, Guido would jump in his time machine and change Python to prevent us from doing it.) But as a rule, I and the Python community in general put all import statements at the top of the module, in module scope.
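The "rarely needed, expensive module" exception might look like the following Python 3 sketch. The function summarize is hypothetical, and decimal merely stands in for a genuinely heavy module; the point is only that the import sits on the rarely used code path.

```python
def summarize(values, exact=False):
    """Return the total of values; exact=True is the rarely used code path."""
    if not exact:
        return sum(values)
    # Deferred import: decimal stands in for an expensive, rarely needed module.
    # It is loaded only the first time someone asks for an exact total.
    from decimal import Decimal
    return sum((Decimal(str(v)) for v in values), Decimal(0))

print(summarize([0.1, 0.2]))              # 0.30000000000000004
print(summarize([0.1, 0.2], exact=True))  # 0.3
```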

黎歌 2024-08-18 06:07:55

Style aside, it is true that an imported module will only be imported once (unless reload is called on it). However, each execution of import Foo still has to implicitly check whether that module is already loaded (by looking in sys.modules).
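To see "imported once, unless reloaded" directly, here is a Python 3 sketch (where reload lives in importlib). It writes a throwaway module, hypothetically named once_demo, to a temporary directory; its top-level code counts how many times it has run. The counting trick relies on importlib.reload re-executing the code in the module's existing namespace.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module: each run of its top-level code bumps RUNS by one.
tmpdir = tempfile.mkdtemp()
Path(tmpdir, 'once_demo.py').write_text("RUNS = globals().get('RUNS', 0) + 1\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

def use_it():
    import once_demo  # executes the module body only if sys.modules lacks it
    return once_demo

for _ in range(5):
    mod = use_it()
runs_after_imports = mod.RUNS   # five import statements, one execution

importlib.reload(mod)           # reload re-executes the module body
runs_after_reload = mod.RUNS

print(runs_after_imports, runs_after_reload)  # 1 2
```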

Consider also the "disassembly" of two otherwise equal functions where one tries to import a module and the other doesn't:

>>> def Foo():
...     import random
...     return random.randint(1,100)
... 
>>> dis.dis(Foo)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               0 (None)
              6 IMPORT_NAME              0 (random)
              9 STORE_FAST               0 (random)

  3          12 LOAD_FAST                0 (random)
             15 LOAD_ATTR                1 (randint)
             18 LOAD_CONST               2 (1)
             21 LOAD_CONST               3 (100)
             24 CALL_FUNCTION            2
             27 RETURN_VALUE        
>>> def Bar():
...     return random.randint(1,100)
... 
>>> dis.dis(Bar)
  2           0 LOAD_GLOBAL              0 (random)
              3 LOAD_ATTR                1 (randint)
              6 LOAD_CONST               1 (1)
              9 LOAD_CONST               2 (100)
             12 CALL_FUNCTION            2
             15 RETURN_VALUE

I'm not sure how much work each of these bytecodes costs once the virtual machine executes them, but if this were an important inner loop in your program, you'd certainly want to favor the Bar approach over the Foo approach.
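The listing above comes from Python 2, and exact opcodes differ between versions. A version-independent Python 3 sketch can just ask dis whether an IMPORT_NAME instruction appears in each function; foo and bar mirror Foo and Bar above.

```python
import dis
import random

def foo():
    import random  # executes IMPORT_NAME on every call
    return random.randint(1, 100)

def bar():
    return random.randint(1, 100)  # reads the module-level name instead

def opnames(func):
    """Collect the opcode names the compiler produced for func."""
    return {ins.opname for ins in dis.get_instructions(func)}

print('IMPORT_NAME' in opnames(foo))  # True
print('IMPORT_NAME' in opnames(bar))  # False
```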

A quick and dirty timeit test (with Foo and Bar saved in a module a.py that also does import random at the top) does show a modest speed improvement when using Bar:

$ python -m timeit -s "from a import Foo,Bar" -n 200000 "Foo()"
200000 loops, best of 3: 10.3 usec per loop
$ python -m timeit -s "from a import Foo,Bar" -n 200000 "Bar()"
200000 loops, best of 3: 6.45 usec per loop
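The same comparison can be run from inside Python 3 with the timeit module, without a separate a.py. This is a sketch: the call count N is an arbitrary choice, and the timings will vary by machine and Python version, so no numbers are claimed here.

```python
import random
import timeit

def foo():
    import random
    return random.randint(1, 100)

def bar():
    return random.randint(1, 100)

N = 100_000  # calls per repeat; arbitrary

# Best of 3 repeats, like the command-line runs above.
foo_best = min(timeit.repeat(foo, number=N, repeat=3))
bar_best = min(timeit.repeat(bar, number=N, repeat=3))

print(f"foo: {foo_best / N * 1e6:.2f} usec per loop")
print(f"bar: {bar_best / N * 1e6:.2f} usec per loop")
```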

晌融 2024-08-18 06:07:55

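See PEP 8:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

Note that this is purely a stylistic choice, as Python will treat all import statements the same regardless of where they are declared in the source file. Still, I'd recommend following common practice, as it will make your code more readable to others.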
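A minimal Python 3 module laid out in the order PEP 8 describes might look like this; the module's contents are hypothetical and only illustrate the ordering.

```python
"""Hypothetical module laid out in PEP 8 order.

Module docstring first, then imports, then module-level constants.
"""

import io

GREETING = 'hello world'


def process():
    # The dependency on io is visible at the top of the file.
    file_handle = io.StringIO(GREETING)
    return file_handle.read()
```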

白况 2024-08-18 06:07:55

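I've done this, and then wished I hadn't. Normally, if I'm writing a function and that function needs StringIO, I can look at the top of the module, see whether it's being imported, and add the import if it isn't.

Suppose I don't do that; suppose I add the import locally, inside my function. Then suppose that at some point I, or someone else, add a bunch of other functions that use StringIO. That person will look at the top of the module and add import StringIO. Now your function contains code that is not only unexpected but redundant.

Also, it violates what I consider a very important principle: don't directly modify module-level state from inside a function.

Edit:

Actually, it turns out that all of the above is nonsense.

Importing a module doesn't modify module-level state (it initializes the module being imported, if nothing else has done so yet, but that's not the same thing at all). Importing a module that has already been imported elsewhere costs nothing beyond a lookup in sys.modules and creating a variable in the local scope.

Knowing this, I feel a bit foolish about having "fixed" every place in my code where I'd done this, but that's my cross to bear.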

弱骨蛰伏 2024-08-18 06:07:55

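When the Python interpreter reaches an import statement for a module that hasn't been loaded yet, it executes that module's top-level code, which includes creating every function defined in it. This is why an import can sometimes take a while.

As Andrew Hare pointed out, doing all imports at the top is a stylistic convention. However, keep in mind that with a local import, the interpreter has to check whether the module has already been imported every time the function runs after the first import. Local imports also become a problem when your code file grows and you want to "upgrade" the code to remove or replace certain dependencies: you'll have to search the whole file for every place that imports the module.

I'd recommend following the convention and keeping imports at the top of the file. If you really want to track a function's dependencies, I'd suggest listing them in that function's docstring.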
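One way to follow that suggestion, sketched in Python 3: keep the import at module level and name the dependency in the function's docstring. The "Dependencies:" line is a hypothetical convention, not something Python or PEP 8 defines.

```python
import io


def process(text):
    """Return the contents of an in-memory file built from text.

    Dependencies: io.StringIO (imported at module level).
    """
    file_handle = io.StringIO(text)
    return file_handle.read()


print(process('hello world'))  # hello world
```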
