Local import statements in Python

Posted 2024-08-11 06:07:55

I think putting the import statement as close as possible to the fragment that uses it helps readability by making its dependencies more clear. Will Python cache this? Should I care? Is this a bad idea?

def Process():
    import StringIO
    file_handle = StringIO.StringIO('hello world')
    # do more stuff

for i in xrange(10):
    Process()

A little more justification: it's for methods which use arcane bits of the library, but when I refactor the method into another file, I don't realize I missed the external dependency until I get a runtime error.

Answers (7)

冷情妓 2024-08-18 06:07:55

The other answers evince a mild confusion as to how import really works.

This statement:

import foo

is roughly equivalent to this statement:

foo = __import__('foo', globals(), locals(), [], -1)

That is, it creates a variable in the current scope with the same name as the requested module, and assigns it the result of calling __import__() with that module name and a boatload of default arguments.
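Here is a minimal Python 3 sketch of the point above. It assumes Python 3, where the `level` argument must be `0` for an absolute import; the `-1` shown above is Python 2 only. The function name `use_local_import` is illustrative. The sketch checks two things: a function-level `import` binds a local variable rather than a module global, and the object bound is the one cached in `sys.modules`.

```python
import sys


def use_local_import():
    # The import statement binds the name "json" in this function's local scope.
    import json
    return json


mod = use_local_import()

# "json" is a local variable of the function's code object...
print("json" in use_local_import.__code__.co_varnames)  # True
# ...and it was not added to this module's global namespace.
print("json" in globals())  # False
# The object returned is the module cached in sys.modules.
print(mod is sys.modules["json"])  # True
# Python 3 spelling of the desugared form (level=0 means absolute import only).
print(__import__("json", globals(), locals(), [], 0) is mod)  # True
```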

The __import__() function conceptually converts a string ('foo') into a module object. Modules are cached in sys.modules, and that's the first place __import__() looks: if sys.modules has an entry for 'foo', that's what __import__('foo') will return, whatever it is. It really doesn't care about the type. You can see this in action yourself; try running the following code:

import sys
sys.modules['boop'] = (1, 2, 3)
import boop
print(boop)

Leaving aside stylistic concerns for the moment, an import statement inside a function works the way you'd want. If the module has never been imported before, it gets imported and cached in sys.modules. The statement then assigns the module to a local variable with that name. It does not modify any module-level state. It may, however, modify some global state (adding a new entry to sys.modules).
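The "imported once, then cached" behaviour can be checked directly with a small Python 3 sketch. It writes a throwaway module into a temporary directory, imports it from inside a function ten times, and counts how often the module body actually executes. The module name `noisy_mod` and the `sys.noisy_mod_runs` counter are invented for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module (hypothetical name "noisy_mod") whose body records
# each time it is executed. The counter lives on the sys module so it would
# survive even if the module body ran again in a fresh module object.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "noisy_mod.py"), "w") as f:
    f.write("import sys\n"
            "sys.noisy_mod_runs = getattr(sys, 'noisy_mod_runs', 0) + 1\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()  # make sure the import system sees the new file


def process():
    # Runs the module body on the first call; later calls find it in sys.modules.
    import noisy_mod
    return noisy_mod


for _ in range(10):
    process()

print(sys.noisy_mod_runs)  # 1
print(process() is sys.modules["noisy_mod"])  # True
```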

That said, I almost never use import inside a function. Importing only inside the functions that use a module is perfectly fine when two things hold. First, importing the module creates a noticeable slowdown in your program (say it performs a long computation in its static initialization, or it's simply a massive module). Second, your program rarely actually needs the module for anything. (If this were distasteful, Guido would jump in his time machine and change Python to prevent us from doing it.) But as a rule, I and the general Python community put all our import statements at the top of the module in module scope.

黎歌 2024-08-18 06:07:55

Style aside, it is true that an imported module will only be imported once (unless reload is called on said module). However, each execution of import Foo will implicitly check whether that module is already loaded (by checking sys.modules).
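In Python 3 the built-in reload mentioned above lives at `importlib.reload`. The sketch below shows that a repeated import statement does not re-run the module body, while an explicit reload does. The module name `reload_demo` and the `sys.reload_demo_runs` counter are invented for the illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module whose body counts how many times it has been executed.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "reload_demo.py"), "w") as f:
    f.write("import sys\n"
            "sys.reload_demo_runs = getattr(sys, 'reload_demo_runs', 0) + 1\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import reload_demo  # first import: the module body runs
import reload_demo  # already in sys.modules: the body does not run again
print(sys.reload_demo_runs)  # 1

importlib.reload(reload_demo)  # explicitly re-executes the module body
print(sys.reload_demo_runs)  # 2
```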

Consider also the "disassembly" of two otherwise equal functions where one tries to import a module and the other doesn't:

>>> import dis
>>> def Foo():
...     import random
...     return random.randint(1,100)
... 
>>> dis.dis(Foo)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               0 (None)
              6 IMPORT_NAME              0 (random)
              9 STORE_FAST               0 (random)

  3          12 LOAD_FAST                0 (random)
             15 LOAD_ATTR                1 (randint)
             18 LOAD_CONST               2 (1)
             21 LOAD_CONST               3 (100)
             24 CALL_FUNCTION            2
             27 RETURN_VALUE        
>>> def Bar():
...     return random.randint(1,100)
... 
>>> dis.dis(Bar)
  2           0 LOAD_GLOBAL              0 (random)
              3 LOAD_ATTR                1 (randint)
              6 LOAD_CONST               1 (1)
              9 LOAD_CONST               2 (100)
             12 CALL_FUNCTION            2
             15 RETURN_VALUE        

I'm not sure how much further the bytecode gets translated for the virtual machine, but if this were an important inner loop in your program, you'd certainly want to put some weight on the Bar approach over the Foo approach.
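The bytecode listing above is from Python 2, and the exact output differs between CPython versions. A more version-independent way to check the same thing is `dis.get_instructions` (Python 3.4+). The sketch below uses lowercase `foo`/`bar` to mirror `Foo`/`Bar`. Only the presence of `IMPORT_NAME` and `LOAD_GLOBAL` is checked, since other opcode names vary by version.

```python
import dis
import random


def foo():
    import random  # local import: IMPORT_NAME runs on every call
    return random.randint(1, 100)


def bar():
    return random.randint(1, 100)  # looks up the module-level name instead


def opnames(func):
    # Collect the opcode names of a function's bytecode.
    return [ins.opname for ins in dis.get_instructions(func)]


print("IMPORT_NAME" in opnames(foo))  # True
print("IMPORT_NAME" in opnames(bar))  # False
print("LOAD_GLOBAL" in opnames(bar))  # True
```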

A quick and dirty timeit test does show a modest speed improvement when using Bar:

$ python -m timeit -s "from a import Foo,Bar" -n 200000 "Foo()"
200000 loops, best of 3: 10.3 usec per loop
$ python -m timeit -s "from a import Foo,Bar" -n 200000 "Bar()"
200000 loops, best of 3: 6.45 usec per loop
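The same comparison can be run in Python 3 with the `timeit` module's Python API instead of the command line. In this sketch, lowercase `foo`/`bar` are defined in the same file and stand in for `Foo`/`Bar`, because the answer's module `a` is not shown. Timings depend on your machine and interpreter version, so no numbers are claimed here.

```python
import random
import timeit


def foo():
    import random  # repeated import statement: sys.modules lookup + local binding
    return random.randint(1, 100)


def bar():
    return random.randint(1, 100)  # uses the module-level import


n = 200_000
t_foo = timeit.timeit(foo, number=n)  # timeit accepts a callable as stmt
t_bar = timeit.timeit(bar, number=n)
print(f"foo: {t_foo / n * 1e6:.2f} usec per call")
print(f"bar: {t_bar / n * 1e6:.2f} usec per call")
```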

晌融 2024-08-18 06:07:55

Please see PEP 8:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

Please note that this is purely a stylistic choice, as Python will treat all import statements the same regardless of where they are declared in the source file. Still, I would recommend that you follow common practice, as this will make your code more readable to others.

白况 2024-08-18 06:07:55

I've done this, and then wished I hadn't. Ordinarily, if I'm writing a function, and that function needs to use StringIO, I can look at the top of the module, see if it's being imported, and then add it if it's not.

Suppose I don't do this; suppose I add it locally within my function. And then suppose at some point I, or someone else, add a bunch of other functions that use StringIO. That person is going to look at the top of the module and add import StringIO. Now your function contains code that's not only unexpected but redundant.

Also, it violates what I think is a pretty important principle: don't directly modify module-level state from inside a function.

Edit:

Actually, it turns out that all of the above is nonsense.

Importing a module doesn't modify module-level state (it initializes the module being imported, if nothing else has yet, but that's not at all the same thing). Importing a module that you've already imported elsewhere costs you nothing except a lookup to sys.modules and creating a variable in the local scope.

Knowing this, I feel kind of dumb about all of the places in my code where I "fixed" this, but that's my cross to bear.

弱骨蛰伏 2024-08-18 06:07:55

When the Python interpreter hits an import statement for a module that has not been imported yet, it executes that module's top-level code, which includes running every function definition in the file. This explains why imports can sometimes take a while.

The idea behind doing all the importing at the start IS a stylistic convention, as Andrew Hare points out. However, keep in mind that if you import inside a function, every execution of that import statement after the first makes the interpreter check whether the module has already been imported. Local imports also become a problem when your code file grows large and you want to "upgrade" your code to remove or replace certain dependencies: you will have to search the whole file to find every place where you imported the module.

I would suggest following the convention and keeping the imports at the top of your code file. If you really do want to keep track of dependencies for functions, then I would suggest adding them in the docstring for that function.
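Here is a minimal sketch of that suggestion, using Python 3's `io.StringIO` (the successor to the Python 2 `StringIO` module from the question). The "Dependencies:" docstring line is an informal convention assumed here, not a standard docstring field.

```python
import io  # module-level import, per PEP 8


def process():
    """Build an in-memory text buffer and return its contents.

    Dependencies: io.StringIO (stdlib).
    """
    file_handle = io.StringIO("hello world")
    # do more stuff
    return file_handle.getvalue()


print(process())  # hello world
```

If `process` is later moved to another file, the docstring still records what it needs, while the actual import stays at the top of the module.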

相守太难 2024-08-18 06:07:55

I can see two cases where you need to import locally:

  1. For testing purposes or temporary use: if you need to import something, put the import where it is used.

  2. Sometimes, to avoid a cyclic dependency, you will need to import inside a function, but that would mean you have a problem elsewhere.
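For case 2, here is a minimal Python 3 sketch of the deferred-import workaround. The module names `cycle_demo_a` and `cycle_demo_b` and their functions are hypothetical. They are written to a temporary directory so the example runs on its own.

```python
import importlib
import os
import sys
import tempfile

# Two hypothetical modules that depend on each other.
MOD_A = """
import cycle_demo_b

def a_value():
    return 1

def a_calls_b():
    return cycle_demo_b.b_uses_a() + 10
"""

MOD_B = """
def b_uses_a():
    # Deferred import: by the time this function runs, cycle_demo_a has
    # finished executing, so a_value exists. A top-level
    # "from cycle_demo_a import a_value" here would raise ImportError
    # when cycle_demo_a is imported first (it is only partially initialized).
    from cycle_demo_a import a_value
    return a_value() + 100
"""

tmpdir = tempfile.mkdtemp()
for name, src in (("cycle_demo_a", MOD_A), ("cycle_demo_b", MOD_B)):
    with open(os.path.join(tmpdir, name + ".py"), "w") as f:
        f.write(src)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import cycle_demo_a

print(cycle_demo_a.a_calls_b())  # 111
```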

Otherwise, always put imports at the top, for efficiency's and consistency's sake.

枕梦 2024-08-18 06:07:55

I believe that the strongest use case for local imports is to prevent your program from loading unnecessary modules that aren't being used in its current invocation.

A simple example is a CLI or codebase that has a lot of subcommands, many of which have dramatically different module dependencies.

All these dependencies need to be accessible from the main module, but if the code loaded every module that it might need in every codepath before starting execution, there might be a significant delay and unnecessary memory use.

A local import is performed lazily: the module is only imported if the code that uses it is executed. So if all the large dependencies in a project are local imports, a trivial program using the codebase can start up in milliseconds and megabytes, not seconds and gigabytes.
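Here is a minimal Python 3 sketch of that pattern. The subcommand names, the `COMMANDS` table, and the stand-in module `heavy_dep_demo` are all hypothetical. The stand-in module is written to a temporary directory so the example is self-contained.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for a large dependency (hypothetical module "heavy_dep_demo").
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "heavy_dep_demo.py"), "w") as f:
    f.write("def crunch(n):\n    return sum(range(n))\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()


def cmd_hello(args):
    return "hello " + " ".join(args)


def cmd_crunch(args):
    import heavy_dep_demo  # only loaded when this subcommand actually runs
    return str(heavy_dep_demo.crunch(int(args[0])))


COMMANDS = {"hello": cmd_hello, "crunch": cmd_crunch}


def main(argv):
    # Dispatch on the first argument; pass the rest to the subcommand.
    name, *rest = argv
    return COMMANDS[name](rest)


print(main(["hello", "world"]))         # hello world
print("heavy_dep_demo" in sys.modules)  # False
print(main(["crunch", "10"]))           # 45
print("heavy_dep_demo" in sys.modules)  # True
```

Running only the `hello` subcommand never touches `heavy_dep_demo`, which is the start-up saving the answer describes.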
