How is / should global data in modules across packages be managed in Python / other languages?

Posted 2024-08-15 03:55:26

I am trying to design the package and module system for a programming language (Heron) which can be both compiled and interpreted, and from what I have seen I really like the Python approach. Python has a rich choice of modules, which seems to contribute largely to its success.

What I don't know is what happens in Python if a module is included in two different compiled packages: are there separate copies made of the data or is it shared?

Related to this are a bunch of side-questions:

  1. Am I right in assuming that packages can be compiled in Python?
  2. What are the pros and cons of the two approaches (copying or sharing of module data)?
  3. Are there widely known problems with the Python module system, from the point of view of the Python community? For example is there a PEP under consideration for enhancing modules/packages?
  4. Are there certain aspects of the Python module/package system which wouldn't work well for a compiled language?
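
To make the scenario concrete, here is a minimal sketch (all file and package names are hypothetical) of one module's data being touched from two different packages:

# shared_state.py - a plain module holding module-level ("global") data
counter = 0

# pkg_a/__init__.py
import shared_state

def bump():
    shared_state.counter += 1

# pkg_b/__init__.py
import shared_state

def read():
    return shared_state.counter

# main.py - is shared_state.counter one object or two?
import pkg_a
import pkg_b

pkg_a.bump()
print(pkg_b.read())  # in CPython this prints 1: both packages see the same cached module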


Comments (3)

韬韬不绝 2024-08-22 03:55:26

Well, you asked a lot of questions. Here are some hints to get a bit further:

  1. a. Python code is lexed and compiled into Python-specific instructions (bytecode), but not compiled to machine-executable code. The ".pyc" file is automatically created whenever you run Python code that does not match the existing .pyc timestamp. This feature can be turned off. You might play with the dis module to see these instructions.
    b. When a module is imported, it is executed (top to bottom) in its own namespace, and that namespace is cached globally. When you import from another module, the module is not executed again. Remember that def is just a statement. You may want to put a print('compiling this module') statement in your code to trace it. (A small sketch of both points follows this list.)

  2. It depends.

  3. There have been recent enhancements, mostly around specifying which module needs to be loaded. Modules can have relative paths, so a huge project might have multiple modules with the same name.

  4. Python itself won't work for a compiled language. Google for "unladen swallow blog" to see the tribulations of trying to speed up a language where "a = sum(b)" can change meanings between executions. Outside of corner cases, the module system forms a nice bridge between source code and a compiled library system. The approach works well, and Python's easy wrapping of C code (swig, etc.) helps.
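
A small sketch of points 1a and 1b, using only standard-library modules (the choice of json here is arbitrary):

import dis
import sys

# 1b: importing a module runs its top-level code once; a second import just
# returns the object cached in the global sys.modules registry.
import json
assert sys.modules["json"] is json
import json as json_again
assert json_again is json  # same module object, not executed again

# 1a: source is compiled to Python-specific bytecode, not machine code;
# dis shows those instructions for a function (or a whole module).
def add(a, b):
    return a + b

dis.dis(add)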

余生共白头 2024-08-22 03:55:26

Modules are the only truly global objects in Python, with all other global data based around the module system (which uses sys.modules as a registry). Packages are simply modules with special semantics for importing submodules. "Compiling" a .py file into a .pyc or .pyo isn't compilation as understood for most languages: it only checks the syntax and creates a code object which, when executed in the interpreter, creates the module object.
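
Before the fuller example below, here is a minimal sketch of that last point: a code object, executed in a namespace, is what produces the module object (the module name and source here are made up):

import types

source = "x = 41\ndef get():\n    return x + 1\n"
code = compile(source, "demo_mod.py", "exec")  # syntax check -> code object
mod = types.ModuleType("demo_mod")             # a bare module object
exec(code, mod.__dict__)                       # running the code fills it in
print(mod.get())                               # 42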

example.py:

print "Creating %s module." % __name__

def show_def(f):
  print "Creating function %s.%s." % (__name__, f.__name__)
  return f

@show_def
def a():
  print "called: %s.a" % __name__

Interactive session:

>>> import example
# first sys.modules['example'] is checked
# since it doesn't exist, example.py is found and "compiled" to example.pyc
# (since example.pyc doesn't exist, same would happen if it was outdated, etc.)
Creating example module. # module code is executed
Creating function example.a. # def statement executed
>>> example.a()
called: example.a
>>> import example
# sys.modules['example'] found, local variable example assigned to that object
# no 'Creating ..' output
>>> d = {"__name__": "fake"}
>>> exec open("example.py") in d
# the first import in this session is very similar to this
# in that it creates a module object (which has a __dict__), initializes a few
# variables in it (__builtins__, __name__, and others---packages' __init__
# modules have their own as well---look at some_module.__dict__.keys() or
# dir(some_module))
# and executes the code from example.py in this dict (or the code object stored
# in example.pyc, etc.)
Creating fake module. # module code is executed
Creating function fake.a. # def statement executed
>>> d.keys()
['__builtins__', '__name__', 'a', 'show_def']
>>> d['a']()
called: fake.a

Your questions:

  1. They are compiled, in a sense, but not as you would expect if you're familiar with how C compilers work.
  2. If the data is immutable, copying is feasible, and should be indistinguishable from sharing except for object identity (is operator and id() in Python).
  3. Imports may or may not execute code (they always assign a local variable to an object, but that poses no problems) and may or may not modify sys.modules. You must be careful to not import in threads, and generally it is best to do all imports at the top of every module: this leads to a cascading graph so all the imports are done at once and then __main__ continues and does the Real Work™.
    • I don't know of any current PEP, but there's already a lot of complex machinery in place, too. For example, packages can have a __path__ attribute (really a list of paths), so submodules don't have to be in the same directory, and those paths can even be computed at runtime! (Example mungepath package below.) You can install your own import hooks, use import statements inside functions, call __import__ directly, and I wouldn't be surprised to find 2-3 other unique ways to work with packages and modules. (A minimal importlib sketch follows this list.)
  4. A subset of the import system would work in a traditionally-compiled language, as long as it was similar to something like C's #include. You could run the "first level" of execution (creating the module objects) in the compiler, and compile those results. There are significant drawbacks to this, however, and it amounts to separate execution contexts for module-level code and functions executed at runtime (and some functions would have to run in both contexts!). (Remember that in Python every statement is executed at runtime, even def and class statements.)
    • I believe this is the main reason traditionally-compiled languages restrict "top-level" code to class, function, and object declarations, eliminating this second context. Even then, you have initialization problems for global objects in C/C++ (and others), unless managed carefully.
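
On the "several ways to import" point in item 3, here is a minimal sketch of a runtime import through importlib (the supported wrapper around the machinery __import__ uses); the module name passed in is just a stand-in:

import importlib

def load_plugin(name):
    # Resolved when the function is called, not when this module is imported.
    return importlib.import_module(name)

math_mod = load_plugin("math")  # "math" stands in for any importable name
print(math_mod.sqrt(2.0))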

mungepath/__init__.py:

print __path__
__path__.append(".") # CWD, would be different in non-example code
print __path__
from . import example # this is example.py from above, and is NOT in mungepath/
# note that this is a degenerate case, in that we now have two names for the
# 'same' module: example and mungepath.example, but they're really different
# modules with different functions (use 'is' or 'id()' to verify)

Interactive session:

>>> import example
Creating example module.
Creating function example.a.
>>> example.__dict__.keys()
['a', '__builtins__', '__file__', 'show_def', '__package__',
 '__name__', '__doc__']
>>> import mungepath
['mungepath']
['mungepath', '.']
Creating mungepath.example module.
Creating function mungepath.example.a.
>>> mungepath.example.a()
called: mungepath.example.a
>>> example is mungepath.example
False
>>> example.a is mungepath.example.a
False

浅笑轻吟梦一曲 2024-08-22 03:55:26

Global data is scoped at the interpreter level.

  1. "packages" can be compiled as a package is just a collection of modules which themselves can be compiled.
  2. I am not sure I understand given the established scoping of data.
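
A minimal sketch of "scoped at the interpreter level" (file names are hypothetical): module-level data is shared by every importer inside one interpreter, but a second interpreter, started here as a child process, gets its own fresh copy:

# counter_mod.py
value = 0

# main.py (run from the directory containing counter_mod.py)
import subprocess
import sys

import counter_mod

counter_mod.value = 42  # every importer in THIS interpreter now sees 42
out = subprocess.check_output(
    [sys.executable, "-c", "import counter_mod; print(counter_mod.value)"])
print(out)              # 0 (plus a newline): the child re-imports and re-executes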