是否可以以编程方式构建 Python 堆栈帧并在代码中的任意点开始执行？

发布于 2024-07-13 09:22:29 字数 2286 浏览 10 评论 0原文

是否可以在 CPython 中以编程方式构造一个堆栈（一个或多个堆栈帧）并在任意代码点开始执行？想象一下以下场景：

您有一个工作流引擎，其中可以使用 Python 编写工作流脚本，并使用一些调用工作流引擎的构造（例如分支、等待/加入）。
阻塞调用（例如 wait 或 join）会在具有某种持久后备存储的事件分派引擎中设置侦听器条件。
阻塞调用（例如
您有一个工作流脚本，它调用引擎中的等待条件，等待稍后发出信号的某些条件。这会在事件调度引擎中设置侦听器。
工作流脚本的状态、包括程序计数器（或等效状态）的相关堆栈帧将被保留 - 因为等待条件可能会在几天或几个月后发生。
工作流脚本的状态
在此期间，工作流引擎可能会停止并重新启动，这意味着必须能够以编程方式存储和重建工作流脚本的上下文。
事件调度引擎触发等待条件拾取的事件。
工作流引擎读取序列化状态和堆栈，并使用堆栈重建线程。然后，它在调用等待服务的位置继续执行。

问题

这可以用未经修改的 Python 解释器来完成吗？更好的是，任何人都可以向我指出一些可能涵盖此类内容的文档，或者以编程方式构造堆栈帧并在代码块中间的某个位置开始执行的代码示例吗？

编辑：为了澄清“未修改的Python解释器”，我不介意使用C API（PyThreadState中有足够的信息来做到这一点吗？），但我不想去探究Python 解释器的内部结构，并且必须构建一个修改后的解释器。

更新：通过一些初步调查，可以使用 PyThreadState_Get() 获取执行上下文。这将返回 PyThreadState 中的线程状态（在 pystate.h 中定义），该状态具有对 frame 中堆栈帧的引用。堆栈帧保存在 PyFrameObject 的类型定义结构中，该结构在 frameobject.h 中定义。 PyFrameObject 有一个字段 f_lasti （支持 bobince），它有一个程序计数器，表示为距代码块开头的偏移量。

最后一点是个好消息，因为这意味着只要保留实际编译的代码块，就应该能够根据需要重建尽可能多的堆栈帧的局部变量并重新启动代码。我想说这意味着理论上是可能的，无需修改 python 解释器，尽管这意味着代码仍然可能与特定版本的解释器紧密耦合。

剩下的三个问题是：

事务状态和“saga”回滚，这可能可以通过用于构建 O/R 映射器的元类黑客技术来完成。我确实构建过一次原型，所以我很清楚如何实现这一点。
稳健地序列化事务状态和任意局部变量。这可以通过读取 __locals__ （可从堆栈框架中获得）并以编程方式构造对 pickle 的调用来完成。但是，我不知道这里可能存在什么陷阱（如果有的话）。
工作流程的版本控制和升级。这有点棘手，因为系统没有为工作流节点提供任何符号锚点。我们所拥有的只是锚为了做到这一点，必须识别所有入口点的偏移并将它们映射到新版本。手动完成可能可行，但我怀疑很难自动化。如果您想支持此功能，这可能是最大的障碍。

更新2： PyCodeObject (code.h)有一个addr列表(f_lasti)->; PyCodeObject.co_lnotab 中的行号映射（如果此处错误请纠正我）。这可以用于促进将工作流程更新到新版本的迁移过程，因为可以根据行号将冻结的指令指针映射到新脚本中的适当位置。仍然相当混乱，但更有希望。

更新3：我认为这个问题的答案可能是Stackless Python。你可以暂停任务并将其序列化。我还没有弄清楚这是否也适用于堆栈。

原文

Is it possible to programmatically construct a stack (one or more stack frames) in CPython and start execution at an arbitrary code point? Imagine the following scenario:

You have a workflow engine where workflows can be scripted in Python with some constructs (e.g. branching, waiting/joining) that are calls to the workflow engine.
A blocking call, such as a wait or join sets up a listener condition in an event-dispatching engine with a persistent backing store of some sort.
You have a workflow script, which calls the Wait condition in the engine, waiting for some condition that will be signalled later. This sets up the listener in the event dispatching engine.
The workflow script's state, relevant stack frames including the program counter (or equivalent state) are persisted - as the wait condition could occur days or months later.
In the interim, the workflow engine might be stopped and re-started, meaning that it must be possible to programmatically store and reconstruct the context of the workflow script.
The event dispatching engine fires the event that the wait condition picks up.
The workflow engine reads the serialised state and stack and reconstructs a thread with the stack. It then continues execution at the point where the wait service was called.

The Question

Can this be done with an unmodified Python interpreter? Even better, can anyone point me to some documentation that might cover this sort of thing or an example of code that programmatically constructs a stack frame and starts execution somewhere in the middle of a block of code?

Edit: To clarify 'unmodified python interpreter', I don't mind using the C API (is there enough information in a PyThreadState to do this?) but I don't want to go poking around the internals of the Python interpreter and having to build a modified one.

Update: From some initial investigation, one can get the execution context with PyThreadState_Get(). This returns the thread state in a PyThreadState (defined in pystate.h), which has a reference to the stack frame in frame. A stack frame is held in a struct typedef'd to PyFrameObject, which is defined in frameobject.h. PyFrameObject has a field f_lasti (props to bobince) which has a program counter expressed as an offset from the beginning of the code block.

This last is sort of good news, because it means that as long as you preserve the actual compiled code block, you should be able to reconstruct locals for as many stack frames as necessary and re-start the code. I'd say this means that it is theoretically possible without having to make a modified python interpereter, although it means that the code is still probably going to be fiddly and tightly coupled to specific versions of the interpreter.

The three remaining problems are:

Transaction state and 'saga' rollback, which can probably be accomplished by the sort of metaclass hacking one would use to build an O/R mapper. I did build a prototype once, so I have a fair idea of how this might be accomplished.
Robustly serialising transaction state and arbitrary locals. This might be accomplished by reading __locals__ (which is available from the stack frame) and programatically constructing a call to pickle. However, I don't know what, if any, gotchas there might be here.
Versioning and upgrade of workflows. This is somewhat trickier, as the system is not providing any symbolic anchors for workflow nodes. All we have is the anchor
In order to do this, one would have to identify the offsets of all of the entry points and map them to the new version. Probably feasible to do manually, but I suspect it would be hard to automate. This is probably the biggest obstacle if you want to support this capability.

Update 2: PyCodeObject (code.h) has a list of addr (f_lasti)-> line number mappings in PyCodeObject.co_lnotab (correct me if wrong here). This might be used to facilitate a migration process to update workflows to a new version, as frozen instruction pointers could be mapped to the appropriate place in the new script, done in terms of the line numbers. Still quite messy but a little more promising.

Update 3: I think the answer to this might be Stackless Python. You can suspend tasks and serialise them. I haven't worked out whether this will also work with the stack as well.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

久伴你 2024-07-20 09:22:29

普通 Python 发行版中包含的 expat python 绑定正在以编程方式构建堆栈帧。但请注意，它依赖于未记录的私有 API。

http://svn.python.org /view/python/trunk/Modules/pyexpat.c?rev=64048&view=auto

回复收藏 0 原文

末蓝 2024-07-20 09:22:29

您通常想要的是延续，我看到它已经是这个问题的标签。

如果您有能力使用系统中的所有代码，您可能想尝试
这样做而不是处理解释器堆栈内部。我不确定这种情况有多容易坚持下去。

http://www.ps.uni-sb.de/~duchier /python/continuations.html

在实践中，我会构建您的工作流引擎，以便您的脚本将操作对象提交给管理器。经理可以在任何时候挑选一组操作并允许
它们被加载并再次开始执行（通过恢复提交操作）。

换句话说：创建您自己的应用程序级堆栈。

回复收藏 0 原文

心如狂蝶 2024-07-20 09:22:29

Stackless python 可能是最好的……如果你不介意完全转向不同的 python 发行版。 stackless 可以序列化 Python 中的一切，以及它们的 tasklet。如果你想留在标准的Python发行版中，那么我会使用dill，它可以序列化几乎Python中的任何东西。

>>> import dill
>>> 
>>> def foo(a):
...   def bar(x):
...     return a*x
...   return bar
... 
>>> class baz(object):
...   def __call__(self, a,x):
...     return foo(a)(x)
... 
>>> b = baz()
>>> b(3,2)
6
>>> c = baz.__call__
>>> c(b,3,2)
6
>>> g = dill.loads(dill.dumps(globals()))
>>> g
{'dill': <module 'dill' from '/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/dill-0.2a.dev-py2.7.egg/dill/__init__.pyc'>, 'c': <unbound method baz.__call__>, 'b': <__main__.baz object at 0x4d61970>, 'g': {...}, '__builtins__': <module '__builtin__' (built-in)>, 'baz': <class '__main__.baz'>, '_version': '2', '__package__': None, '__name__': '__main__', 'foo': <function foo at 0x4d39d30>, '__doc__': None}

Dill 将其类型注册到 pickle 注册表中，因此，如果您有一些使用 pickle 的黑盒代码并且您无法真正编辑它，那么只需导入 dill 就可以神奇地使它无需对第 3 方代码进行猴子修补即可工作。

这是 dill pickling 整个解释器会话...

>>> # continuing from above
>>> dill.dump_session('foobar.pkl')
>>>
>>> ^D
dude@sakurai>$ python
Python 2.7.5 (default, Sep 30 2013, 20:15:49) 
[GCC 4.2.1 (Apple Inc. build 5566)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('foobar.pkl')
>>> c(b,3,2)
6

dill 还有一些好的工具可帮助您在代码失败时了解导致酸洗失败的原因。

你还问它是用来保存解释器状态的？

IPython 可以使用 dill 将解释器会话保存到文件中。 https://nbtest.herokuapp.com /github/ipython/ipython/blob/master/examples/parallel/Using%20Dill.ipynb

klepto 使用 dill 来支持内存中、磁盘或数据库缓存，从而避免重新计算。 https://github.com/uqfoundation/klepto/blob/master/tests /test_cache_info.py

mystic 使用 dill 保存检查点通过在优化器正在进行时保存优化器的状态来执行大型优化作业。 https://github.com/uqfoundation/mystic/blob/master/tests /test_solver_state.py

还有一些其他包使用 dill 来保存对象或会话的状态。

Stackless python is probably the best… if you don't mind totally going over to a different python distribution. stackless can serialize everything in python, plus their tasklets. If you want to stay in the standard python distribution, then I'd use dill, which can serialize almost anything in python.

>>> import dill
>>> 
>>> def foo(a):
...   def bar(x):
...     return a*x
...   return bar
... 
>>> class baz(object):
...   def __call__(self, a,x):
...     return foo(a)(x)
... 
>>> b = baz()
>>> b(3,2)
6
>>> c = baz.__call__
>>> c(b,3,2)
6
>>> g = dill.loads(dill.dumps(globals()))
>>> g
{'dill': <module 'dill' from '/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/dill-0.2a.dev-py2.7.egg/dill/__init__.pyc'>, 'c': <unbound method baz.__call__>, 'b': <__main__.baz object at 0x4d61970>, 'g': {...}, '__builtins__': <module '__builtin__' (built-in)>, 'baz': <class '__main__.baz'>, '_version': '2', '__package__': None, '__name__': '__main__', 'foo': <function foo at 0x4d39d30>, '__doc__': None}

Dill registers it's types into the pickle registry, so if you have some black box code that uses pickle and you can't really edit it, then just importing dill can magically make it work without monkeypatching the 3rd party code.

Here's dill pickling the whole interpreter session...

>>> # continuing from above
>>> dill.dump_session('foobar.pkl')
>>>
>>> ^D
dude@sakurai>$ python
Python 2.7.5 (default, Sep 30 2013, 20:15:49) 
[GCC 4.2.1 (Apple Inc. build 5566)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('foobar.pkl')
>>> c(b,3,2)
6

dill also has some good tools for helping you understand what is causing your pickling to fail when your code fails.

You also asked for where it's used to save interpreter state?

IPython can use dill to save the interpreter session to a file. https://nbtest.herokuapp.com/github/ipython/ipython/blob/master/examples/parallel/Using%20Dill.ipynb

klepto uses dill to support in-memory, to-disk, or to-database caching that avoids recomputation. https://github.com/uqfoundation/klepto/blob/master/tests/test_cache_info.py

mystic uses dill to save the checkpoints for large optimization jobs by saving the state of the optimizer as it's in progress. https://github.com/uqfoundation/mystic/blob/master/tests/test_solver_state.py

There are a couple other packages that use dill to save state of objects or sessions.

回复收藏 0 原文