如何创建并行堆栈并在其上运行协程?
我决定我应该尝试实现协程(我认为我应该这样称呼它们)以获得乐趣和利润。我希望必须使用汇编程序,如果我想让它对任何事情都有用的话,可能还需要一些 C 语言。
请记住,这是出于教育目的。使用已经构建的协程库太容易了(而且真的没什么乐趣)。
你们知道setjmp
和longjmp
吗?它们允许您将堆栈展开到预定义的位置,并从那里恢复执行。但是,它无法回退到堆栈上的“稍后”。只能早点回来。
jmpbuf_t checkpoint;
int retval = setjmp(&checkpoint); // returns 0 the first time
/* lots of stuff, lots of calls, ... We're not even in the same frame anymore! */
longjmp(checkpoint, 0xcafebabe); // execution resumes where setjmp is, and now it returns 0xcafebabe instead of 0
我想要的是一种无需线程即可在不同堆栈上运行两个函数的方法。 (显然,一次只能运行一个。我说过,没有线程。)这两个函数必须能够恢复另一个函数的执行(并停止自己的执行)。有点像他们longjmp
到另一个。一旦它返回到另一个函数,它必须从离开的地方恢复(即,在将控制权交给另一个函数的调用期间或之后),有点像 longjmp
返回到 setjmp
。
我是这样想的:
- 函数
A
创建并行堆栈并将其归零(分配内存等)。 - 函数
A
将其所有寄存器压入当前堆栈。 - 函数
A
将堆栈指针和基指针设置到该新位置,并推送一个神秘的数据结构,指示跳回何处以及将指令指针设置回何处。 - 函数
A
将其大部分寄存器清零,并将指令指针设置为函数B
的开头。
那是为了初始化。现在,以下情况将无限循环:
- 函数
B
在该堆栈上工作,执行它需要执行的任何操作。 - 函数
B
到达需要中断并再次给予A
控制权的程度。 - 函数
B
将其所有寄存器压入堆栈,获取一开始就给它的神秘数据结构A
,并设置堆栈指针以及指向A
告诉它的位置的指令指针。在此过程中,它交回A
一个新的、修改过的数据结构,该结构告诉在哪里恢复B
。 - 函数
A
醒来,弹出它推入堆栈的所有寄存器,并继续工作,直到它需要中断并再次给予B
控制权。
这一切对我来说听起来不错。然而,有很多事情我不太放心。
- 显然,在好的 x86 上,有一条
pusha
指令可以将所有寄存器发送到堆栈。然而,处理器架构不断发展,现在使用 x86_64,我们有了更多通用寄存器,可能还有几个 SSE 寄存器。我找不到任何证据表明pusha
确实推动了它们。现代 x86 CPU 中有大约 40 个公共寄存器。我必须自己完成所有的push
操作吗?此外,SSE 寄存器没有push
(尽管肯定有一个等价的东西——我对整个“x86 汇编器”这件事很陌生)。 改变指令指针有那么简单吗?我可以执行这只是mov rip, rax
(英特尔语法)吗?此外,从中获取价值必须有些特殊,因为它不断变化。如果我确实喜欢mov rax, rip
(又是英特尔语法),rip
是否会定位在mov
指令上,直到它后面的指令,或者介于两者之间?jmp foo
。假的。- 我曾多次提到神秘的数据结构。到目前为止,我假设它至少需要包含三件事:基指针、堆栈指针和指令指针。还有什么吗?
- 我是不是忘记了什么?
- 虽然我真的很想了解事情是如何工作的,但我很确定有一些库可以做到这一点。你知道吗?是否有任何 POSIX 或 BSD 定义的标准方法可以做到这一点,例如线程的
pthread
?
感谢您阅读我的问题文本墙。
I decided I should try to implement coroutines (I think that's how I should call them) for fun and profits. I expect to have to use assembler, and probably some C if I want to make this actually useful for anything.
Bear in mind that this is for educational purposes. Using an already built coroutine library is too easy (and really no fun).
You guys know setjmp
and longjmp
? They allow you to unwind the stack up to a predefined location, and resumes execution from there. However, it can't rewind to "later" on the stack. Only come back earlier.
jmpbuf_t checkpoint;
int retval = setjmp(&checkpoint); // returns 0 the first time
/* lots of stuff, lots of calls, ... We're not even in the same frame anymore! */
longjmp(checkpoint, 0xcafebabe); // execution resumes where setjmp is, and now it returns 0xcafebabe instead of 0
What I'd like is a way to run, without threading, two functions on different stacks. (Obviously, only one runs at a time. No threading, I said.) These two functions must be able to resume the other's execution (and halt their own). Somewhat like if they were longjmp
ing to the other. Once it returns to the other function, it must resume where it left (that is, during or after the call that gave control to the other function), a bit like how longjmp
returns to setjmp
.
This is how I thought it:
- Function
A
creates and zeroes a parallel stack (allocates memory and all that). - Function
A
pushes all its registers to the current stack. - Function
A
sets the stack pointer and the base pointer to that new location, and pushes a mysterious data structure indicating where to jump back and where to set the instruction pointer back. - Function
A
zeroes most of its registers and sets the instruction pointer to the beginning of functionB
.
That's for the initialization. Now, the following situation will indefinitely loop:
- Function
B
works on that stack, does whatever work it needs to. - Function
B
comes to a point where it needs to interrupt and giveA
control again. - Function
B
pushes all of its registers to its stack, takes the mysterious data structureA
gave it at the very beginning, and sets the stack pointer and the instruction pointer to whereA
told it to. In the process, it hands backA
a new, modified data structure that tells where to resumeB
. - Function
A
wakes up, popping back all the registers it pushed to its stack, and does work until it comes to a point where it needs to interrupt and giveB
control again.
All this sounds good to me. However, there is a number of things I'm not exactly at ease with.
- Apparently, on good ol' x86, there was this
pusha
instruction that would send all registers to the stack. However, processor architectures evolve, and now with x86_64 we've got a lot more general-purpose registers, and likely several SSE registers. I couldn't find any evidence thatpusha
does push them. There are about 40 public registers in a mordern x86 CPU. Do I have to do all thepush
es myself? Moreover, there is nopush
for SSE registers (though there's bound to be an equivalent—I'm new to this whole "x86 assembler" thing). Is changing the instruction pointer as easy as saying it? Can I do, like,It's justmov rip, rax
(Intel syntax)? Also, getting the value from it must be somewhat special as it constantly changes. If I do likemov rax, rip
(Intel syntax again), willrip
be positioned on themov
instruction, to the instruction after it, or somewhere between?jmp foo
. Dummy.- I've mentioned a mysterious data structure a few times. Up to now I've assumed it needs to contain at least three things: the base pointer, the stack pointer and the instruction pointer. Is there anything else?
- Did I forget anything?
- While I'd really like to understand how things work, I'm pretty sure there are a handful of libraries that do just that. Do you know any? Is there any POSIX- or BSD-defined standard way to do it, like
pthread
for threads?
Thanks for reading my question textwall.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
您是正确的,
PUSHA
无法在 x64 上工作,它会引发异常#UD
,因为PUSHA
only 推送16 位或 32 位通用寄存器。请参阅英特尔手册,了解您想了解的所有信息。设置
RIP
很简单,jmp rax
会将RIP
设置为RAX
。要检索 RIP,如果您已经知道所有协程退出来源,则可以在编译时获取它,或者可以在运行时获取它,您可以在该调用之后调用下一个地址。像这样:RAX
现在将是b
。这是有效的,因为CALL
压入下一条指令的地址。这项技术也适用于 IA32(尽管我认为在 x64 上有更好的方法,因为它支持 RIP 相对寻址,但我不知道有哪一种)。当然,如果你创建一个函数coroutine_yield,它只能拦截调用者地址:)由于你无法在一条指令中将所有寄存器推送到堆栈,所以我不建议存储协程堆栈上的状态,因为无论如何这都会使事情变得复杂。我认为最好的做法是为每个协程实例分配一个数据结构。
为什么要将函数
A
中的内容归零?那可能没有必要。以下是我处理整个事情的方法,试图使其尽可能简单:
创建一个包含以下内容的结构
coroutine_state
:initarg
arg
registers
(也包含标志)caller_registers
创建一个函数:
coroutine_state* coroutine_init(void (*coro_func)(coroutine_state*), void* initarg);
其中 coro_func 是指向协程函数体的指针。
该函数执行以下操作:
coroutine_state
结构cs
initarg
分配给cs.initarg
,这些将是协程的初始参数coro_func
分配给cs.registers.rip
cs.registers
(不是寄存器,只有标志,因为我们需要一些理智的标志来防止灾难)现在 我们还有另一个函数:
void* coroutine_next(coroutine_state cs, void* arg)
,其中
cs
是从coroutine_init
返回的结构,它表示一个协程实例和 arg 将在协程恢复执行时被输入到协程中。该函数由协程调用者调用,将一些新参数传递给协程并恢复它,该函数的返回值是协程返回(产生)的任意数据结构。
cs.caller_registers
中(RSP
除外),请参阅步骤 3。arg
存储在cs.arg< 中/code>
cs.caller_registers.rsp
),如果幸运的话,添加2*sizeof(void*)
将修复它,您可以必须查找这一点来确认它,您可能希望这个函数是 stdcall,这样在调用它之前不会篡改寄存器mov rax, [rsp]
,将RAX
分配给cs.caller_registers.rip
;解释:除非您的编译器处于破解状态,否则[RSP]
将保存指向调用该函数的调用指令后面的指令的指令指针(即:返回地址)jmp cs.registers.rip
,有效恢复协程的执行请注意,我们永远不会从此函数返回,我们跳转到的协程会为我们“返回”(请参阅协程_yield)。另请注意,在该函数内部,您可能会遇到许多复杂情况,例如 C 编译器生成的函数序言和结尾,也许还有寄存器参数,您必须处理所有这些。就像我说的,stdcall 会为你省去很多麻烦,我认为 gcc 的 -fomit-frame_pointer 会删除尾声的东西。
最后一个函数声明为:
void coroutine_yield(void* ret);
该函数在协程内部调用,以“暂停”协程的执行并返回到
coroutine_next
的调用者>。在cs.registers
cs.registers.rsp
),再次添加2*sizeof(void*)
加载cs.caller_registers
的 flags/registersjmp cs.caller_registers.rip
这本质上是从协程调用程序堆栈帧上的coroutine_next
调用返回的,因为返回值在RAX
中传递,我们返回了arg
。假设如果arg
为NULL
,则协程已终止,否则它是任意数据结构。回顾一下,您可以使用 coroutine_init 初始化协程,然后可以使用 coroutine_next 重复调用实例化的协程。
协程的函数本身声明为:
void my_coro(coroutine_state cs)
cs.initarg
保存初始函数参数(想想构造函数)。每次调用my_coro
时,cs.arg
都会有一个由coroutine_next
指定的不同参数。这就是协程调用者与协程通信的方式。最后,每次协程想要暂停自己时,它都会调用coroutine_yield,并向其传递一个参数,该参数是协程调用程序的返回值。好吧,您现在可能会认为“这很简单!”,但我忽略了以正确的顺序加载寄存器和标志的所有复杂性,同时仍然保持未损坏的堆栈帧并以某种方式保留协程数据结构的地址(您只需以线程安全的方式覆盖所有寄存器)。对于这一部分,您需要了解编译器内部是如何工作的...祝您好运:)
You are correct in that
PUSHA
wont work on x64 it will raise the exception#UD
, asPUSHA
only pushes the 16-bit or 32-bit general purpose registers. See the Intel manuals for all the info you ever wanted to know.Setting
RIP
is simple,jmp rax
will setRIP
toRAX
. To retrieve RIP, you could either get it at compile time if you already know all the coroutine exit origins, or you could get it at run time, you can make a call to the next address after that call. Like this:RAX
will now beb
. This works becauseCALL
pushes the address of the next instruction. This technique works on IA32 as well (although I'd suppose there's a nicer way to do it on x64, as it supports RIP-relative addressing, but I don't know of one). Of course if you make a functioncoroutine_yield
, it can just intercept the caller address :)Since you can't push all the registers to the stack in a single instruction, I wouldn't recommend storing the coroutine state on the stack, as that complicates things anyways. I think the nicest thing to do would be to allocate a data structure for every coroutine instance.
Why are you zeroing things in function
A
? That's probably not necessary.Here's how I would approach the entire thing, trying to make it as simple as possible:
Create a structure
coroutine_state
that holds the following:initarg
arg
registers
(also contains the flags)caller_registers
Create a function:
coroutine_state* coroutine_init(void (*coro_func)(coroutine_state*), void* initarg);
where
coro_func
is a pointer to the coroutine function body.This function does the following:
coroutine_state
structurecs
initarg
tocs.initarg
, these will be the initial argument to the coroutinecoro_func
tocs.registers.rip
cs.registers
(not registers, only flags, as we need some sane flags to prevent an apocalypse)cs.registers.rsp
coroutine_state
structureNow we have another function:
void* coroutine_next(coroutine_state cs, void* arg)
where
cs
is the structure returned fromcoroutine_init
which represents a coroutine instance, andarg
will be fed into the coroutine as it resumes execution.This function is called by the coroutine invoker to pass in some new argument to the coroutine and resume it, the return value of this function is an arbitrary data structure returned (yielded) by the coroutine.
cs.caller_registers
except forRSP
, see step 3.arg
incs.arg
cs.caller_registers.rsp
), adding2*sizeof(void*)
will fix it if you're lucky, you'd have to look this up to confirm it, you probably want this function to be stdcall so no registers are tampered with before calling itmov rax, [rsp]
, assignRAX
tocs.caller_registers.rip
; explanation: unless your compiler is on crack,[RSP]
will hold the instruction pointer to the instruction that follows the call instruction that called this function (ie: the return address)cs.registers
jmp cs.registers.rip
, efectively resuming execution of the coroutineNote that we never return from this function, the coroutine we jump to "returns" for us (see
coroutine_yield
). Also note that inside this function you may run into many complications such as function prologue and epilogue generated by the C compiler, and perhaps register arguments, you have to take care of all this. Like I said, stdcall will save you lots of trouble, I think gcc's -fomit-frame_pointer will remove the epilogue stuff.The last function is declared as:
void coroutine_yield(void* ret);
This function is called inside the coroutine to "pause" execution of the coroutine and return to the caller of
coroutine_next
.in cs.registers
cs.registers.rsp
), once again, add2*sizeof(void*)
to it, and you want this function to be stdcall as wellmov rax, arg
(lets just pretend all the functions in your compiler return their arguments inRAX
)cs.caller_registers
jmp cs.caller_registers.rip
This essentially returns from thecoroutine_next
call on the coroutine invoker's stack frame, and since the return value is passed inRAX
, we returnedarg
. Let's just say ifarg
isNULL
, then the coroutine has terminated, otherwise it's an arbitrary data structure.So to recap, you initialize a coroutine using
coroutine_init
, then you can repeatedly invoke the instantiated coroutine withcoroutine_next
.The coroutine's function itself is declared:
void my_coro(coroutine_state cs)
cs.initarg
holds the initial function argument (think constructor). Each timemy_coro
is called,cs.arg
has a different argument that was specified bycoroutine_next
. This is how the coroutine invoker communicates with the coroutine. Finally, every time the coroutine wants to pause itself, it callscoroutine_yield
, and passes one argument to it, which is the return value to the coroutine invoker.Okay, you may now think "thats easy!", but I left out all the complications of loading the registers and flags in the correct order while still maintaining a non corrupt stack frame and somehow keeping the address of your coroutine data structure (you just overwrote all your registers), in a thread-safe manner. For that part you will need to find out how your compiler works internally... good luck :)
很好的学习参考:libcoroutine,特别是他们的 setjmp/longjmp 实现。我知道使用现有的库并不有趣,但您至少可以大致了解您要去的地方。
Good learning reference: libcoroutine, especially their setjmp/longjmp implementation. I know its not fun to use an existing library, but you can at least get a general bearing on where you are going.
Simon Tatham 有一个有趣的 C 协程实现不需要任何特定于体系结构的知识或堆栈摆弄。这不完全是你所追求的,但我认为它至少可能具有学术兴趣。
Simon Tatham has an interesting implementation of coroutines in C that doesn't require any architecture-specific knowledge or stack fiddling. It's not exactly what you're after, but I thought it might nonetheless be of at least academic interest.
boost.org 上的 boost.coroutine (boost.context) 可以为您完成所有工作
boost.coroutine (boost.context) at boost.org does all for you