无堆栈语言如何工作?
我听说过无堆栈语言。 但是我不知道如何实现这样的语言。 有人可以解释一下吗?
I've heard of stackless languages. However I don't have any idea how such a language would be implemented. Can someone explain?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
如果我错了,请随时纠正我,但我认为为每个函数调用帧在堆上分配内存会导致极端的内存抖动。 毕竟操作系统必须管理这些内存。 我认为避免这种内存抖动的方法是调用帧的缓存。 因此,如果您无论如何都需要缓存,我们不妨将其在内存中连续并称为堆栈。
Please feel free to correct me if I'm wrong, but I would think that allocating memory on the heap for each function call frame would cause extreme memory thrashing. The operating system does after all have to manage this memory. I would think that the way to avoid this memory thrashing would be a cache for call frames. So if you need a cache anyway, we might as well make it contigous in memory and call it a stack.
Stackless Python 仍然有一个 Python 堆栈(尽管它可能有尾调用优化和其他调用框架合并技巧),但是它完全脱离了解释器的C堆栈。
Haskell(通常实现的)没有调用堆栈; 评估基于图缩减。
Stackless Python still has a Python stack (though it may have tail call optimization and other call frame merging tricks), but it is completely divorced from the C stack of the interpreter.
Haskell (as commonly implemented) does not have a call stack; evaluation is based on graph reduction.
有一篇关于 Parrot 语言框架。 Parrot 不使用堆栈进行调用,本文对此技术进行了一些解释。
There is a nice article about the language framework Parrot. Parrot does not use the stack for calling and this article explains the technique a bit.
在我或多或少熟悉的无堆栈环境(图灵机、汇编和 Brainfuck)中,实现自己的堆栈是很常见的。 将堆栈内置到语言中并没有什么基础的。
在最实用的汇编中,您只需选择可用的内存区域,将堆栈寄存器设置为指向底部,然后递增或递减以实现推送和弹出。
编辑:我知道有些架构有专用堆栈,但它们不是必需的。
In the stackless environments I'm more or less familiar with (Turing machine, assembly, and Brainfuck), it's common to implement your own stack. There is nothing fundamental about having a stack built into the language.
In the most practical of these, assembly, you just choose a region of memory available to you, set the stack register to point to the bottom, then increment or decrement to implement your pushes and pops.
EDIT: I know some architectures have dedicated stacks, but they aren't necessary.
你可以说我很古老,但我记得 FORTRAN 标准和 COBOL 不支持递归调用,因此不需要堆栈。 事实上,我记得 CDC 6000 系列机器的实现没有堆栈,如果您尝试递归调用子例程,FORTRAN 会做奇怪的事情。
根据记录,CDC 6000 系列指令集使用 RJ 指令来调用子例程,而不是调用堆栈。 这将当前 PC 值保存在调用目标位置,然后分支到其后面的位置。 最后,子例程将执行间接跳转到调用目标位置。 这会重新加载保存的 PC,有效地返回给调用者。
显然,这对于递归调用不起作用。 我记得如果你尝试递归,CDC FORTRAN IV 编译器会生成损坏的代码。
但是 CDC 也有 Pascal 实现(我怀疑来自 ETH),其中递归 >did< 工作。 所以大概 Pascal 运行时>做了< 有一个正确的堆栈并且>没有< 使用
RJ
。 我的记忆是它是一个编译器而不是解释器,但我只是一个本科生用户,而且那是很久以前的事了。Call me ancient, but I can remember when the FORTRAN standards and COBOL did not support recursive calls, and therefore didn't require a stack. Indeed, I recall the implementations for CDC 6000 series machines where there wasn't a stack, and FORTRAN would do strange things if you tried to call a subroutine recursively.
For the record, instead of a call-stack, the CDC 6000 series instruction set used the
RJ
instruction to call a subroutine. This saved the current PC value at the call target location and then branches to the location following it. At the end, a subroutine would perform an indirect jump to the call target location. That reloaded saved PC, effectively returning to the caller.Obviously, that does not work with recursive calls. And my recollection is that the CDC FORTRAN IV compiler would generate broken code if you did attempt recursion.
But there was also Pascal implementation for CDC (from ETH I suspect) where recursion >did< work. So presumably the Pascal runtime >did< have a proper stack and >didn't< use
RJ
. My recollection is that it was a compiler rather than an interpreter, but I was a mere undergraduate user and it was a long time ago.本文有一个易于理解的延续描述:http://www.defmacro.org /ramblings/fp.html
延续是可以传递到基于堆栈的语言中的函数中的东西,但也可以由语言自身的语义使用以使其成为“无堆栈”。 当然,堆栈仍然存在,但正如 Ira Baxter 所描述的,它不是一个大的连续段。
There is an easy to understand description of continuations on this article: http://www.defmacro.org/ramblings/fp.html
Continuations are something you can pass into a function in a stack-based language, but which can also be used by a language's own semantics to make it "stackless". Of course the stack is still there, but as Ira Baxter described, it's not one big contiguous segment.
假设您想实现无堆栈 C。首先要意识到这不需要堆栈:
但是,这是吗?
不会。因为智能编译器会内联调用
isequal
,将它们转换为a == b
。 那么,为什么不内联所有内容呢? 当然,您将生成更多代码,但如果摆脱堆栈对您来说是值得的,那么这很容易,只需做一些小小的权衡。那么递归呢? 没问题。 像这样的尾递归函数:
仍然可以内联,因为实际上它只是一个变相的 for 循环:
理论上,一个真正聪明的编译器可以为您解决这个问题。 但不太聪明的人仍然可以将其扁平化为首选:
在一种情况下,您必须做出小小的权衡。 这无法内联:
Stackless C 根本无法做到这一点。 你是否放弃了很多? 并不真地。 这也是普通C语言也做不到的。 如果您不相信我,请致电
fib(1000)
看看您珍贵的计算机会发生什么情况。Say you wanted to implement stackless C. The first thing to realize is that this doesn't need a stack:
But, does this?
No. Because a smart compiler will inline calls to
isequal
, turning them intoa == b
. So, why not just inline everything? Sure, you will generate more code but if getting rid of the stack is worth it to you then this is easy with a small tradeoff.What about recursion? No problem. A tail-recursive function like:
Can still be inlined, because really it's just a for loop in disguise:
In theory a really smart compiler could figure that out for you. But a less-smart one could still flatten it as a goto:
There is one case where you have to make a small trade off. This can't be inlined:
Stackless C simply cannot do this. Are you giving up a lot? Not really. This is something normal C can't do well very either. If you don't believe me just call
fib(1000)
and see what happens to your precious computer.我们拥有的现代操作系统(Windows、Linux)采用我所说的“大堆栈模型”运行。 有时,这种模型是错误的,并且激发了对“无堆栈”语言的需求。
“大堆栈模型”假设编译的程序将在连续的内存区域中为函数调用分配“堆栈帧”,使用机器指令非常快速地调整包含堆栈指针(和可选的堆栈帧指针)的寄存器。 这会导致快速的函数调用/返回,但代价是需要一个大的、连续的堆栈区域。 因为在这些现代操作系统下运行的所有程序中 99.99% 都可以与大堆栈模型很好地配合,所以编译器、加载器甚至操作系统都“知道”这个堆栈区域。
所有此类应用程序都存在的一个常见问题是,“我的堆栈应该有多大?”。 由于内存非常便宜,最常见的情况是为堆栈预留了一大块(MS 默认为 1Mb),而典型的应用程序调用结构永远不会接近用完它。 但是,如果应用程序确实用完了所有内存,它会由于超出堆栈末尾而因非法内存引用而终止(“对不起戴夫,我不能这样做”)。
大多数所谓的“无堆栈”语言并不是真正的无堆栈。 他们只是不使用这些系统提供的连续堆栈。 相反,他们所做的是在每次函数调用时从堆中分配一个堆栈帧。 每个函数调用的成本有所上升; 如果函数通常很复杂,或者语言是解释性的,则这种额外成本是微不足道的。 (我们还可以确定程序调用图中的调用 DAG,并分配一个堆段来覆盖整个 DAG;这样,对于调用 DAG 内的所有调用,您可以获得堆分配和经典大堆栈函数调用的速度)。
对堆栈帧使用堆分配有几个原因:
如果程序根据其正在解决的特定问题进行深度递归,
提前预分配“大堆栈”区域非常困难,因为所需的大小未知。 人们可以笨拙地安排函数调用来检查是否有足够的堆栈剩余,如果没有,则重新分配更大的块,复制旧堆栈并将所有指针重新调整到堆栈中; 这太尴尬了,我不知道有任何实现。
分配堆栈帧意味着应用程序永远不必说对不起,直到有
实际上已经没有剩余可分配内存了。
程序分叉子任务。 每个子任务都需要自己的堆栈,因此不能使用提供的一个“大堆栈”。 因此,需要为每个子任务分配堆栈。 如果您有数千个可能的子任务,那么您现在可能需要数千个“大堆栈”,并且内存需求突然变得荒谬。 分配堆栈帧可以解决这个问题。 通常,子任务“堆栈”会引用父任务来实现词法作用域; 作为子任务分支,会创建一棵“子堆栈”树,称为“仙人掌堆栈”。
你的语言有延续。 这些要求以某种方式保留当前函数可见的词法范围内的数据以供以后重用。 这可以通过复制父堆栈帧、爬上 cactus 堆栈并继续进行来实现。
我实现的 PARLANSE 编程语言执行 1) 和 2)。 我正在研究3)。 有趣的是,PARLANSE 从非常快速访问的每线程堆中分配堆栈帧; 它通常需要 4 条机器指令。 当前的实现是基于 x86 的,分配的帧被放置在 x86 EBP/ESP 寄存器中,就像其他传统的基于 x86 的语言实现一样。 因此,它确实以块的形式使用硬件“连续堆栈”(包括压入和弹出)。 它还生成“帧本地”子例程调用,不会为预先知道堆栈需求的大量生成的实用程序代码切换堆栈。
The modern operating systems we have (Windows, Linux) operate with what I call the "big stack model". And that model is wrong, sometimes, and motivates the need for "stackless" languages.
The "big stack model" assumes that a compiled program will allocate "stack frames" for function calls in a contiguous region of memory, using machine instructions to adjust registers containing the stack pointer (and optional stack frame pointer) very rapidly. This leads to fast function call/return, at the price of having a large, contiguous region for the stack. Because 99.99% of all programs run under these modern OSes work well with the big stack model, the compilers, loaders, and even the OS "know" about this stack area.
One common problem all such applications have is, "how big should my stack be?". With memory being dirt cheap, mostly what happens is that a large chunk is set aside for the stack (MS defaults to 1Mb), and typical application call structure never gets anywhere near to using it up. But if an application does use it all up, it dies with an illegal memory reference ("I'm sorry Dave, I can't do that"), by virtue of reaching off the end of its stack.
Most so-called called "stackless" languages aren't really stackless. They just don't use the contiguous stack provided by these systems. What they do instead is allocate a stack frame from the heap on each function call. The cost per function call goes up somewhat; if functions are typically complex, or the language is interpretive, this additional cost is insignificant. (One can also determine call DAGs in the program call graph and allocate a heap segment to cover the entire DAG; this way you get both heap allocation and the speed of classic big-stack function calls for all calls inside the call DAG).
There are several reasons for using heap allocation for stack frames:
If the program does deep recursion dependent on the specific problem it is solving,
it is very hard to preallocate a "big stack" area in advance because the needed size isn't known. One can awkwardly arrange function calls to check to see if there's enough stack left, and if not, reallocate a bigger chunk, copy the old stack and readjust all the pointers into the stack; that's so awkward that I don't know of any implementations.
Allocating stack frames means the application never has to say its sorry until there's
literally no allocatable memory left.
The program forks subtasks. Each subtask requires its own stack, and therefore can't use the one "big stack" provided. So, one needs to allocate stacks for each subtask. If you have thousands of possible subtasks, you might now need thousands of "big stacks", and the memory demand suddenly gets ridiculous. Allocating stack frames solves this problem. Often the subtask "stacks" refer back to the parent tasks to implement lexical scoping; as subtasks fork, a tree of "substacks" is created called a "cactus stack".
Your language has continuations. These require that the data in lexical scope visible to the current function somehow be preserved for later reuse. This can be implemented by copying parent stack frames, climbing up the cactus stack, and proceeding.
The PARLANSE programming language I implemented does 1) and 2). I'm working on 3). It is amusing to note that PARLANSE allocates stack frames from a very fast-access heap-per-thread; it costs typically 4 machine instructions. The current implementation is x86 based, and the allocated frame is placed in the x86 EBP/ESP register much like other conventional x86 based language implementations. So it does use the hardware "contiguous stack" (including pushing and poppping) just in chunks. It also generates "frame local" subroutine calls the don't switch stacks for lots of generated utility code where the stack demand is known in advance.