当前位置：文江博客话题详情

堆栈的目的是什么？为什么我们需要它？

发布于 2024-12-12 01:07:33 字数 400 浏览 6 评论 0原文

因此，我现在正在学习 MSIL，以学习调试我的 C# .NET 应用程序。

我一直想知道：堆栈的目的是什么？

只是将我的问题放在上下文中：
为什么会有从内存到堆栈的传输或“加载”？另一方面，为什么会有从堆栈到内存或“存储”的传输？ 为什么不把它们全部放在内存中？

是因为这样更快吗？
是因为它是基于 RAM 的吗？
为了效率？

我试图掌握这一点，以帮助我更深入地理解 CIL 代码。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

合约呢 2024-12-19 01:07:33

更新：我非常喜欢这个问题，所以我将其作为11 月 18 日博客的主题2011 年。感谢您提出的好问题！

我一直想知道：堆栈的目的是什么？

我假设您指的是 MSIL 语言的评估堆栈，而不是运行时的实际每线程堆栈。

为什么会有从内存到堆栈的传输或“加载”？另一方面，为什么会有从堆栈到内存或“存储”的传输？为什么不把它们全部放在内存中呢？

MSIL 是一种“虚拟机”语言。像 C# 编译器这样的编译器生成 CIL，然后在运行时另一个称为 JIT（Just In Time）的编译器) 编译器将 IL 转换为可以执行的实际机器代码。

首先让我们回答“为什么要有 MSIL？”这个问题。为什么不直接让 C# 编译器写出机器代码呢？

因为这样做更便宜。假设我们没有这样做；假设每种语言都必须有自己的机器代码生成器。您有二十种不同的语言：C#、JScript .NET、Visual Basic、IronPython, F#... 假设您有十个不同的处理器。您必须编写多少个代码生成器？ 20 x 10 = 200 个代码生成器。这是很多工作。现在假设您想要添加一个新处理器。您必须为其编写代码生成器二十次，每种语言一次。

此外，这是一项困难且危险的工作。为您不是专家的芯片编写高效的代码生成器是一项艰巨的工作！编译器设计者是语言语义分析方面的专家，而不是新芯片组的有效寄存器分配方面的专家。

现在假设我们用 CIL 方式来做。您需要编写多少个 CIL 生成器？每种语言一个。您必须编写多少个 JIT 编译器？每个处理器一个。总数：20 + 10 = 30 个代码生成器。此外，语言到CIL生成器很容易编写，因为CIL是一种简单语言，并且CIL到机器代码生成器也很容易编写，因为CIL是一种简单语言。我们摆脱了 C# 和 VB 之类的所有复杂性，并将所有内容“降低”为一种易于编写抖动的简单语言。

拥有中间语言可以显着降低生成新语言编译器的成本。它还大大降低了支持新芯片的成本。你想支持一个新芯片，你找到一些该芯片的专家，让他们写一个 CIL 抖动，然后你就完成了；然后你的芯片上就支持所有这些语言。

好的，我们已经确定了为什么需要 MSIL；因为拥有中间语言可以降低成本。那么为什么该语言是“堆栈机器”呢？

因为堆栈机在概念上对于语言编译器编写者来说非常简单。堆栈是一种简单、易于理解的描述计算的机制。从概念上讲，堆栈机对于 JIT 编译器编写者来说也非常容易处理。使用堆栈是一种简化的抽象，因此它再次降低了我们的成本。

你问“为什么要有堆栈？”为什么不直接从内存中完成所有事情呢？好吧，让我们考虑一下。假设您要为以下内容生成 CIL 代码：

int x = A() + B() + C() + 10;

假设我们有这样的约定：“add”、“call”、“store”等始终将其参数从堆栈中取出，并将其结果（如果有）放入堆栈中。要为这个 C# 生成 CIL 代码，我们只需说：

load the address of x // The stack now contains address of x
call A()              // The stack contains address of x and result of A()
call B()              // Address of x, result of A(), result of B()
add                   // Address of x, result of A() + B()
call C()              // Address of x, result of A() + B(), result of C()
add                   // Address of x, result of A() + B() + C()
load 10               // Address of x, result of A() + B() + C(), 10
add                   // Address of x, result of A() + B() + C() + 10
store in address      // The result is now stored in x, and the stack is empty.

现在假设我们在没有堆栈的情况下完成了它。我们将按照您的方式进行操作，其中每个操作码都获取其操作数的地址以及存储其结果的地址：

Allocate temporary store T1 for result of A()
Call A() with the address of T1
Allocate temporary store T2 for result of B()
Call B() with the address of T2
Allocate temporary store T3 for the result of the first addition
Add contents of T1 to T2, then store the result into the address of T3
Allocate temporary store T4 for the result of C()
Call C() with the address of T4
Allocate temporary store T5 for result of the second addition
...

您知道这是怎么回事吗？我们的代码变得越来越庞大，因为我们必须显式分配所有临时存储空间，而按照惯例，这些临时存储空间通常只存在于堆栈中。更糟糕的是，我们的操作码本身变得巨大，因为它们现在都必须将要写入结果的地址以及每个操作数的地址作为参数。知道将从堆栈中取出两个内容并放入一个内容的“添加”指令可以是单个字节。需要两个操作数地址和一个结果地址的加法指令将是巨大的。

我们使用基于堆栈的操作码，因为堆栈解决了常见问题。即：我想分配一些临时存储空间，很快使用它，然后在完成后快速删除它。通过假设我们有一个堆栈可供使用，我们可以使操作码非常小并且代码非常简洁。

更新：一些额外的想法

顺便说一句，这种通过（1）指定虚拟机，（2）编写针对VM语言的编译器，以及（3）在各种硬件上编写VM实现来大幅降低成本的想法并不适用。完全是一个新想法。它并非源自 MSIL、LLVM、Java 字节码或任何其他现代基础设施。据我所知，最早实施这一策略的是 1966 年的 pcode machine

。我个人第一次听说这个概念是当我了解到 Infocom 实现者如何设法让 Zork 运行在这么多不同的机器太好了。他们指定了一个名为 Z-machine 的虚拟机，然后为所有用户制作了 Z-machine 模拟器他们想要运行游戏的硬件。这带来了额外的巨大好处，他们可以在原始 8 位系统上实现虚拟内存管理；游戏可能比内存大，因为他们可以在需要时从磁盘调入代码，并在需要加载新代码时丢弃它。

UPDATE: I liked this question so much I made it the subject of my blog on November 18th 2011. Thanks for the great question!

I've always wondered: what is the purpose of the stack?

I assume you mean the evaluation stack of the MSIL language, and not the actual per-thread stack at runtime.

Why is there a transfer from memory to stack or "loading?" On the other hand, why is there a transfer from stack to memory or "storing"? Why not just have them all placed in the memory?

MSIL is a "virtual machine" language. Compilers like the C# compiler generate CIL, and then at runtime another compiler called the JIT (Just In Time) compiler turns the IL into actual machine code that can execute.

So first let's answer the question "why have MSIL at all?" Why not just have the C# compiler write out machine code?

Because it is cheaper to do it this way. Suppose we didn't do it that way; suppose each language has to have its own machine code generator. You have twenty different languages: C#, JScript .NET, Visual Basic, IronPython, F#... And suppose you have ten different processors. How many code generators do you have to write? 20 x 10 = 200 code generators. That's a lot of work. Now suppose you want to add a new processor. You have to write the code generator for it twenty times, one for each language.

Furthermore, it is difficult and dangerous work. Writing efficient code generators for chips that you are not an expert on is a hard job! Compiler designers are experts on the semantic analysis of their language, not on efficient register allocation of new chip sets.

Now suppose we do it the CIL way. How many CIL generators do you have to write? One per language. How many JIT compilers do you have to write? One per processor. Total: 20 + 10 = 30 code generators. Moreover, the language-to-CIL generator is easy to write because CIL is a simple language, and the CIL-to-machine-code generator is also easy to write because CIL is a simple language. We get rid of all of the intricacies of C# and VB and whatnot and "lower" everything to a simple language that is easy to write a jitter for.

Having an intermediate language lowers the cost of producing a new language compiler dramatically. It also lowers the cost of supporting a new chip dramatically. You want to support a new chip, you find some experts on that chip and have them write an CIL jitter and you're done; you then support all those languages on your chip.

OK, so we've established why we have MSIL; because having an intermediate language lowers costs. Why then is the language a "stack machine"?

Because stack machines are conceptually very simple for language compiler writers to deal with. Stacks are a simple, easily understood mechanism for describing computations. Stack machines are also conceptually very easy for JIT compiler writers to deal with. Using a stack is a simplifying abstraction, and therefore again, it lowers our costs.

You ask "why have a stack at all?" Why not just do everything directly out of memory? Well, let's think about that. Suppose you want to generate CIL code for:

int x = A() + B() + C() + 10;

Suppose we have the convention that "add", "call", "store" and so on always take their arguments off the stack and put their result (if there is one) on the stack. To generate CIL code for this C# we just say something like:

load the address of x // The stack now contains address of x
call A()              // The stack contains address of x and result of A()
call B()              // Address of x, result of A(), result of B()
add                   // Address of x, result of A() + B()
call C()              // Address of x, result of A() + B(), result of C()
add                   // Address of x, result of A() + B() + C()
load 10               // Address of x, result of A() + B() + C(), 10
add                   // Address of x, result of A() + B() + C() + 10
store in address      // The result is now stored in x, and the stack is empty.

Now suppose we did it without a stack. We'll do it your way, where every opcode takes the addresses of its operands and the address to which it stores its result:

Allocate temporary store T1 for result of A()
Call A() with the address of T1
Allocate temporary store T2 for result of B()
Call B() with the address of T2
Allocate temporary store T3 for the result of the first addition
Add contents of T1 to T2, then store the result into the address of T3
Allocate temporary store T4 for the result of C()
Call C() with the address of T4
Allocate temporary store T5 for result of the second addition
...

You see how this goes? Our code is getting huge because we have to explicitly allocate all the temporary storage that would normally by convention just go on the stack. Worse, our opcodes themselves are all getting enormous because they all now have to take as an argument the address that they're going to write their result into, and the address of each operand. An "add" instruction that knows that it is going to take two things off the stack and put one thing on can be a single byte. An add instruction that takes two operand addresses and a result address is going to be enormous.

We use stack-based opcodes because stacks solve the common problem. Namely: I want to allocate some temporary storage, use it very soon and then get rid of it quickly when I'm done. By making the assumption that we have a stack at our disposal we can make the opcodes very small and the code very terse.

UPDATE: Some additional thoughts

Incidentally, this idea of drastically lowering costs by (1) specifing a virtual machine, (2) writing compilers that target the VM language, and (3) writing implementations of the VM on a variety of hardware, is not a new idea at all. It did not originate with MSIL, LLVM, Java bytecode, or any other modern infrastructures. The earliest implementation of this strategy I'm aware of is the pcode machine from 1966.

The first I personally heard of this concept was when I learned how the Infocom implementors managed to get Zork running on so many different machines so well. They specified a virtual machine called the Z-machine and then made Z-machine emulators for all the hardware they wanted to run their games on. This had the added enormous benefit that they could implement virtual memory management on primitive 8-bit systems; a game could be larger than would fit into memory because they could just page the code in from disk when they needed it and discard it when they needed to load new code.

回复收藏 0 原文

梦开始←不甜 2024-12-19 01:07:33

请记住，当您谈论 MSIL 时，您就是在谈论虚拟机的指令。 .NET中使用的VM是基于堆栈的虚拟机。与基于寄存器的 VM 不同，Android 操作系统中使用的 Dalvik VM 就是一个示例的。

VM中的堆栈是虚拟的，由解释器或即时编译器将VM指令转换为在处理器上运行的实际代码。在 .NET 的情况下几乎总是抖动，MSIL 指令集被设计为从一开始就抖动。例如，与 Java 字节码相反，它对特定数据类型的操作具有不同的指令。这使得它被优化以被解释。 MSIL 解释器实际上存在，它用于 .NET Micro Framework。它运行在资源非常有限的处理器上，无法承担存储机器代码所需的 RAM。

实际的机器代码模型是混合的，既有堆栈又有寄存器。 JIT 代码优化器的一大工作就是想办法将保存在堆栈上的变量存储在寄存器中，从而大大提高执行速度。 Dalvik 抖动存在相反的问题。

否则，机器堆栈是一种非常基本的存储设施，它在处理器设计中已经存在很长时间了。它具有非常好的引用局部性，这是现代 CPU 的一个非常重要的功能，它处理数据的速度比 RAM 提供的速度快得多，并且支持递归。语言设计很大程度上受到堆栈的影响，堆栈在支持局部变量和范围仅限于方法体方面可见。该堆栈的一个重要问题是该站点的命名原因。

回复收藏 0 原文

木有鱼丸 2024-12-19 01:07:33

有一篇关于此的非常有趣/详细的维基百科文章，堆栈机器指令集的优点。我需要完整地引用它，所以简单地放置一个链接会更容易。我将简单地引用副标题

非常紧凑的目标代码
简单的编译器/简单的解释器
最小的处理器状态

回复收藏 0 原文

燃情 2024-12-19 01:07:33

为堆栈问题添加更多内容。堆栈概念源自 CPU 设计，其中算术逻辑单元 (ALU) 中的机器代码对位于堆栈上的操作数进行操作。例如，乘法运算可以从堆栈中取出两个顶部操作数，将它们相乘并将结果放回堆栈中。机器语言通常有两个基本函数，用于在堆栈中添加和删除操作数：推入和弹出。在许多 cpu 的 dsp（数字信号处理器）和机器控制器（例如控制洗衣机的控制器）中，堆栈位于芯片本身上。这样可以更快地访问 ALU 并将所需的功能整合到单个芯片中。

回复收藏 0 原文

他不在意 2024-12-19 01:07:33

如果不遵循堆栈/堆的概念，并且将数据加载到随机内存位置或从随机内存位置存储数据......它将非常非结构化和非托管。

这些概念用于将数据存储在预定义的结构中，以提高性能、内存使用率……因此称为数据结构。

回复收藏 0 原文

苦笑流年记忆 2024-12-19 01:07:33

通过使用编码的连续传递风格，可以让系统在没有堆栈的情况下工作。然后调用帧成为在垃圾收集堆中分配的延续（垃圾收集器将需要一些堆栈）。

请参阅 Andrew Appel 的旧著作：Compiling with Continuations 和垃圾收集速度可以比堆栈分配

（由于缓存问题，他今天可能有点错误）

回复收藏 0 原文

百合的盛世恋 2024-12-19 01:07:33

我寻找“中断”，但没有人将其视为优势。对于每个中断微控制器或其他处理器的设备，通常会将寄存器压入堆栈，调用中断服务例程，完成后，将寄存器从堆栈中弹出，并放回原来的位置。是。然后指令指针恢复，正常活动从中断处继续，几乎就像中断从未发生过一样。通过堆栈，您实际上可以让多个设备（理论上）互相中断，并且这一切都可以正常工作——因为堆栈。

还有一系列基于堆栈的语言，称为连接语言。它们都是（我相信）函数式语言，因为堆栈是隐式传入的参数，并且更改后的堆栈是每个函数的隐式返回。 Forth 和 Factor （非常好）和其他例子都是例子。 Factor 与 Lua 类似，用于编写游戏脚本，由目前在 Apple 工作的天才 Slava Pestov 编写。他的 YouTube 上的 Google TechTalk 我看过几次。他谈论了 Boa 构造函数，但我不确定他的意思;-)。

我真的认为当前的一些虚拟机，比如 JVM、微软的 CIL，甚至我看到的为 Lua 编写的虚拟机，都应该用其中一些基于堆栈的语言编写，以使它们可移植到更多平台。我认为这些连接语言在某种程度上错过了它们作为虚拟机创建工具包和可移植平台的使命。甚至还有 pForth，一个用 ANSI C 编写的“可移植”Forth，可用于实现更通用的可移植性。有人尝试使用 Emscripten 或 WebAssembly 编译它吗？

对于基于堆栈的语言，有一种称为零点的代码风格，因为您可以仅列出要调用的函数而不传递任何参数（有时）。如果这些函数完美地组合在一起，那么您将只有所有零点函数的列表，这将是您的应用程序（理论上）。如果您深入研究 Forth 或 Factor，您就会明白我在说什么。

Easy Forth 是一个用 JavaScript 编写的不错的在线教程，这里有一个小示例（请注意“ sq sq sq sq”作为零点调用风格的示例）：

: sq dup * ;  ok
2 sq . 4  ok
: ^4 sq sq ;  ok
2 ^4 . 16  ok
: ^8 sq sq sq sq ;  ok
2 ^8 . 65536  ok

此外，如果您查看 Easy Forth 网页源代码，您会在底部看到它非常模块化，由大约 8 个 JavaScript 文件编写。

我花了很多钱买了几乎每一本我能拿到的 Forth 书，试图融入 Forth，但现在我才刚刚开始更好地理解它。我想提醒那些后来的人，如果你真的想得到它（我发现得太晚了），请获取 FigForth 上的书并实现它。商业化的Forth太复杂了，而Forth最伟大的一点就是可以理解整个系统，从上到下。不知何故，Forth 在新处理器上实现了整个开发环境，尽管这种需求似乎已经在所有方面都通过了 C 来实现，但作为从头开始编写 Forth 的仪式，它仍然很有用。因此，如果您选择这样做，请尝试 FigForth 书——它是在各种处理器上同时实现的多个 Forth。福斯罗塞塔石碑的一种。

为什么我们需要堆栈——效率、优化、零点、中断时保存寄存器，对于递归算法来说，它是“正确的形状”。

I looked for "interrupt" and nobody included that as an advantage. For each device that interrupts a microcontroller or other processor, there are usually registers that are pushed onto a stack, an interrupt service routine is called, and when it is done, the registers are popped back off of the stack, and put back where they were. Then the instruction pointer is restored, and normal activity picks up where it left off, almost as if the interrupt never happened. With the stack, you can actually have several devices (theoretically) interrupt each other, and it all just works -- because of the stack.

There is also a family of stack-based languages called concatenative languages. They are all (I believe) functional languages, because the stack is an implicit parameter passed-in, and also the changed stack is an implicit return from each function. Both Forth and Factor (which is excellent) are examples, along with others. Factor has been used similar to Lua, for scripting games, and was written by Slava Pestov, a genius currently working at Apple. His Google TechTalk on youtube I have watched a few times. He talks about Boa constructors, but I'm not sure what he means ;-).

I really think that some of the current VM's, like the JVM, Microsoft's CIL, and even the one I saw was written for Lua, should be written in some of these stack-based languages, to make them portable to even more platforms. I think that these concatenative languages are somehow missing their callings as VM creation kits, and portability platforms. There is even pForth, a "portable" Forth written in ANSI C, that could be used for even more universal portability. Anybody tried to compile it using Emscripten or WebAssembly?

With the stack based languages, there is a style of code called zero-point, because you can just list the functions to be called without passing any parameters at all (at times). If the functions fit together perfectly, you would have nothing but a list of all of the zero-point functions, and that would be your application (theoretically). If you delve into either Forth or Factor, you'll see what I'm talking about.

At Easy Forth, a nice online tutorial written in JavaScript, here is a small sample (note the "sq sq sq sq" as an example of zero-point calling style):

: sq dup * ;  ok
2 sq . 4  ok
: ^4 sq sq ;  ok
2 ^4 . 16  ok
: ^8 sq sq sq sq ;  ok
2 ^8 . 65536  ok

Also, if you look at the Easy Forth web page source, you'll see at the bottom that it is very modular, written in about 8 JavaScript files.

I have spent a lot of money on just about every Forth book I could get my hands on in an attempt to assimilate Forth, but am now just beginning to grok it better. I want to give a heads up to those who come after, if you really want to get it (I found this out too late), get the book on FigForth and implement that. The commercial Forths are all too complicated, and the greatest thing about Forth is that it is possible to comprehend the whole system, from top to bottom. Somehow, Forth implements a whole development environment on a new processor, and though the need for that has seemed to pass with C on everything, it is still useful as a rite of passage to write a Forth from scratch. So, if you choose to do this, try the FigForth book -- it is several Forths implemented simultaneously on a variety of processors. A kind of Rosetta Stone of Forths.

Why do we need a stack -- efficiency, optimization, zero-point, saving registers upon interrupt, and for recursive algorithms it's "the right shape."

回复收藏 0 原文

~没有更多了~