Tutorials/resources for implementing a VM

Posted on 2024-08-17 11:28:17

For self-education purposes, I want to implement a simple virtual machine for a dynamic language, preferably in C. Something like the Lua VM, Parrot, or the Python VM, but simpler. Are there any good resources or tutorials on achieving this, apart from reading the code and design documentation of existing VMs?

Edit: why the close vote? I don't understand; is this not programming? Please comment if there is a specific problem with my question.

Comments (6)

橪书 2024-08-24 11:28:17

I assume you want a virtual machine rather than a mere interpreter. I think they are two points on a continuum. An interpreter works on something close to the original representation of the program. A VM works on more primitive (and self-contained) instructions. This means you need a compilation stage to translate the one to the other. I don't know if you want to work on that first or if you even have an input syntax in mind yet.

For a dynamic language, you want somewhere that stores data (as key/value pairs) and some operations that act on it. The VM maintains the store. The program running on it is a sequence of instructions (including control flow). You need to define the set of instructions. I'd suggest a simple set to start with, like:

  • basic arithmetic operations, including arithmetic comparisons, accessing the store
  • basic control flow
  • built-in print

You may want to use a stack-based approach to arithmetic, as many VMs do. There isn't much that is dynamic in the above yet. To get there we want two things: the ability to compute the names of variables at runtime (this just means string operations), and some treatment of code as data. This might be as simple as allowing function references.

Input to the VM would ideally be in bytecode. If you haven't got a compiler yet this could be generated from a basic assembler (which could be part of the VM).

The VM itself consists of the loop:

1. Look at the bytecode instruction pointed to by the instruction pointer.
2. Execute the instruction:
   * If it's an arithmetic instruction, update the store accordingly.
   * If it's control flow, perform the test (if there is one) and set the instruction pointer.
   * If it's print, print a value from the store.
3. Unless a control-flow instruction has already set it, advance the instruction pointer to the next instruction.
4. Repeat from 1.
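
The loop above can be sketched in C roughly as follows. This is only a minimal illustration, assuming a stack-based design and a made-up instruction set: the opcode names (OP_PUSH, OP_ADD, OP_LESS, OP_JUMP_IF, OP_PRINT, OP_HALT), the one-byte operands, and the function name run_loop are all assumptions for this sketch, not part of any existing VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical opcodes, invented for this sketch. */
enum { OP_HALT, OP_PUSH, OP_ADD, OP_LESS, OP_JUMP_IF, OP_PRINT };

#define STACK_MAX 64

/* Runs bytecode until OP_HALT (or an unknown opcode) and returns the
   last value printed, or 0 if nothing was printed. */
static int run_loop(const unsigned char *code)
{
    int stack[STACK_MAX];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t ip = 0;   /* instruction pointer: byte offset into code */
    int last = 0;

    for (;;) {
        unsigned char op = code[ip++];   /* fetch, then step past the opcode */
        switch (op) {
        case OP_PUSH:                    /* operand: one literal byte */
            assert(sp < STACK_MAX);
            stack[sp++] = code[ip++];
            break;
        case OP_ADD:                     /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_LESS:                    /* pop two values, push 1 if a < b */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = stack[sp - 1] < stack[sp];
            break;
        case OP_JUMP_IF: {               /* operand: target byte offset */
            unsigned char target = code[ip++];
            assert(sp >= 1);
            if (stack[--sp])
                ip = target;             /* taken jump overrides the advance */
            break;
        }
        case OP_PRINT:
            assert(sp >= 1);
            last = stack[--sp];
            printf("%d\n", last);
            break;
        default:                         /* OP_HALT or anything unrecognized */
            return last;
        }
    }
}
```

Because the instruction pointer is advanced while each instruction is decoded, a jump simply overwrites it, which avoids advancing past the jump target by mistake. For instance, passing the bytes OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT to run_loop prints 5.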

Dealing with computed variable names might be tricky: an instruction needs to specify which variables the computed names are in. This could be done by allowing instructions to refer to a pool of string constants provided in the input.

An example program (in assembly and bytecode):

offset  bytecode (hex)   source
 0      01 05 0E         //      LOAD 5, .x
 3      01 03 10         // .l1: LOAD 3, .y
 6      02 0E 10 0E      //      ADD .x, .y, .x
10      03 0E            //      PRINT .x
12      04 03            //      GOTO .l1
14      78 00            //      .x: "x"
16      79 00            //      .y: "y"

The instruction codes implied are:

"LOAD x, k" (01 x k) Load single byte x as an integer into variable named by string constant at offset k.
"ADD k1, k2, k3" (02 k1 k2 k3) Add two variables named by string constants k1 and k2 and put the sum in variable named by string constant k3.
"PRINT k" (03 k) Print variable named by string constant k.
"GOTO a" (04 a) Go to offset given by byte a.
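
As a rough sketch, the four instructions above and the example program could be interpreted in C like this. The store is a small fixed-size array searched by name, which is an assumption made for brevity (a real dynamic-language VM would more likely use a hash table). The names lookup, name_at, run_example and example_program are made up for this sketch, and the max_steps limit is added only because the example program loops forever.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { OP_LOAD = 0x01, OP_ADD = 0x02, OP_PRINT = 0x03, OP_GOTO = 0x04 };

#define MAX_VARS 16   /* arbitrary limit chosen for this sketch */

struct var { const char *name; int value; };
struct store { struct var vars[MAX_VARS]; int count; };

/* The example program from the listing: code at offsets 0-13,
   string constants "x" and "y" at offsets 14 and 16. */
static const unsigned char example_program[] = {
    0x01, 0x05, 0x0E,        /*      LOAD 5, .x     */
    0x01, 0x03, 0x10,        /* .l1: LOAD 3, .y     */
    0x02, 0x0E, 0x10, 0x0E,  /*      ADD .x, .y, .x */
    0x03, 0x0E,              /*      PRINT .x       */
    0x04, 0x03,              /*      GOTO .l1       */
    0x78, 0x00,              /* .x: "x"             */
    0x79, 0x00               /* .y: "y"             */
};

/* String constant stored NUL-terminated at byte offset k. */
static const char *name_at(const unsigned char *code, unsigned char k)
{
    return (const char *)&code[k];
}

/* Find the variable called name, creating it with value 0 if absent. */
static int *lookup(struct store *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->vars[i].name, name) == 0)
            return &s->vars[i].value;
    assert(s->count < MAX_VARS);
    s->vars[s->count].name = name;
    s->vars[s->count].value = 0;
    return &s->vars[s->count++].value;
}

/* Executes at most max_steps instructions starting at offset 0 and
   returns the last value printed (0 if nothing was printed). */
static int run_example(const unsigned char *code, int max_steps)
{
    struct store s = { .count = 0 };
    size_t ip = 0;
    int last = 0;

    for (int step = 0; step < max_steps; step++) {
        switch (code[ip]) {
        case OP_LOAD:    /* 01 x k */
            *lookup(&s, name_at(code, code[ip + 2])) = code[ip + 1];
            ip += 3;
            break;
        case OP_ADD: {   /* 02 k1 k2 k3 */
            int sum = *lookup(&s, name_at(code, code[ip + 1]))
                    + *lookup(&s, name_at(code, code[ip + 2]));
            *lookup(&s, name_at(code, code[ip + 3])) = sum;
            ip += 4;
            break;
        }
        case OP_PRINT:   /* 03 k */
            last = *lookup(&s, name_at(code, code[ip + 1]));
            printf("%d\n", last);
            ip += 2;
            break;
        case OP_GOTO:    /* 04 a */
            ip = code[ip + 1];
            break;
        default:         /* unknown opcode: stop */
            return last;
        }
    }
    return last;
}
```

Calling run_example(example_program, 8) executes two passes of the loop body and prints 8 and then 11, since x starts at 5 and y is reset to 3 before each addition.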

You need variants for when variables are named by other variables, etc. (and the levels of indirection get tricky to reason about). The assembler looks at the arguments like "ADD .x, .y, .x" and generates the correct bytecode for adding from string constants (and not computed variables).

朱染 2024-08-24 11:28:17

Well, it's not about implementing a VM in C, but since it was the last tab I had open before I saw this question, I feel I need to point out an article about implementing a QBASIC bytecode compiler and virtual machine in JavaScript, using the <canvas> tag for display. It includes all of the source code needed to implement enough of QBASIC to run the "nibbles" game, and is the first in a series of articles on the compiler and bytecode interpreter; this one describes the VM, and the author promises future articles describing the compiler as well.

By the way, I didn't vote to close your question, but the close vote you got was as a duplicate of a question from last year on how to learn about implementing a virtual machine. I think this question (about a tutorial or something relatively simple) is different enough from that one that it should remain open, but you might want to refer to that one for some more advice.

绿萝 2024-08-24 11:28:17

Another resource to look at is the implementation of the Lua language. It is a register-based VM that has a good reputation for performance. The source code is in ANSI C89, and is generally very readable.
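
To illustrate what "register-based" means in general, here is a minimal sketch in C. It is explicitly not Lua's actual instruction format: the opcodes (R_HALT, R_LOADI, R_ADD, R_MUL), the four-byte struct rinstr layout, the register count, and the name run_registers are all assumptions invented for this example. The point is only that each instruction names its source and destination registers directly instead of pushing and popping a stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes; NOT Lua's real instruction set. */
enum { R_HALT, R_LOADI, R_ADD, R_MUL };

/* One instruction: opcode, destination register, two operands. */
struct rinstr { uint8_t op, dst, a, b; };

#define NUM_REGS 8

/* Runs until R_HALT (or an unknown opcode) and returns register 0. */
static int run_registers(const struct rinstr *code)
{
    int reg[NUM_REGS] = {0};

    for (const struct rinstr *pc = code; ; pc++) {
        assert(pc->dst < NUM_REGS);
        switch (pc->op) {
        case R_LOADI:   /* dst = immediate value a */
            reg[pc->dst] = pc->a;
            break;
        case R_ADD:     /* dst = reg[a] + reg[b] */
            assert(pc->a < NUM_REGS && pc->b < NUM_REGS);
            reg[pc->dst] = reg[pc->a] + reg[pc->b];
            break;
        case R_MUL:     /* dst = reg[a] * reg[b] */
            assert(pc->a < NUM_REGS && pc->b < NUM_REGS);
            reg[pc->dst] = reg[pc->a] * reg[pc->b];
            break;
        default:        /* R_HALT or anything unrecognized */
            return reg[0];
        }
    }
}
```

With this encoding, computing (2 + 3) * 3 takes four instructions before the halt: two R_LOADI instructions to fill registers 1 and 2, one R_ADD into register 0, and one R_MUL of register 0 by register 2. A stack machine would need separate push instructions for every operand use.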

As with most high-performance scripting languages, the end user sees a readable, high-level dynamic language (with features like closures, tail calls, immutable strings, numbers and hash tables as the primary data types, functions as first-class values, and more). Source text is compiled to the VM's bytecode for execution by a VM implementation whose outline is pretty much as described in Edmund's answer.

A great deal of effort has gone into keeping the implementation of the VM itself both portable and efficient. If even more performance is needed, a just-in-time compiler from VM bytecode to native instructions exists for 32-bit x86, and is in beta release for 64-bit.

孤芳又自赏 2024-08-24 11:28:17

To get started (even though it is C++ rather than C), you could take a look at muParser.

It's a math expression parser that uses a simple virtual machine to execute operations. I think even so you will need time to understand everything; still, this code is simpler than a complete VM able to run a real, complete program. (By the way, I'm designing a similar library in C#. It is in its early stages, but later versions will allow compilation to .NET/VM IL, or maybe to a new simple VM like muParser's.)

Another interesting project is NekoVM (it executes .n bytecode files). It's an open source project written in C, and its main language (.neko) is intended to be generated by source-to-source compiler technology. In the same spirit, see Haxe from the same author (also open source).

逆光飞翔i 2024-08-24 11:28:17

Like you, I have been studying virtual machines and compilers, and one good book I can recommend is Compiler Design: Virtual Machines. It describes virtual machines for imperative, functional, logic, and object-oriented languages by giving the instruction set for each VM along with a tutorial on how to compile a higher-level language to that VM. I've only implemented the VM for the imperative language, and already it has been a very useful exercise.

If you're just starting out, another resource I can recommend is PL101. It is an interactive set of lessons in JavaScript that guides you through the process of implementing parsers and interpreters for various languages.

初吻给了烟 2024-08-24 11:28:17

I am late to the party, but I would recommend Game Scripting Mastery, which walks you through writing a working scripting language and its VM from scratch, with very few prerequisites.
