Tutorials/resources for implementing a VM

Posted 2024-08-17 11:28:17

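For self-education, I want to implement a simple virtual machine for a dynamic language, preferably in C. Something like the Lua VM, Parrot, or the Python VM, but simpler. Apart from looking at the code and design documents of existing VMs, are there any good resources or tutorials for achieving this?

Edit: why the close vote? I don't understand; is this not programming? Please comment if there is a specific problem with my question.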

橪书 2024-08-24 11:28:17

I assume you want a virtual machine rather than a mere interpreter. I think they are two points on a continuum. An interpreter works on something close to the original representation of the program. A VM works on more primitive (and self-contained) instructions. This means you need a compilation stage to translate the one to the other. I don't know if you want to work on that first or if you even have an input syntax in mind yet.

For a dynamic language, you want somewhere that stores data (as key/value pairs) and some operations that act on it. The VM maintains the store. The program running on it is a sequence of instructions (including control flow). You need to define the set of instructions. I'd suggest a simple set to start with, like:

* basic arithmetic operations, including arithmetic comparisons, plus access to the store
* basic control flow
* built-in print
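The store of key/value pairs could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names `Store`, `Var`, `store_set` and `store_get` are hypothetical, values are plain integers, lookup is a linear search, and the capacity is fixed. A real VM would more likely use a hash table and a tagged value type.

```c
#include <string.h>

#define MAX_VARS 64   /* assumed fixed capacity */
#define MAX_NAME 32   /* assumed maximum name length, including the NUL */

/* One variable: a name (the key) and an integer value. */
typedef struct {
    char name[MAX_NAME];
    int  value;
} Var;

typedef struct {
    Var vars[MAX_VARS];
    int count;
} Store;

/* Return the slot for `name`, creating it with value 0 if absent.
   Returns NULL when the store is full or the name is too long. */
static Var *store_slot(Store *s, const char *name) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->vars[i].name, name) == 0)
            return &s->vars[i];
    if (s->count == MAX_VARS || strlen(name) >= MAX_NAME)
        return NULL;
    Var *v = &s->vars[s->count++];
    strcpy(v->name, name);
    v->value = 0;
    return v;
}

/* Set a variable; returns 0 on success, -1 if no slot is available. */
static int store_set(Store *s, const char *name, int value) {
    Var *v = store_slot(s, name);
    if (!v) return -1;
    v->value = value;
    return 0;
}

/* Read a variable; an undefined variable reads as 0 in this sketch. */
static int store_get(Store *s, const char *name) {
    Var *v = store_slot(s, name);
    return v ? v->value : 0;
}
```

Because the keys are ordinary C strings, a name built at runtime (for example with `snprintf`) can be passed to `store_set` and `store_get` just like a literal one.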

You may want to use a stack-based computation approach to arithmetic, as many VMs do. There isn't yet much dynamic in the above. To get to that we want two things: the ability to compute the names of variables at runtime (this just means string operations), and some treatment of code as data. This might be as simple as allowing function references.
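As a rough illustration of the stack-based approach, here is a minimal C sketch. The opcodes `ST_PUSH`, `ST_ADD`, `ST_MUL`, `ST_LT` and `ST_HALT` and the function `stack_eval` are made up for this example and are not part of the instruction set discussed in this answer. Operands are pushed, and each arithmetic instruction pops two values and pushes the result.

```c
#include <stddef.h>

/* Hypothetical opcodes for this sketch only. */
enum { ST_PUSH = 1, ST_ADD, ST_MUL, ST_LT, ST_HALT };

#define STACK_MAX 32

/* Run `code`; on ST_HALT store the top of the stack in *result.
   Returns 0 on success, -1 on malformed code or stack over/underflow. */
static int stack_eval(const unsigned char *code, size_t len, int *result) {
    int stack[STACK_MAX];
    int sp = 0;                 /* number of values currently on the stack */
    size_t ip = 0;              /* instruction pointer */

    while (ip < len) {
        unsigned char op = code[ip++];
        switch (op) {
        case ST_PUSH:           /* operand: one byte, pushed as an integer */
            if (ip >= len || sp == STACK_MAX) return -1;
            stack[sp++] = code[ip++];
            break;
        case ST_ADD:
        case ST_MUL:
        case ST_LT: {           /* pop b, pop a, push (a op b) */
            if (sp < 2) return -1;
            int b = stack[--sp];
            int a = stack[--sp];
            stack[sp++] = op == ST_ADD ? a + b
                        : op == ST_MUL ? a * b
                        : (a < b);
            break;
        }
        case ST_HALT:
            if (sp < 1) return -1;
            *result = stack[sp - 1];
            return 0;
        default:
            return -1;          /* unknown opcode */
        }
    }
    return -1;                  /* ran off the end without ST_HALT */
}
```

With this encoding, the expression (2 + 3) * 4 becomes the byte sequence `ST_PUSH 2, ST_PUSH 3, ST_ADD, ST_PUSH 4, ST_MUL, ST_HALT`. No instruction needs to name its operands, which is one reason many VMs choose a stack design.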

Input to the VM would ideally be in bytecode. If you haven't got a compiler yet this could be generated from a basic assembler (which could be part of the VM).

The VM itself consists of the loop:

1. Look at the bytecode instruction pointed to by the instruction pointer.
2. Execute the instruction:
   * If it's an arithmetic instruction, update the store accordingly.
   * If it's control flow, perform the test (if there is one) and set the instruction pointer.
   * If it's print, print a value from the store.
3. Advance the instruction pointer to the next instruction (unless a control-flow instruction has already set it).
4. Repeat from 1.

Dealing with computed variable names might be tricky: an instruction needs to specify which variables the computed names are in. This could be done by allowing instructions to refer to a pool of string constants provided in the input.

An example program (in assembly and bytecode):

offset  bytecode (hex)   source
 0      01 05 0E         //      LOAD 5, .x
 3      01 03 10         // .l1: LOAD 3, .y
 6      02 0E 10 0E      //      ADD .x, .y, .x
10      03 0E            //      PRINT .x
12      04 03            //      GOTO .l1
14      78 00            //      .x: "x"
16      79 00            //      .y: "y"

The instruction codes implied are:

"LOAD x, k" (01 x k) Load single byte x as an integer into variable named by string constant at offset k.
"ADD k1, k2, k3" (02 k1 k2 k3) Add two variables named by string constants k1 and k2 and put the sum in variable named by string constant k3.
"PRINT k" (03 k) Print variable named by string constant k.
"GOTO a" (04 a) Go to offset given by byte a.
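The four instruction codes above, together with the fetch/execute loop, might be sketched in C as follows. This is one possible reading, not a definitive implementation: the names `Vars`, `var_at`, `run` and `sample` are hypothetical, variables hold plain integers, and string constants are assumed to be NUL-terminated as in the sample bytecode. Since the sample program loops forever, `run` takes a step limit, and it also records printed values in a caller-supplied array.

```c
#include <stdio.h>
#include <string.h>

enum { OP_LOAD = 0x01, OP_ADD = 0x02, OP_PRINT = 0x03, OP_GOTO = 0x04 };

#define MAX_VARS 16

typedef struct {
    const char *names[MAX_VARS];  /* point at string constants in the bytecode */
    int values[MAX_VARS];
    int count;
} Vars;

/* Find (or create with value 0) the variable named by the NUL-terminated
   string constant at offset k. Returns NULL on a bad offset or a full table. */
static int *var_at(Vars *vs, const unsigned char *code, size_t len, size_t k) {
    if (k >= len || !memchr(code + k, 0, len - k)) return NULL;
    const char *name = (const char *)code + k;
    for (int i = 0; i < vs->count; i++)
        if (strcmp(vs->names[i], name) == 0) return &vs->values[i];
    if (vs->count == MAX_VARS) return NULL;
    vs->names[vs->count] = name;
    vs->values[vs->count] = 0;
    return &vs->values[vs->count++];
}

/* Execute at most max_steps instructions, starting at offset 0.
   Each PRINT writes to stdout and is recorded in out (up to out_cap values).
   Returns the number of values printed, or -1 on malformed bytecode. */
static int run(const unsigned char *code, size_t len, int max_steps,
               int *out, int out_cap) {
    Vars vs = {0};
    size_t ip = 0;
    int printed = 0;
    for (int step = 0; step < max_steps; step++) {
        if (ip >= len) return -1;
        switch (code[ip]) {
        case OP_LOAD: {                       /* 01 x k */
            if (ip + 2 >= len) return -1;
            int *dst = var_at(&vs, code, len, code[ip + 2]);
            if (!dst) return -1;
            *dst = code[ip + 1];
            ip += 3;
            break;
        }
        case OP_ADD: {                        /* 02 k1 k2 k3 */
            if (ip + 3 >= len) return -1;
            int *a = var_at(&vs, code, len, code[ip + 1]);
            int *b = var_at(&vs, code, len, code[ip + 2]);
            int *dst = var_at(&vs, code, len, code[ip + 3]);
            if (!a || !b || !dst) return -1;
            *dst = *a + *b;
            ip += 4;
            break;
        }
        case OP_PRINT: {                      /* 03 k */
            if (ip + 1 >= len) return -1;
            int *v = var_at(&vs, code, len, code[ip + 1]);
            if (!v) return -1;
            printf("%d\n", *v);
            if (printed < out_cap) out[printed] = *v;
            printed++;
            ip += 2;
            break;
        }
        case OP_GOTO:                         /* 04 a */
            if (ip + 1 >= len) return -1;
            ip = code[ip + 1];                /* jump: no further advance */
            break;
        default:
            return -1;                        /* unknown opcode */
        }
    }
    return printed;
}

/* The sample program from the table above. */
static const unsigned char sample[] = {
    0x01, 0x05, 0x0E,        /*  0:      LOAD 5, .x     */
    0x01, 0x03, 0x10,        /*  3: .l1: LOAD 3, .y     */
    0x02, 0x0E, 0x10, 0x0E,  /*  6:      ADD .x, .y, .x */
    0x03, 0x0E,              /* 10:      PRINT .x       */
    0x04, 0x03,              /* 12:      GOTO .l1       */
    0x78, 0x00,              /* 14: .x: "x"             */
    0x79, 0x00               /* 16: .y: "y"             */
};
```

Calling `run(sample, sizeof sample, 12, out, 3)` with an `int out[3]` executes twelve instructions, which covers three trips through the loop body, and records 8, 11 and 14. Note that GOTO assigns the instruction pointer directly, while every other instruction advances it by its own length.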

You need variants for when variables are named by other variables, etc. (and the levels of indirection get tricky to reason about). The assembler looks at arguments such as "ADD .x, .y, .x" and generates the correct bytecode for adding from string constants (and not computed variables).

朱染 2024-08-24 11:28:17

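Well, it's not about implementing a VM in C, but since it was the last tab I had open before I saw this question, I feel I should point out an article about implementing a QBASIC bytecode compiler and virtual machine in JavaScript, using an HTML tag for display. It includes all of the source code needed to implement enough of QBASIC to run the "nibbles" game, and it is the first in a series of articles on the compiler and bytecode interpreter; this one describes the VM, and the author promises future articles describing the compiler as well.

By the way, I didn't vote to close your question, but the close vote you got was as a duplicate of a question from last year about how to learn to implement a virtual machine. I think this question (about a tutorial or something relatively simple) is different enough from that one that it should remain open, but you may want to refer to that question for more advice.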

绿萝 2024-08-24 11:28:17

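Another resource worth looking at is the implementation of the Lua language. It is a register-based VM with a good reputation for performance. The source code is in ANSI C89 and is generally very readable.

As with most high-performance scripting languages, the end user sees a readable, high-level dynamic language (with features such as closures, tail calls, immutable strings, numbers and hash tables as the primary data types, functions as first-class values, and more). Source text is compiled to the VM's bytecode for execution by a VM implementation whose outline is pretty much as described in Edmund's answer.

A great deal of effort has gone into keeping the VM itself both portable and efficient. If even more performance is needed, a just-in-time compiler from VM bytecode to native instructions exists for 32-bit x86 and is in beta for 64-bit.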

孤芳又自赏 2024-08-24 11:28:17

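For a start (even though it is C++ rather than C), you could take a look at muParser.

It's a math expression parser that uses a simple virtual machine to execute operations. I think you will still need some time to understand everything; even so, its code is simpler than a complete VM able to run real, full programs. (By the way, I'm designing a similar library in C#. It is in its early stages, but later versions will allow compilation to .NET/VM IL or perhaps to a new simple VM like muParser's.)

Another interesting project is NekoVM (it executes .n bytecode files). It's an open-source project written in C, and its main language (.neko) is intended to be generated by source-to-source compilers. In the spirit of that last point, see Haxe from the same author (also open source).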