VM Design: More Opcodes or Fewer Opcodes? What Is Better?

Posted 2024-07-24 02:40:52

Don't be shocked. This is a lot of text, but I'm afraid that without giving some detailed information I cannot really show what this is all about (and might get a lot of answers that don't really address my question). And this is definitely not an assignment (as someone ridiculously claimed in a comment).

Prerequisites

Since this question probably cannot be answered at all unless at least some prerequisites are set, here they are:

  • The Virtual Machine code shall be interpreted. A JIT compiler is not forbidden, but the design should target an interpreter.
  • The VM shall be register based, not stack based.
  • Answers may assume neither that there is a fixed set of registers nor that there is an unlimited number of them; either may be the case.

Further, we need a better definition of "better". There are several properties that must be considered:

  1. The storage space for the VM code on disk. Of course you could always scrap all optimizations here and just compress the code, but this has a negative effect on (2).
  2. Decoding speed. The best way to store the code is useless if it takes too long to transform that into something that can be directly executed.
  3. The storage space in memory. This code must be directly executable, with or without further decoding, but if further decoding is involved, that decoding is done during execution, every time the instruction is executed (decoding done only once, when the code is loaded, counts toward item 2).
  4. The execution speed of the code (taking common interpreter techniques into account).
  5. The VM complexity and how hard it is to write an interpreter for it.
  6. The amount of resources the VM needs for itself. (It is not a good design if the code the VM runs is 2 KB in size and executes in the blink of an eye, but the VM needs 150 MB to do so and its startup time far exceeds the run time of the code it executes.)

Now for some examples of what I actually mean by more or fewer opcodes. It may look as if the number of opcodes is fixed, since you need one opcode per operation. However, it's not that simple.

Multiple Opcodes for the Same Operation

You can have an operation like

ADD R1, R2, R3

adding the values of R1 and R2, writing the result to R3. Now consider the following special cases:

ADD R1, R2, R2
ADD R1, 1, R1

These are common operations you'll find in a lot of applications. You can express them with the already existing opcode (unless you need a separate one because the last example has an integer immediate instead of a register). However, you could also create special opcodes for these:

ADD2 R1, R2
INC R1

Same semantics as before. Where's the advantage? ADD2 needs only two arguments instead of three, and INC needs just one. So these could be encoded more compactly on disk and/or in memory. Since it is also easy to transform either form into the other, the decoding step could convert between both ways of expressing these statements. I'm not sure how much either form would influence execution speed, though.
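To make the "decoding step could transform between both ways" point concrete, here is a minimal C sketch. It assumes a hypothetical byte encoding in which ADD takes three register bytes, an ADD with an immediate takes a signed byte, ADD2 takes two registers and INC takes one. The loader expands all four into one uniform in-memory form, so the interpreter needs only a single handler. The opcode values, the Insn layout and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk opcodes for the ADD family. */
enum { OP_ADD = 0, OP_ADDI = 1, OP_ADD2 = 2, OP_INC = 3 };

/* Uniform in-memory form: dst = src1 + (use_imm ? imm : src2). */
typedef struct {
    uint8_t src1, src2, dst;
    int32_t imm;
    bool use_imm;
} Insn;

/* Decode once at load time; returns the number of decoded instructions. */
size_t decode(const uint8_t *code, size_t len, Insn *out) {
    size_t pc = 0, n = 0;
    while (pc < len) {
        Insn in = {0};
        switch (code[pc]) {
        case OP_ADD:  /* ADD s1, s2, d (4 bytes) */
            in.src1 = code[pc + 1]; in.src2 = code[pc + 2]; in.dst = code[pc + 3];
            pc += 4; break;
        case OP_ADDI: /* ADD s1, imm, d (4 bytes, imm is a signed byte) */
            in.src1 = code[pc + 1]; in.imm = (int8_t)code[pc + 2]; in.dst = code[pc + 3];
            in.use_imm = true; pc += 4; break;
        case OP_ADD2: /* ADD2 s1, d == ADD s1, d, d (3 bytes) */
            in.src1 = code[pc + 1]; in.src2 = code[pc + 2]; in.dst = code[pc + 2];
            pc += 3; break;
        case OP_INC:  /* INC r == ADD r, 1, r (2 bytes) */
            in.src1 = code[pc + 1]; in.dst = code[pc + 1];
            in.imm = 1; in.use_imm = true; pc += 2; break;
        default:
            assert(0 && "unknown opcode");
            return n;
        }
        out[n++] = in;
    }
    return n;
}

/* A single handler covers all four on-disk encodings. */
void execute(const Insn *prog, size_t n, int32_t *regs) {
    for (size_t i = 0; i < n; i++) {
        const Insn *in = &prog[i];
        int32_t rhs = in->use_imm ? in->imm : regs[in->src2];
        regs[in->dst] = regs[in->src1] + rhs;
    }
}
```

Here the compact forms only save space on disk. In memory, every instruction has the same size. Going the other way, by keeping ADD2/INC as separate in-memory opcodes, would trade a larger switch for smaller decoded code.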

Combining Two Opcodes Into a Single One

Now let's assume you have an ADD_RRR (R for register) and a LOAD that loads data into a register.

LOAD value, R2
ADD_RRR R1, R2, R3

You can have these two opcodes and always use constructs like this throughout your code... or you can combine them into a single new opcode, named ADD_RMR (M for memory)

ADD_RMR R1, value, R3
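Here is a minimal sketch of the difference. It assumes a hypothetical fixed-width Insn with three operand fields and a run loop that counts dispatches. The fused ADD_RMR does the same arithmetic in one dispatch instead of two. Note one semantic difference that the fused form introduces: it never writes the loaded value to R2, so the rewrite is only valid if R2 is not read afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width instruction: opcode plus three operand fields. */
enum { LOAD, ADD_RRR, ADD_RMR, HALT };
typedef struct { uint8_t op, a, b, c; } Insn;

/* Runs until HALT; returns how many instructions were dispatched. */
size_t run(const Insn *code, int32_t *regs, const int32_t *mem) {
    size_t dispatched = 0;
    for (const Insn *ip = code;; ip++) {
        dispatched++;
        switch (ip->op) {
        case LOAD:    /* LOAD mem[a], Rb */
            regs[ip->b] = mem[ip->a]; break;
        case ADD_RRR: /* Rc = Ra + Rb */
            regs[ip->c] = regs[ip->a] + regs[ip->b]; break;
        case ADD_RMR: /* Rc = Ra + mem[b]: LOAD + ADD_RRR in one dispatch */
            regs[ip->c] = regs[ip->a] + mem[ip->b]; break;
        case HALT:
            return dispatched;
        default:
            assert(0 && "unknown opcode");
            return dispatched;
        }
    }
}
```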

Data Types vs Opcodes

Assume you have 16 Bit and 32 Bit integers as native types. Registers are 32 Bit, so either data type fits. Now when you add two registers, you could make the data type a parameter:

ADD int16, R1, R2, R3
ADD int32, R1, R2, R3

The same is true for signed and unsigned integers, for example. That way ADD can be a short opcode, one byte, followed by another byte (or maybe just 4 Bit) telling the VM how to interpret the registers (do they hold 16 Bit or 32 Bit values?). Or you can scrap the type encoding and instead have two opcodes:

ADD16 R1, R2, R3
ADD32 R1, R2, R3

Some may say both are exactly the same, since you can simply interpret the first form as 16 Bit opcodes (opcode byte plus type byte). Yes, but a very naive interpreter might look quite different. E.g. if it has one function per opcode and dispatches using a switch statement (not the best way of doing it: there is function-call overhead, and a switch statement may not be optimal either, I know), the two opcodes could look like this:

case ADD16: add16(p1, p2, p3); break; // pX pointer to register
case ADD32: add32(p1, p2, p3); break;

and each function is dedicated to one specific kind of add. The second design, though, may look like this:

case ADD: add(type, p1, p2, p3); break;

// ...
// and the function

void add(enum Type type, Register *p1, Register *p2, Register *p3)
{
    switch (type) {
       case INT16: /* 16 Bit add */ break;
       case INT32: /* 32 Bit add */ break;
    }
}

This adds a sub-switch to the main switch, or a sub-dispatch table to the main dispatch table. Of course, an interpreter can be written either way regardless of whether types are explicit, but depending on the opcode design, one way or the other will feel more natural to developers.
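Both styles can be sketched side by side in C. This sketch assumes 32 Bit registers holding either int16 or int32 values, two's complement wrap-around for the 16 Bit case, and the hypothetical names ADD16, ADD32, ADD, INT16, INT32 and step(). It illustrates the two dispatch shapes and does not recommend either one.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t Register;       /* registers are 32 Bit wide */
enum Type { INT16, INT32 };
enum Op { ADD16, ADD32, ADD };  /* ADD takes an explicit type operand */

/* Style 1: one opcode per type, one handler per opcode. */
static void add16(Register *a, Register *b, Register *d) {
    /* keep only the low 16 bits, sign-extended (assumes two's complement) */
    *d = (int16_t)(uint16_t)((uint32_t)*a + (uint32_t)*b);
}
static void add32(Register *a, Register *b, Register *d) {
    /* unsigned arithmetic wraps instead of signed-overflow UB */
    *d = (Register)((uint32_t)*a + (uint32_t)*b);
}

/* Style 2: one opcode, with the type resolved by a sub-switch. */
static void add(enum Type type, Register *a, Register *b, Register *d) {
    switch (type) {
    case INT16: add16(a, b, d); break;
    case INT32: add32(a, b, d); break;
    }
}

/* Main dispatch: ADD16/ADD32 ignore `type`, ADD dispatches a second time. */
void step(enum Op op, enum Type type, Register *a, Register *b, Register *d) {
    switch (op) {
    case ADD16: add16(a, b, d); break;
    case ADD32: add32(a, b, d); break;
    case ADD:   add(type, a, b, d); break;
    }
}
```

Both paths compute the same result. The only difference is whether the type is resolved by the main switch or by a second switch inside add().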

Meta Opcodes

For lack of a better name, I'll call them that. These opcodes have no meaning on their own; they just change the meaning of the opcode that follows. Like the famous WIDE operator:

ADD R1, R2, R3
WIDE
ADD R1, R2, R3

E.g. in the second case the register operands are 16 Bit (so you can address more registers), in the first one only 8 Bit. Alternatively, you could do without such a meta opcode and have separate ADD and ADD_WIDE opcodes. Meta opcodes like WIDE avoid the need for SUB_WIDE, MUL_WIDE, etc., as you can always prefix any other normal opcode with WIDE (which is always just one opcode). The disadvantage is that an opcode on its own becomes meaningless: you always have to check whether the opcode before it was a meta opcode. Further, the VM must store extra state per thread (e.g. whether we are currently in wide mode or not) and clear that state again after the next instruction. Even CPUs have such opcodes (e.g. the x86 LOCK prefix).
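Here is a small sketch of how an interpreter might handle a WIDE prefix. It assumes, hypothetically, that WIDE is byte 0xFF and that it only changes register operands from 1 byte to 2 bytes (little-endian). The `wide` flag is the extra per-thread state described above: WIDE sets it, and it is cleared after the next instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte codes, not taken from any real VM. */
enum { OP_HALT = 0x00, OP_ADD = 0x01, OP_WIDE = 0xFF };

/* Read a register index: 1 byte normally, 2 bytes (little-endian) after WIDE. */
static uint16_t read_reg(const uint8_t **pc, bool wide) {
    uint16_t r = (*pc)[0];
    if (wide) r |= (uint16_t)((*pc)[1] << 8);
    *pc += wide ? 2 : 1;
    return r;
}

void run(const uint8_t *pc, int32_t *regs) {
    bool wide = false;              /* the extra state WIDE requires */
    for (;;) {
        uint8_t op = *pc++;
        switch (op) {
        case OP_WIDE:
            wide = true;            /* applies to the next instruction only */
            continue;               /* skip the reset at the loop bottom */
        case OP_ADD: {              /* ADD s1, s2, d: d = s1 + s2 */
            uint16_t s1 = read_reg(&pc, wide);
            uint16_t s2 = read_reg(&pc, wide);
            uint16_t d  = read_reg(&pc, wide);
            regs[d] = regs[s1] + regs[s2];
            break;
        }
        case OP_HALT:
            return;
        default:
            assert(0 && "unknown opcode");
            return;
        }
        wide = false;               /* clear the prefix state after one instruction */
    }
}
```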

How to Find a Good Trade-Off???

Of course, the more opcodes you have, the bigger the switches/dispatch tables become and the more bits you need to express these codes on disk or in memory (though you may be able to store them more efficiently on disk, where the data doesn't have to be directly executable by a VM); the VM also becomes more complicated and has more lines of code. On the other hand, the opcodes become more powerful: you get closer to the point where every expression, even a complex one, ends up as a single opcode.

Choosing few opcodes makes the VM easy to code and will, I guess, lead to very compact opcodes. On the other hand, it means you may need a very large number of opcodes to perform a simple task, and every expression that is not used extremely often will have to become a (native) function call of some kind, as no opcode exists for it.
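Here is a sketch of that fallback. It assumes a hypothetical CALLN opcode that indexes a table of native C functions. Rarely used operations such as min and max get no opcodes of their own and are reached through one generic call instruction instead. The names CALLN, natives and run are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t (*NativeFn)(int32_t, int32_t);

/* Rarely used operations live in a table instead of having their own opcodes. */
static int32_t native_min(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t native_max(int32_t a, int32_t b) { return a > b ? a : b; }
static const NativeFn natives[] = { native_min, native_max };

enum { OP_HALT, OP_ADD, OP_CALLN };
/* ADD: d = a + b (f unused); CALLN: d = natives[f](a, b) */
typedef struct { uint8_t op, f, a, b, d; } Insn;

void run(const Insn *ip, int32_t *regs) {
    for (;; ip++) {
        switch (ip->op) {
        case OP_ADD:
            regs[ip->d] = regs[ip->a] + regs[ip->b]; break;
        case OP_CALLN:
            assert((size_t)ip->f < sizeof natives / sizeof natives[0]);
            regs[ip->d] = natives[ip->f](regs[ip->a], regs[ip->b]); break;
        case OP_HALT:
            return;
        default:
            assert(0 && "unknown opcode");
            return;
        }
    }
}
```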

I have read a lot about all kinds of VMs on the Internet, but no source really weighed the trade-off fairly in either direction. Designing a VM is like designing a CPU: there are CPUs with few opcodes, which are fast, but you also need many of them; and there are CPUs with many opcodes, some of them very slow, but you need far fewer of them to express the same piece of code. It looks like the "more opcodes are better" CPUs have totally won the consumer market, and the "fewer opcodes are better" ones only survive in some parts of the server market or the supercomputer business. What about VMs?

Comments (4)

橙幽之幻 2024-07-31 02:40:52

To be honest, I think it's largely a matter of the purpose of the VM, similar to how the processor design is largely determined by how the processor is primarily meant to be used.

In other words, you'll preferably be able to determine common use case scenarios for your VM, so that you can establish features that are likely going to be required, and also establish those that are unlikely to be very commonly required.

Of course, I do understand that you are probably envisioning an abstract, very generic virtual machine that can be used as the internal/backend implementation for other programming languages?

However, I feel it's important to realize and emphasize that there really is no such thing as a "generic ideal" implementation of anything, i.e. once you keep things generic and abstract, you will inevitably face situations where you need to make compromises.

Ideally, these compromises will be based on real life use scenarios for your code, so that these compromises are actually based on well-informed assumptions and simplifications that you can make without going out on a limb.

In other words, I would think about what are the goals for your VM?
How is it primarily going to be used in your vision?
What are the goals you want to achieve?

This will help you come up with requirements and make simplifications, so that you can design your instruction set based on reasonable assumptions.

If you expect your VM to be used primarily by programming languages for number crunching, you'll probably want a fairly powerful foundation of maths operations, providing lots of low-level primitives with support for wide data types.

If, on the other hand, your VM will serve as the backend for OO languages, you will want to look into optimizing the corresponding low-level instructions (e.g. hashes/dictionaries).

In general, I would recommend keeping the instruction set as simple and intuitive as possible in the beginning, and only adding special instructions once you have proven that having them in place is indeed useful (i.e. through profiling and opcode dumps) and does result in a performance gain. So this will largely be determined by the very first "customers" your VM has.
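A minimal way to get such a profile is to count dispatches per opcode inside the interpreter loop. The sketch below assumes a tiny hypothetical machine with one accumulator and the opcodes INC, DEC, JNZ and HALT. A real VM would more likely enable these counters only in a profiling build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_HALT, OP_INC, OP_DEC, OP_JNZ, OP_COUNT };

/* Per-opcode execution counters: the "profile" that can justify new opcodes. */
static uint64_t profile[OP_COUNT];

/* Single-accumulator machine: JNZ jumps to index `arg` while acc != 0. */
typedef struct { uint8_t op, arg; } Insn;

int32_t run(const Insn *code, int32_t acc) {
    for (size_t pc = 0;;) {
        const Insn in = code[pc++];
        if (in.op >= OP_COUNT) {       /* unknown opcode: stop */
            assert(0 && "unknown opcode");
            return acc;
        }
        profile[in.op]++;              /* one increment per dispatch */
        switch (in.op) {
        case OP_INC:  acc++; break;
        case OP_DEC:  acc--; break;
        case OP_JNZ:  if (acc != 0) pc = in.arg; break;
        case OP_HALT: return acc;
        }
    }
}
```

Running DEC; JNZ 0; HALT with an initial accumulator of 3 records three DEC and three JNZ dispatches. A DEC that is always followed by a JNZ is the kind of evidence that might justify a fused instruction.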

If you are really eager to research more involved approaches, you could even look into dynamically optimizing the instruction set at runtime, using pattern matching to find common occurrences of opcodes in your bytecode in order to derive more abstract implementations, so that you can transform your bytecode dynamically with custom, runtime-generated opcodes.
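The pattern-matching idea can be sketched offline in a few lines: count adjacent opcode pairs, pick the most frequent one, and rewrite each occurrence into a new fused opcode. This sketch assumes a hypothetical bytecode in which opcodes 0..15 carry no operands and there are no jump targets. With operands and jumps, the rewrite would also have to fix up offsets and must not fuse across a jump target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_OPS 16   /* hypothetical: opcodes 0..15 are the base set */

/* Return the most frequent adjacent pair as (first << 8) | second. */
unsigned most_common_pair(const uint8_t *ops, size_t n, unsigned *count_out) {
    unsigned counts[NUM_OPS][NUM_OPS] = {{0}};
    unsigned best = 0, best_count = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (ops[i] >= NUM_OPS || ops[i + 1] >= NUM_OPS)
            continue;                /* ignore already-fused opcodes */
        unsigned c = ++counts[ops[i]][ops[i + 1]];
        if (c > best_count) {
            best_count = c;
            best = (unsigned)ops[i] << 8 | ops[i + 1];
        }
    }
    *count_out = best_count;
    return best;
}

/* Replace non-overlapping occurrences of (a, b) with `fused`, in place.
   Returns the new length of the bytecode. */
size_t fuse_pair(uint8_t *ops, size_t n, uint8_t a, uint8_t b, uint8_t fused) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n && ops[i] == a && ops[i + 1] == b) {
            ops[out++] = fused;
            i++;                     /* skip the second half of the pair */
        } else {
            ops[out++] = ops[i];
        }
    }
    return out;
}
```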

小伙你站住 2024-07-31 02:40:52

For software performance it's easier if all opcodes are the same length, so you can have one gigantic switch statement and not have to examine various option bits that might have been set by preceding modifier opcodes.
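A fixed-length format can be sketched as follows. It assumes a hypothetical 32 Bit word split into four 8 Bit fields (opcode, a, b, c). Because every instruction is one word, the program counter simply increments, and one flat switch handles everything without any prefix state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed 32 Bit format: op:8 | a:8 | b:8 | c:8. */
#define ENC(op, a, b, c) \
    ((uint32_t)(op) << 24 | (uint32_t)(a) << 16 | (uint32_t)(b) << 8 | (uint32_t)(c))
enum { OP_HALT, OP_LOADI, OP_ADD, OP_SUB, OP_MUL };

/* Runs until HALT and returns the value of register a of the HALT word. */
int32_t run(const uint32_t *code, int32_t *regs) {
    for (size_t pc = 0;; pc++) {
        uint32_t w = code[pc];          /* every instruction is exactly one word */
        uint8_t a = (w >> 16) & 0xFF, b = (w >> 8) & 0xFF, c = w & 0xFF;
        switch (w >> 24) {              /* one flat switch, no prefix state */
        case OP_LOADI: regs[a] = (int32_t)(b << 8 | c); break; /* 16 Bit unsigned immediate */
        case OP_ADD:   regs[c] = regs[a] + regs[b]; break;
        case OP_SUB:   regs[c] = regs[a] - regs[b]; break;
        case OP_MUL:   regs[c] = regs[a] * regs[b]; break;
        case OP_HALT:  return regs[a];
        default:       assert(0 && "unknown opcode"); return 0;
        }
    }
}
```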

Two matters that I think you didn't ask about are ease of writing compilers that translate programming languages to your VM code and ease of writing interpreters that execute your VM code. Both of these are easier with fewer opcodes. (But not too few. For example if you omit a divide opcode then you get an opportunity to learn how to code good division functions. Good ones are far harder than simple ones.)
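To illustrate the point about division, here is a sketch for unsigned 32 Bit integers only. The naive repeated-subtraction version can take billions of iterations (for example, 0xFFFFFFFF / 1), while binary long division always takes 32 steps. Signed operands, division by zero (only asserted here) and performance tuning are left out, and they are part of what makes a good division function harder.

```c
#include <assert.h>
#include <stdint.h>

/* Naive: repeated subtraction, up to n / d iterations. */
uint32_t div_naive(uint32_t n, uint32_t d) {
    assert(d != 0);
    uint32_t q = 0;
    while (n >= d) { n -= d; q++; }
    return q;
}

/* Binary long division: exactly 32 iterations, and it also yields the remainder. */
uint32_t div_shift(uint32_t n, uint32_t d, uint32_t *rem) {
    assert(d != 0);
    uint32_t q = 0;
    uint64_t r = 0;                      /* 64 bits so the shift below cannot overflow */
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down the next dividend bit */
        if (r >= d) { r -= d; q |= 1u << i; }
    }
    if (rem) *rem = (uint32_t)r;
    return q;
}
```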

紫南 2024-07-31 02:40:52

I prefer minimalistic instruction sets because their instructions can be combined into one opcode. For example, an opcode consisting of two 4 bit instruction fields can be dispatched with a 256-entry jump table. As dispatch overhead is the main bottleneck in interpretation, performance increases by a factor of about two, because only every second instruction needs to be dispatched. One way to implement a minimalistic but effective instruction set would be an accumulator/store design.
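Here is a scaled-down sketch of that idea, assuming a hypothetical accumulator/store machine. To keep it short, each primitive op uses 2 bits instead of 4, so a pair fits in 4 bits and the jump table has 16 entries instead of 256. A macro generates the 16 pair handlers, and each handler runs both ops in a single dispatch. The factor-of-two speedup is the answerer's claim and is not measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator/store machine. */
typedef struct { int32_t acc, store; } Machine;

/* Four primitive ops, each encoded in 2 bits. */
#define OP_0(m) ((void)0)                /* NOP   */
#define OP_1(m) ((m)->acc += 1)          /* INC   */
#define OP_2(m) ((m)->acc *= 2)          /* DBL   */
#define OP_3(m) ((m)->store = (m)->acc)  /* STORE */

/* One handler per (first, second) pair: two primitive ops, one dispatch. */
#define PAIR(a, b) \
    static void pair_##a##b(Machine *m) { (void)m; OP_##a(m); OP_##b(m); }
PAIR(0,0) PAIR(0,1) PAIR(0,2) PAIR(0,3)
PAIR(1,0) PAIR(1,1) PAIR(1,2) PAIR(1,3)
PAIR(2,0) PAIR(2,1) PAIR(2,2) PAIR(2,3)
PAIR(3,0) PAIR(3,1) PAIR(3,2) PAIR(3,3)

/* Index = (first << 2) | second, so entry 4*a + b is pair_ab. */
static void (*const table[16])(Machine *) = {
    pair_00, pair_01, pair_02, pair_03, pair_10, pair_11, pair_12, pair_13,
    pair_20, pair_21, pair_22, pair_23, pair_30, pair_31, pair_32, pair_33,
};

#define ENC(a, b) ((uint8_t)((a) << 2 | (b)))

void run(Machine *m, const uint8_t *code, size_t n) {
    for (size_t i = 0; i < n; i++)
        table[code[i] & 0xF](m);         /* one dispatch executes two ops */
}
```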

逆蝶 2024-07-31 02:40:52

Fewer opcodes, atomic in nature.

But if a combination of some opcodes is used frequently, add it as a single instruction.

For example, a lot of higher-level PLs have the simpler "if" and "goto" instructions, yet they also have the composed "while", "for", "do-while" or "repeat-until" instructions, built from the previous ones.
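In VM terms, the same idea might look like the following sketch, which uses hypothetical opcodes. It shows a counted loop written with the atomic DEC and JNZ instructions, next to a composed LOOP instruction that does both in one step. LOOP has do-while semantics, so the counter is assumed to start at 1 or more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes: `reg` is a register index, `target` a jump target. */
enum { OP_HALT, OP_ADD1, OP_DEC, OP_JNZ, OP_LOOP };
typedef struct { uint8_t op, reg, target; } Insn;

/* Returns the number of dispatches so both program forms can be compared. */
size_t run(const Insn *code, int32_t *regs) {
    size_t dispatched = 0;
    for (size_t pc = 0;;) {
        const Insn in = code[pc++];
        dispatched++;
        switch (in.op) {
        case OP_ADD1: regs[in.reg] += 1; break;
        case OP_DEC:  regs[in.reg] -= 1; break;
        case OP_JNZ:  if (regs[in.reg] != 0) pc = in.target; break;
        case OP_LOOP: /* DEC + JNZ composed into one instruction */
            if (--regs[in.reg] != 0) pc = in.target;
            break;
        case OP_HALT: return dispatched;
        default: assert(0 && "unknown opcode"); return dispatched;
        }
    }
}
```

For a loop of five iterations, the plain version needs 16 dispatches and the composed one needs 11.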
