What is the fastest virtual machine design for x86?

Posted 2024-07-10 20:20:22

I am going to implement a virtual machine on x86, and I wonder what kind of design would yield the best results. What should I concentrate on to squeeze out the most performance? I will implement the whole virtual machine in x86 assembly.

I don't have many instructions, and I can choose their form. The instructions map directly onto Smalltalk's syntax inside blocks. Here is the instruction design I was thinking of:

^ ...               # return a value
^null               # return nothing
object              # address of an object
... selector: ...   # message pass (in this case arity 1, selector #selector:)
var := ...          # set a variable
var                 # get a variable
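
For concreteness, here is one hypothetical way the forms above could be numbered and packed into 16-bit instruction words (opcode in the high byte, operand in the low byte, matching the dispatch sketch further down); every name and value in this C sketch is invented and not part of the original design.

/* One hypothetical encoding for the instruction forms above: a 16-bit
 * word per instruction, opcode in the high byte, operand (variable slot,
 * literal index, selector index, ...) in the low byte. */
#include <stdint.h>

enum opcode {
    OP_RETNULL = 0,   /* ^null              return nothing            */
    OP_RET     = 1,   /* ^ ...               return a value            */
    OP_OBJECT  = 2,   /* object              push an object reference  */
    OP_SEND    = 3,   /* ... selector: ...   message pass              */
    OP_STORE   = 4,   /* var := ...          set a variable            */
    OP_LOAD    = 5    /* var                 get a variable            */
};

static inline uint16_t encode(enum opcode op, uint8_t operand)
{
    return (uint16_t)(((unsigned)op << 8) | operand);   /* opcode*256 + operand */
}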

The sort of VM I was thinking about:

mov eax, [esi]            # fetch the 16-bit instruction word
add esi, 2                # advance the VM instruction pointer
mov ecx, eax
and eax, 0xff             # eax = operand (low byte)
and ecx, 0xff00           # ecx = opcode * 256 (high byte, still shifted)
shr ecx, 8                # ecx = opcode
jmp [ecx*4 + operations]  # index the dword handler table and jump

align 8
operations:               # one dword entry per opcode handler
    dd op_retnull
    dd op_ret
    # and so on...

op_retnull:               # jumps here for the "return nothing" opcode
    # ... retnull action
op_ret:                   # jumps here for the plain return opcode
    # ... ret action
# etc.
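
A widely used refinement of a central dispatch loop like the one above is to replicate the dispatch at the end of every handler (threaded code, as described at http://www.complang.tuwien.ac.at/forth/threading/, which one of the answers below also links), so that each opcode gets its own indirect branch. The following is a minimal C sketch of that idea using GCC's labels-as-values extension, assuming the same 16-bit word format as the assembly above; the opcode numbering, handler bodies, and tiny example program are invented for illustration.

/* Token-threaded dispatch sketch (GCC/clang labels-as-values extension).
 * Assumed format: 16-bit words, opcode in the high byte, operand in the
 * low byte.  The dispatch (NEXT) is replicated at the end of every
 * handler instead of jumping back to one central loop. */
#include <stdint.h>
#include <stdio.h>

static void run(const uint16_t *ip)
{
    /* one entry per opcode: 0 = retnull, 1 = ret, 2 = load (toy set) */
    static void *ops[] = { &&op_retnull, &&op_ret, &&op_load };
    uint16_t insn;
    uint8_t  operand;

#define NEXT do {                  \
        insn    = *ip++;           \
        operand = insn & 0xffu;    \
        goto *ops[insn >> 8];      \
    } while (0)

    NEXT;                          /* fetch and dispatch the first word */

op_retnull:
    printf("retnull\n");
    return;

op_ret:
    printf("ret %d\n", operand);
    return;

op_load:
    printf("load var %d\n", operand);
    NEXT;                          /* dispatch replicated in the handler */
#undef NEXT
}

int main(void)
{
    /* "load var 3" followed by "ret 42" */
    const uint16_t program[] = { (2u << 8) | 3u, (1u << 8) | 42u };
    run(program);
    return 0;
}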

Don't start asking why I need yet another virtual machine implementation. Interpretive routines aren't stock components you can just pick up whenever you need them. Most of the virtual machines proposed elsewhere are weighted towards portability at the cost of performance. My goal is not portability; my goal is performance.

The reason this interpreter is needed at all is that Smalltalk blocks don't all end up being interpreted the same way:

A := B subclass: [
    def a:x [^ x*x]
    clmet b [...]
    def c [...]
    def d [...]
]

[ 2 < x ] whileTrue: [...]

(i isNeat) ifTrue: [...] ifFalse: [...]

List fromBlock: [
    "carrots"
    "apples"
    "oranges" toUpper
]

I need the real benefit that comes from interpretive routines, namely the choice of context in which to read the program. Of course, a good compiler should, most of the time, simply compile the obvious cases such as 'ifTrue:ifFalse:', 'whileTrue:', or the list example. But the need for an interpreter doesn't just disappear, because you may always hit a case where you can't be sure the block gets the treatment you expect.

Comments (6)

攒眉千度 2024-07-17 20:20:23

I have to ask, why create a virtual machine with a focus on performance? Why not just write x86 code directly? Nothing can possibly be faster.

If you want a very fast interpreted language, look at Forth. Its design is very tidy and very easy to copy.

梦在深巷 2024-07-17 20:20:23

If you don't like JIT and your goal is not portability, you may be interested in the Google Native Client project. They do static analysis, sandboxing, and so on, and they allow the host to execute raw x86 instructions.

一花一树开 2024-07-17 20:20:22

I see there is some confusion about portability here, so I feel obliged to clarify matters somewhat. These are my humble opinions, so you are, of course, free to object to them.

I assume you came across http://www.complang.tuwien.ac.at/forth/threading/ if you are seriously considering writing a VM, so I won't dwell upon the techniques described there.

As already mentioned, targeting a VM has some advantages, such as reduced code size, reduced compiler complexity (which often translates to faster compilation), and portability (note that the point of a VM is portability of the language, so it doesn't matter if the VM itself is not portable).

Considering the dynamic nature of your example, your VM will resemble a JIT compiler more than the other, more popular kinds. So, although S.Lott missed the point in this case, his mention of Forth is very much on the spot. If I were to design a VM for a very dynamic language, I would separate interpretation into two stages:

  1. A producer stage which consults an AST stream on demand and transforms it into a more meaningful form (for example, taking a block and deciding whether it should be executed right away or stored somewhere for later execution), possibly introducing new kinds of tokens. Essentially, this is where you recover context-sensitive information that may have been lost during parsing.

  2. A consumer stage which fetches the stream generated by stage 1 and executes it blindly like any other machine. If you make it Forth-like, you can just push a stored stream and be done with it, instead of jumping the instruction pointer around.

As you say, just mimicking how the damn processor works, only in another guise, doesn't give you any of the dynamism (or any other feature worth a damn, like security) that you require. Otherwise, you would just be writing a compiler.

Of course, you can add arbitrarily complex optimizations in stage 1.
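
As a rough illustration of the two-stage split described above, the following C sketch separates a producer, which walks a deliberately simplified stand-in for an AST and decides per block whether it should run now or be stored for later, from a consumer that executes the resulting token stream blindly; every type and name here is invented for the example.

/* Two-stage interpretation sketch: a producer turns AST nodes into a flat
 * token stream (recovering context such as "execute now" vs. "defer"),
 * and a consumer executes that stream blindly.  All types are simplified
 * stand-ins, not part of any real Smalltalk VM. */
#include <stdio.h>
#include <stddef.h>

typedef enum { AST_LITERAL, AST_BLOCK } ast_kind;

typedef struct {
    ast_kind kind;
    int      value;        /* literal value, or block id for AST_BLOCK  */
    int      execute_now;  /* context decided by the producer's caller  */
} ast_node;

typedef enum { TOK_PUSH, TOK_RUN_BLOCK, TOK_STORE_BLOCK, TOK_HALT } token_op;

typedef struct { token_op op; int arg; } token;

/* Stage 1: producer -- consult the AST and emit more meaningful tokens. */
static size_t produce(const ast_node *nodes, size_t n, token *out)
{
    size_t emitted = 0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].kind == AST_LITERAL) {
            out[emitted++] = (token){ TOK_PUSH, nodes[i].value };
        } else {
            /* context-sensitive decision: run the block now or defer it */
            out[emitted++] = (token){ nodes[i].execute_now ? TOK_RUN_BLOCK
                                                           : TOK_STORE_BLOCK,
                                      nodes[i].value };
        }
    }
    out[emitted++] = (token){ TOK_HALT, 0 };
    return emitted;
}

/* Stage 2: consumer -- execute the generated stream blindly. */
static void consume(const token *stream)
{
    for (;; stream++) {
        switch (stream->op) {
        case TOK_PUSH:        printf("push %d\n", stream->arg);        break;
        case TOK_RUN_BLOCK:   printf("run block %d\n", stream->arg);   break;
        case TOK_STORE_BLOCK: printf("store block %d\n", stream->arg); break;
        case TOK_HALT:        return;
        }
    }
}

int main(void)
{
    const ast_node ast[] = {
        { AST_LITERAL, 7, 0 },
        { AST_BLOCK,   1, 1 },   /* a block the caller wants run immediately */
        { AST_BLOCK,   2, 0 },   /* a block stored for later execution       */
    };
    token stream[8];
    produce(ast, sizeof ast / sizeof ast[0], stream);
    consume(stream);
    return 0;
}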

つ可否回来 2024-07-17 20:20:22

If you want something really fast, try using LLVM. It can generate native code for most processors from a high-level program description. You can either go with your own assembly language or generate the LLVM structures directly, skipping the assembly phase, depending on what you find most convenient.

I'm not sure it's the best fit for your problem, but it's definitely what I would use if I had to do some performance-critical execution of code that can't be compiled with the rest of the program.

思慕 2024-07-17 20:20:22

The point of an interpreter is portability, most of the time. The fastest approach I can think of is to generate x86 code in memory directly, just like JIT compilers do, but then, of course, you don't have an interpreter anymore. You have a compiler.

However, I'm not sure writing the interpreter in assembler will give you the best performance (unless you're an assembler guru and your project is very limited in scope). Using a higher-level language can help you focus on better algorithms for, say, symbol lookup and register allocation strategies.

后来的我们 2024-07-17 20:20:22

You can speed up your dispatch routine with an unencoded instruction set, where each instruction word is simply the offset of its handler relative to the opcode table:

mov eax, [esi]          # fetch the next instruction word: already a handler offset
add esi, 4              # advance the VM instruction pointer
add eax, pOpcodeTable   # add the base address of the handler code
jmp eax                 # jump straight to the handler -- no decoding, no table load

This should have an overhead of less than 4 cycles per dispatch on CPUs newer than the Pentium 4.

In addition, for performance reasons it is better to increment ESI (the VM instruction pointer) in each primitive routine, because the chances are high that the increment can be paired with other instructions:

mov eax, [esi]          # fetch the next instruction word
add eax, pOpcodeTable   # the "add esi, 4" now lives inside each primitive
jmp eax

~1-2 cycles of overhead.
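
In the same spirit, here is a hedged C sketch of direct threading, where the instruction stream holds the handler addresses themselves so that dispatch is just a load and an indirect jump; it uses GCC's labels-as-values extension and stores absolute addresses rather than offsets relative to pOpcodeTable, and all names in it are illustrative.

/* Direct-threaded dispatch sketch (GCC/clang labels-as-values): the
 * instruction stream holds handler addresses, so dispatch is just
 * "load the next word and jump to it" -- the C analogue of the
 * add-offset-and-jmp scheme above, with addresses instead of offsets. */
#include <stdio.h>

static void run(void)
{
    /* a tiny "program": two printing primitives followed by a halt */
    void *program[] = { &&do_hello, &&do_world, &&do_halt };
    void **ip = program;

    goto *(*ip++);               /* initial dispatch */

do_hello:
    printf("hello ");
    goto *(*ip++);               /* dispatch replicated in each primitive */

do_world:
    printf("world\n");
    goto *(*ip++);

do_halt:
    return;
}

int main(void)
{
    run();
    return 0;
}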
