What is a virtual machine? Why do dynamic languages need one?

Posted 2024-10-11 03:20:26

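For example, Python and Java have a VM, while C and Haskell do not. (Correct me if I'm wrong.)

Thinking about which languages fall on either side of that line, I can't find the reason. Java is static in many ways, while Haskell offers a lot of dynamic features.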

花开柳相依 2024-10-18 03:20:26

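This has nothing to do with static versus dynamic.

Rather, it is about being independent of the underlying hardware platform ("build once, run everywhere", in theory...).

In fact, it has nothing to do with the language either. One could write a C compiler that generates bytecode for the JVM. One could write a Java compiler that generates x86 machine code.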

爱本泡沫多脆弱 2024-10-18 03:20:26

Let's forget about VMs for a sec (we'll get back to those below, I promise), and start with this important fact:

C doesn't have garbage collection.

For a language to provide garbage collection, there has to be some sort of "runtime"/runtime-environment/thing that will perform it.

That's why Python, Java, and Haskell require a "runtime", and C, which has no garbage collection, can compile straightforwardly to native code.

Note that psyco was a Python optimizer that compiled Python code to machine code. However, much of that machine code consisted of calls to CPython runtime functions, such as PyImport_AddModule and PyImport_GetModuleDict.

Haskell/GHC is in a similar boat to psyco-compiled Python. Ints are added with simple machine instructions, but more complicated operations, such as allocating objects, invoke the runtime.

What else?

C doesn't have "exceptions"

If we were to add exceptions to C, our generated machine code would need to do some stuff for every function and for every function call.

If we then add "closures" as well, there would be more stuff added.

Now, instead of repeating this boilerplate machine code in every function, we could have it call a subprocedure to do the necessary stuff, something like PyErr_Occurred.

So now, basically every original source line maps to some calls to some functions and a smaller unique part.

But as long as we're doing so much stuff per original source code line, why even bother with machine code?

Here's an idea (btw let's call this idea a "Virtual Machine").

Let's represent your Python code, which is for example:

def has_no_letters(text):
  return text.upper() == text.lower()

As an in-memory data-structure, for example:

{ 'func_name': 'has_no_letters',
  'num_args': 1,
  'kwargs': [],
  'codez': [
    ('get_attr', 'tmp_a', 'arg_0', 'upper'),  # tmp_a = arg_0.upper
    ('func_call', 'tmp_b', 'tmp_a', []),  # tmp_b = tmp_a() # tmp_b = arg_0.upper()
    ('get_attr', 'tmp_c', 'arg_0', 'lower'),
    ('func_call', 'tmp_d', 'tmp_c', []),
    ('get_global', 'tmp_e', '=='),
    ('func_call', 'tmp_f', 'tmp_e', ['tmp_b', 'tmp_d']),
    ('return', 'tmp_f'),
  ]
}

Now, let's write an interpreter that executes this in-memory data structure.
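A minimal sketch of such an interpreter might look like the following. It is an illustration only: the `run` function, the `slots` dict, and the `GLOBALS` table that maps '==' to `operator.eq` are assumptions made up for this example, not how CPython or any real VM is implemented.

```python
import operator

# The example program from above, as plain Python data.
PROGRAM = {
    'func_name': 'has_no_letters',
    'num_args': 1,
    'kwargs': [],
    'codez': [
        ('get_attr', 'tmp_a', 'arg_0', 'upper'),
        ('func_call', 'tmp_b', 'tmp_a', []),
        ('get_attr', 'tmp_c', 'arg_0', 'lower'),
        ('func_call', 'tmp_d', 'tmp_c', []),
        ('get_global', 'tmp_e', '=='),
        ('func_call', 'tmp_f', 'tmp_e', ['tmp_b', 'tmp_d']),
        ('return', 'tmp_f'),
    ],
}

# Assumption: the VM's "globals" are just a table of built-in operations.
GLOBALS = {'==': operator.eq}


def run(func, *args):
    """Execute one function description and return its result."""
    if len(args) != func['num_args']:
        raise TypeError('%s expects %d argument(s)'
                        % (func['func_name'], func['num_args']))
    # Every argument and temporary lives in one dict of named slots.
    slots = {'arg_%d' % i: value for i, value in enumerate(args)}
    for instr in func['codez']:
        op = instr[0]
        if op == 'get_attr':
            _, dest, obj, name = instr
            slots[dest] = getattr(slots[obj], name)
        elif op == 'func_call':
            _, dest, callee, arg_names = instr
            slots[dest] = slots[callee](*[slots[n] for n in arg_names])
        elif op == 'get_global':
            _, dest, name = instr
            slots[dest] = GLOBALS[name]
        elif op == 'return':
            return slots[instr[1]]
        else:
            raise ValueError('unknown opcode: %r' % (op,))
    return None


print(run(PROGRAM, '123'))
print(run(PROGRAM, 'abc'))
```

The whole "machine" is one loop with one branch per opcode, and no source text is parsed while it runs.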

Let's discuss the benefits of this over direct-from-text-interpreters, and then the benefits over compiling to machine code.

The benefits of VMs over direct-from-text-interpreters

- The VM system gives you all the syntax errors before executing the code.
- When evaluating a loop, a VM system doesn't parse the source code each time it runs.
  - This makes the VM faster than the direct-from-text interpreter.
  - A direct interpreter also runs slower with long variable names and faster with short ones. This encourages people to write crappy mathematician-style code such as wt(f, d(o, e), s) <= th(i, s) + cr(a, p * d + o)
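Both points can be seen in CPython itself, which compiles source to a code object before running it. The sketch below uses the built-in `compile` and `exec`; the `SOURCE` and `BROKEN` strings and the `try_compile` helper are made up for the illustration.

```python
SOURCE = '''
log.append("side effect")
total = 0
for i in range(3):
    total += i
'''

# Same first line, but with a syntax error further down.
BROKEN = '''
log.append("side effect")
total = = 0
'''


def try_compile(source):
    """Return a code object, or None if the source has a syntax error."""
    try:
        return compile(source, '<example>', 'exec')
    except SyntaxError:
        return None


log = []

# The broken program is rejected up front: its first line never runs.
broken_code = try_compile(BROKEN)
ran_before_error = list(log)

# The valid program is parsed exactly once...
code = try_compile(SOURCE)
# ...and then executed twice without touching the source text again.
for _ in range(2):
    namespace = {'log': log}
    exec(code, namespace)
```

After this runs, `broken_code` is None and `ran_before_error` is empty, while `log` holds one entry per execution of the valid program.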

The benefits of VMs over compiling to machine code

- The in-memory data structure describing the program, or the "VM code", will probably be much more compact than boilerplate-full machine code which does the same stuff again and again for every original line of code. This will make the VM system run faster because fewer "instructions" will need to be fetched from memory.
- Creating a VM is much simpler than creating a compiler to machine code. You can probably do this now without even knowing any assembly or machine code.
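To get a feel for how small VM code can be, CPython's own bytecode for the example function can be inspected with the standard `dis` module. This only shows the VM side: the exact opcodes and byte count vary between CPython versions, and it does not by itself measure what an equivalent machine-code build would weigh.

```python
import dis


def has_no_letters(text):
    return text.upper() == text.lower()


code = has_no_letters.__code__

# co_code is the raw bytecode string the CPython VM executes.
print(len(code.co_code), 'bytes of bytecode')

# Human-readable listing of the same instructions.
dis.dis(has_no_letters)
```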
