Function calls in a virtual machine are killing performance

Posted 2024-10-12 17:36:51

I wrote a virtual machine in C. It has a call table populated with pointers to the functions that implement the VM's opcodes. When the virtual machine runs, it first decodes the program into an array of indexes, one per opcode, each pointing at the matching function in the call table. It then loops through the array, calling each function until it reaches the end.

Each instruction is extremely small, typically one line. Perfect for inlining. The problem is that the compiler doesn't know when any of the virtual machine's instructions are going to be called, as it's decided at runtime, so it can't inline them. The overhead of function calls and argument passing is killing the performance of my VM. Any ideas on how to get around this?
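For reference, the setup described above might look roughly like the following. This is only a sketch: the `VM` struct, the three opcodes, and names such as `call_table` and `run_table` are made up for illustration and are not taken from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VM state: a small operand stack. */
typedef struct {
    int stack[64];
    int sp;                 /* number of values currently on the stack */
} VM;

typedef void (*OpFn)(VM *vm);

/* Each instruction is tiny, but lives in its own function. */
static void op_push1(VM *vm) {
    assert(vm->sp < 64);
    vm->stack[vm->sp++] = 1;
}

static void op_add(VM *vm) {
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
}

static void op_dup(VM *vm) {
    assert(vm->sp >= 1 && vm->sp < 64);
    vm->stack[vm->sp] = vm->stack[vm->sp - 1];
    vm->sp++;
}

enum { OP_PUSH1, OP_ADD, OP_DUP, OP_COUNT };

/* The call table: one function pointer per opcode. */
static const OpFn call_table[OP_COUNT] = { op_push1, op_add, op_dup };

/* Runs n pre-decoded call-table indexes and returns the top of the stack. */
int run_table(const unsigned char *code, size_t n) {
    VM vm = { {0}, 0 };
    for (size_t i = 0; i < n; i++)
        call_table[code[i]](&vm);   /* indirect call: target unknown at compile time */
    assert(vm.sp > 0);
    return vm.stack[vm.sp - 1];
}
```

Every iteration pays for an indirect call through `call_table`, and because the target is only known at runtime the compiler generally cannot inline the one-line bodies.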

Comments (2)

挽梦忆笙歌 2024-10-19 17:36:51

Here are some options for reducing the overhead:

  1. Declare the functions as fastcall (or something similar) to reduce the overhead of argument passing
  2. Use a large switch-case instead of a table of function pointers (the compiler will typically turn a dense switch into a jump table, and you remove the overhead of actually calling a function for each instruction)
  3. Copy all the code for a VM procedure to one location so that it can run sequentially, rather than returning to the interpreter after each instruction.
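As a rough illustration of option 2, here is what a switch-based interpreter loop could look like. The opcodes and the name `run_switch` are hypothetical, chosen only to keep the sketch small; a real VM would have its own instruction set and state.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_PUSH1, OP_ADD, OP_DUP };

/* Runs n opcodes and returns the top of the stack (0 if the stack is empty). */
int run_switch(const unsigned char *code, size_t n) {
    int stack[64];
    int sp = 0;                      /* number of values on the stack */

    for (size_t pc = 0; pc < n; pc++) {
        /* The instruction bodies sit directly in the loop, so there is
           no call, no argument passing, and no return per instruction. */
        switch (code[pc]) {
        case OP_PUSH1:
            assert(sp < 64);
            stack[sp++] = 1;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_DUP:
            assert(sp >= 1 && sp < 64);
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        default:
            assert(0 && "unknown opcode");
            break;
        }
    }
    return sp > 0 ? stack[sp - 1] : 0;
}
```

Because the VM state is now made of local variables, the compiler is also free to keep values such as `sp` in registers across instructions, which is hard to do when every opcode is a separate function taking a pointer to the state.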

Eventually you're going to get to the point of JIT-compiling, on-line profiling and reoptimization, and all sorts of other awesome stuff.

书信已泛黄 2024-10-19 17:36:51

There are many good techniques you might want to look into. Here are two that I'm familiar with:

  1. Inline caching: Essentially, finding what keeps getting called at a given call site, then switching from a vtable lookup to a few if statements that dispatch directly to a statically known location. This technique was used to great effect in the Self language and is one of the JVM's major optimizations.

  2. Tracing: Deferring compilation until a piece of code has run sufficiently many times, then compiling a version of each polymorphic dispatch specialized for the types that actually show up. Mozilla's TraceMonkey JavaScript engine used this to get a huge performance win in many cases.
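The "bunch of if statements" idea behind inline caching can be approximated in plain C, without a JIT, by guarding an indirect call with a check for the target you expect to see most often. The sketch below assumes `add_op` is the hot target; that assumption, and every name in it, is hypothetical. A real inline cache would pick and patch the guarded target at runtime.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*BinOp)(int, int);

int add_op(int a, int b) { return a + b; }
int mul_op(int a, int b) { return a * b; }

/* Folds xs into acc using fn, with a guard for the expected hot target. */
int fold_cached(BinOp fn, const int *xs, size_t n, int acc) {
    for (size_t i = 0; i < n; i++) {
        if (fn == add_op) {
            /* Fast path: the callee is statically known here,
               so the compiler may inline it. */
            acc = add_op(acc, xs[i]);
        } else {
            /* Slow path: ordinary indirect call. */
            acc = fn(acc, xs[i]);
        }
    }
    return acc;
}
```

Both paths give the same result; only the cost differs. The guard pays off only if the guessed target really dominates, which is why production implementations choose it from runtime profiling rather than hard-coding it.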

Hope this helps!
