Addresses of labels (MSVC)
We are writing a byte-code interpreter for a high-level compiled language, and after a bit of profiling and optimization it became clear that the largest remaining performance overhead is the switch statement we use to jump to the byte-code cases.
We investigated pulling out the address of each case label and storing it in the byte-code stream itself, in place of the instruction ID that we usually switch on. If we do that, we can skip the jump table and jump directly to the code for the currently executing instruction. This works fantastically in GCC, but MSVC doesn't seem to support a feature like this.
We attempted to use inline assembly to grab the addresses of the labels (and to jump to them), and it works. However, using inline assembly causes the MSVC optimizer to skip the entire function.
Is there a way to let the optimizer still run over the code? Unfortunately, we can't move the inline assembly into a function other than the one the labels were defined in, since there's no way to reference a label in another function, even from inline assembly. Any thoughts or ideas? Your input is much appreciated, thanks!
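For readers unfamiliar with the GCC feature mentioned above, here is a minimal sketch of the idea, assuming the GCC/Clang "labels as values" extension (`&&label` and `goto *ptr`). The opcodes, the `run_threaded` name, and the tiny stack machine are hypothetical and only serve as illustration; they are not the poster's actual code, and MSVC will not compile this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a tiny stack machine (not from the original post). */
enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

/* Fetch the next handler address from the stream and jump straight to it. */
#define NEXT() goto *(*pc++)

/* Runs `code` (opcodes with inline operands, ending in OP_HALT) and returns
   the top of the stack. Needs the GNU "labels as values" extension. */
static intptr_t run_threaded(const intptr_t *code, size_t len)
{
    /* Handler addresses indexed by opcode; `&&label` is the GNU extension. */
    static void *const labels[OP_COUNT] = { &&do_push, &&do_add, &&do_halt };

    /* Pre-translate: replace every opcode with its handler address, so the
       stream itself carries the jump targets. Operands are copied as-is. */
    void *stream[64];
    assert(len <= 64);
    for (size_t i = 0; i < len; ++i) {
        intptr_t op = code[i];
        assert(op >= 0 && op < OP_COUNT);
        stream[i] = labels[op];
        if (op == OP_PUSH) {
            assert(i + 1 < len);
            ++i;
            stream[i] = (void *)code[i];
        }
    }

    intptr_t stack[16];
    size_t sp = 0;
    void **pc = stream;

    NEXT();
do_push:
    assert(sp < 16);
    stack[sp++] = (intptr_t)*pc++;   /* operand follows the handler address */
    NEXT();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];
}

#undef NEXT
```

Calling `run_threaded` on the stream `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT }` adds the two pushed values. Each handler ends in its own indirect jump, so no central switch or jump table is consulted at run time.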
Comments (4)
The only way of doing this in MSVC is by using inline assembly, which basically rules out x64, since MSVC does not support inline assembly for x64 targets.
If you plan on doing something like this, then the best way would be to write the whole interpreter in assembly and link it into the main binary via the linker. This is what LuaJIT did, and it is the main reason the VM is so blindingly fast (when it's not running JIT'ed code, that is).
LuaJIT is open-source, so you might pick up some tips from it if you go that route. Alternatively, you might want to look into the source of Forth (whose creator developed the principle you're trying to use). If there is an MSVC build, you can see how they accomplished it; otherwise you're stuck with GCC, which isn't a bad thing, since it works on all major platforms.
Take a look at what Erlang does for building on Windows. They use MSVC for most of the build, and then GCC for one file to make use of the labels-as-values extension. The resulting object code is then hacked to be made compatible with the MSVC linker.
http://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html
It seems you could just move the actual code to functions, instead of case labels. The byte code can then be trivially transformed into direct calls, i.e. byte code 1 would translate to `CALL BC1`. Since you're generating direct calls, you don't have the overhead of function pointers. The pipelines of most CPUs can follow such unconditional direct branches. As a result, the actual implementations of each byte code are optimized, and the conversion from byte code to machine code is a trivial 1:1 conversion. You get a bit of code expansion since each `CALL` is 5 bytes (assuming x86-32), but that's unlikely to be a major problem.
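Emitting real `CALL` instructions means generating machine code into executable memory, which is platform-specific. As a portable approximation only, the sketch below moves each byte code into its own function and stores function pointers in the instruction stream. Note that this is a different variant from the one described above: it uses indirect calls through pointers, which is exactly the overhead the direct-call scheme avoids. All names (`VM`, `Cell`, `bc_push`, `run_calls`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct VM VM;
typedef void (*Handler)(VM *);

/* One slot of the translated stream: either a handler or an inline operand. */
typedef union Cell {
    Handler fn;
    long operand;
} Cell;

/* Hypothetical VM state for a tiny stack machine. */
struct VM {
    long stack[16];
    size_t sp;
    const Cell *pc;
    int running;
};

/* Each byte code lives in its own function, so the optimizer sees plain C. */
static void bc_push(VM *vm)
{
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = (vm->pc++)->operand;  /* operand follows the handler */
}

static void bc_add(VM *vm)
{
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
}

static void bc_halt(VM *vm)
{
    vm->running = 0;
}

/* Runs a stream that must end with bc_halt; returns the top of the stack. */
static long run_calls(const Cell *program)
{
    VM vm = { .sp = 0, .pc = program, .running = 1 };
    while (vm.running) {
        Handler h = (vm.pc++)->fn;  /* fetch the handler stored in the stream */
        h(&vm);                     /* indirect call, unlike a generated CALL */
    }
    assert(vm.sp >= 1);
    return vm.stack[vm.sp - 1];
}
```

A stream for "push 2, push 3, add, halt" would be built as `{ {.fn = bc_push}, {.operand = 2}, {.fn = bc_push}, {.operand = 3}, {.fn = bc_add}, {.fn = bc_halt} }`. This is standard C99, so it should also be usable with MSVC in C mode.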
The best approach I found for this is to use a switch statement and, in each case of the switch, write the goto to the corresponding label.
It is manual work, but it doesn't use assembly, and it looks like it is compatible with MSVC.
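The answer does not show code, so the following is only one possible reading of it: the switch is wrapped in a macro whose cases contain nothing but a `goto`, and that macro is repeated at the end of every handler. The opcodes and the `run_switch_goto` name are hypothetical. Whether a given compiler turns each copy into its own jump table, and whether that helps, is an assumption that would need measuring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* The dispatch switch: every case is just a goto to the handler label.
   It is stamped out after each handler instead of living in one loop. */
#define DISPATCH()                  \
    switch (*pc++) {                \
    case OP_PUSH: goto do_push;     \
    case OP_ADD:  goto do_add;      \
    case OP_HALT: goto do_halt;     \
    default:      goto do_bad;      \
    }

/* Runs `code` (ending in OP_HALT); returns the top of the stack,
   or -1 if an unknown opcode is encountered. */
static long run_switch_goto(const long *code)
{
    long stack[16];
    size_t sp = 0;
    const long *pc = code;

    DISPATCH()
do_push:
    assert(sp < 16);
    stack[sp++] = *pc++;            /* operand follows the opcode */
    DISPATCH()
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH()
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];
do_bad:
    return -1;
}

#undef DISPATCH
```

This uses only standard C, so no compiler extension or inline assembly is involved. The cost is that every new opcode has to be added to the macro by hand.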