JIT optimizer for C/C++

Published 2024-12-04 03:37:04


I was reading about the advantages of JIT compilation over precompiled code, and one of the advantages mentioned was that a JIT can adjust branch prediction based on actual runtime data. Now, it's been a long time since I wrote a compiler in college, but it seems to me that something similar could be achieved for precompiled code too, in most cases (where there are no explicit gotos).

Consider the following code:

   test x
   jne L2
L1: ...
   jmp L3
L2: ...
L3:

If we have some runtime instrumentation that sees how many times the 'jne L2' is true, it could physically swap all the instructions in the L1: block and the L2: block. Of course, it would have to know that no thread is within either block during the swap, but those are details...

   test x
   je  L1
L2: ...
   jmp L3
L1: ...
L3:

I understand there are also issues when the program code is loaded into read-only memory, etc., but that's the idea.

So my question is, is such a JIT optimization feasible for C/C++ or am I missing some fundamental reason why this cannot be done? Are there any JIT optimizers for C/C++ out there?


Comments (4)

泛泛之交 2024-12-11 03:37:05


There is no JIT compiler for C++ that I am aware of; however, GCC does support feedback-directed optimization (FDO), which can use runtime profiling to optimize branch prediction and the like.

See the GCC options starting with "-fprofile" (HINT: "-fprofile-use" uses the generated runtime profile to perform the optimization, while "-fprofile-generate" is used to generate the runtime profile).

我的黑色迷你裙 2024-12-11 03:37:05


Most modern CPUs support branch prediction. They have a small cache which allows the CPU to notionally give you the benefits of re-ordering at runtime. This cache is fairly limited in size, so you may not get as much benefit as you might imagine. Some CPUs can even start executing both branches and discard the work done on the branch not taken.


EDIT: The biggest advantage in using a JIT compiler comes from code like this.

if (debug) {
   // do something
}

JITs are very good at detecting and optimising code which doesn't do anything. (If you have a micro-benchmark which suggests Java is much faster than C, it is most likely that the JIT has detected your test isn't doing anything, while the C compiler didn't.)

You might ask, why doesn't C have something like this? Because it has something "better":

#if DEBUG
    // do something
#endif

This is optimal provided DEBUG rarely changes and you have very few of these flags, so that you can compile every useful combination.

The problem with this approach is scalability. Every flag you add can double the number of pre-compiled binaries to produce.

If you have many such flags and it is impractical to compile every combination, you need to rely on branch prediction to optimise your code dynamically.

欲拥i 2024-12-11 03:37:05


You are referring to tracing or reoptimizing JITs, not just any old JIT; nothing like this has been made for C or C++ (at least not publicly). However, you might want to check whether LLVM isn't headed that way in a branch (considering it's both a compiler and a JIT) using the Clang or GCC front ends, as I've seen some topics suggesting it might be implemented.

萌化 2024-12-11 03:37:05


The HP Dynamo binary recompiler demonstrated that it is possible to achieve speed-ups of up to 20% on optimized code produced by a C++ compiler. Dynamo isn't exactly a JIT compiler, since it starts from arbitrary machine code rather than from a higher-level representation such as JVM bytecode or .NET CIL, but in principle a JIT for C++ could only be more efficient than Dynamo. See: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.7138&rank=1

Dynamo was created for the HP PA-RISC architecture and was never offered as a commercial product, so it isn't of much use in the current world dominated by x86 variants. I wonder whether VMware, Connectix or Parallels have ever played around with adding optimization passes to their recompilers, or whether they have already abandoned binary translation in favour of the virtualization features in the latest x86 CPUs.
