Question about possible Java (or other memory-managed language) optimisations
From what I have read, Java is (usually) compiled to not very (or not at all?) optimised Java bytecode, leaving it to the JIT to optimise. Is this true? And if it is, has there been any exploration (possibly in alternative implementations) of getting the compiler to optimise the code so the JIT has less work to do (is this even possible)?
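To make the first point concrete, here is a small sketch (the class and method names are made up for illustration). As far as I know, folding compile-time constant expressions is one of the few optimisations javac does perform; loops, arithmetic on variables and calls are emitted more or less as written and left to the JIT.

```java
// Hypothetical demo class; the names are illustrative, not from the question.
class ConstantFolding {
    // A compile-time constant expression: javac stores 86400 in the class
    // file rather than emitting the two multiplications.
    static final int SECONDS_PER_DAY = 60 * 60 * 24;

    // Constant string concatenation is folded too, so both sides refer to
    // the same interned literal and the reference comparison is true.
    static boolean concatIsFolded() {
        return ("byte" + "code") == "bytecode";
    }

    // Nothing here is a compile-time constant: javac emits the loop, the
    // multiplication and the addition as written and leaves any further
    // optimisation to the JIT.
    static int sumOfSquares(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(SECONDS_PER_DAY);
        System.out.println(concatIsFolded());
        System.out.println(sumOfSquares(10));
    }
}
```

Compiling this and running `javap -c ConstantFolding` should show the folded constants next to the unoptimised loop.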
Also, many people seem to dislike native code generation (sometimes referred to as ahead-of-time compilation) for Java (and many other high-level memory-managed languages), for many reasons such as loss of portability (etc.), but also partly because (at least for those languages that have a just-in-time compiler) the thinking goes that ahead-of-time compilation to machine code will miss the optimisations that a JIT compiler can do, and may therefore be slower in the long run.
This leads me to wonder whether anyone has ever tried to implement profile-guided optimisation (http://en.wikipedia.org/wiki/Profile-guided_optimization: compiling to a binary plus some extras, then running the program and analysing the runtime information from the test run to generate a hopefully more optimised binary for real-world usage) for Java (or other memory-managed languages), and how this would compare to JIT-compiled code. Does anyone have a clue?
Comments (3)
Personally, I think the big difference is not between JIT compiling and AOT compiling, but between class-compilation and whole-program optimization.
When you run javac, it essentially works one class at a time, compiling each .java file into a .class file. All the interface implementations, virtual methods and overrides are checked for validity but left unresolved (because it's impossible to know the true method invocation targets without analyzing the whole program).
The JVM uses "runtime loading and linking" to assemble all of your classes into a coherent program (and any class in your program can invoke specialized behavior to change the default loading/linking behavior).
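A minimal sketch of what runtime loading means for the compiler (the `Greeter` types and the `RuntimeLinking` class are hypothetical, invented for this example): the implementation is chosen by a string at runtime, so javac has no way of knowing the call target.

```java
// Hypothetical types for illustration only.
interface Greeter {
    String greet(String name);
}

class EnglishGreeter implements Greeter {
    public String greet(String name) { return "Hello, " + name; }
}

class FrenchGreeter implements Greeter {
    public String greet(String name) { return "Bonjour, " + name; }
}

class RuntimeLinking {
    // The class name is only a string here: javac cannot know which
    // implementation greet() will dispatch to. The JVM loads and links
    // the class when this method runs.
    static Greeter load(String className) {
        try {
            return (Greeter) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // Pick the implementation from the command line, defaulting to English.
        String className = args.length > 0 ? args[0] : EnglishGreeter.class.getName();
        System.out.println(load(className).greet("world"));
    }
}
```

Running it with `FrenchGreeter` as the argument switches the implementation without recompiling anything.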
But then, at runtime, the JVM can devirtualize the vast majority of virtual method calls. It can inline all of your getters and setters, turning them into raw field accesses. And once those field accesses are inlined, it can perform constant propagation to further optimize the code. (At runtime, there's no such thing as a private field.) And if it can tell that a lock is only ever used by one thread, the JVM can eliminate that synchronization.
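Here is the kind of code this applies to, as a sketch with made-up names (`Shape`, `Square`, `Devirtualization`). Whether a given JVM actually devirtualizes and inlines these calls depends on the implementation and on what it observes at runtime, so treat the comments as what may happen rather than a guarantee.

```java
// Hypothetical example; names are illustrative.
interface Shape {
    double area();
}

final class Square implements Shape {
    private final double side;

    Square(double side) { this.side = side; }

    // Trivial getter: a typical inlining candidate.
    double getSide() { return side; }

    @Override
    public double area() { return getSide() * getSide(); }
}

class Devirtualization {
    // javac emits an interface call for area() because it cannot know the
    // receiver type. If Square is the only Shape the JVM has seen here, the
    // JIT may replace the call with a direct one, inline it together with
    // getSide(), and end up reading the private field directly.
    static double totalArea(Shape[] shapes) {
        double total = 0.0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1000];
        for (int i = 0; i < shapes.length; i++) {
            shapes[i] = new Square(2.0);
        }
        double result = 0.0;
        // Repeat the call so the method becomes hot enough to be compiled.
        for (int round = 0; round < 20_000; round++) {
            result = totalArea(shapes);
        }
        System.out.println(result);
    }
}
```

On HotSpot, running this with `-XX:+PrintCompilation`, or with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`, should give some idea of what the JIT decided to do.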
To make a long story short, there are a lot of optimizations that aren't possible without analyzing the whole program, and the best time for doing whole program analysis is at runtime.
Profile-guided optimization has some caveats, one of which is even mentioned in the Wikipedia article you linked. Its results are valid only for the conditions under which the profile was collected: the sample data, the platform, and the configuration.
From a performance point of view there are quite big differences even among platforms that are usually considered (more or less) the same (e.g. compare an old single-core Athlon with 512 MB of RAM to a 6-core Intel with 8 GB, both running Linux but with very different kernel versions).
If any of these change, then your profiling results (and the optimizations based on them) are not necessarily valid any more. Most likely some of the optimizations will still have a beneficial effect, but some of them may turn out to be suboptimal (or may even degrade performance).
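A toy model of the stale-profile problem, purely as a sketch: a real profile-guided optimizer works on compiler-internal data, not on a class like this, and every name below is invented. The "training run" picks the most frequent case, the "optimized build" gives that case a fast path, and the fast path stops paying off once the real workload looks different.

```java
// Toy model only; all names are made up for illustration.
class StaleProfile {
    // "Training run": find the most frequent opcode (0..3) in the sample.
    static int hottestOpcode(int[] sample) {
        int[] counts = new int[4];
        for (int op : sample) {
            counts[op]++;
        }
        int best = 0;
        for (int op = 1; op < counts.length; op++) {
            if (counts[op] > counts[best]) {
                best = op;
            }
        }
        return best;
    }

    // "Optimized build": the profiled opcode gets the fast path.
    // Returns how many inputs actually hit that fast path.
    static int fastPathHits(int[] workload, int profiledOpcode) {
        int hits = 0;
        for (int op : workload) {
            if (op == profiledOpcode) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] training = {0, 0, 0, 1, 0, 2, 0, 0};   // mostly opcode 0
        int[] production = {3, 3, 2, 3, 3, 3, 1, 3}; // mostly opcode 3

        int profiled = hottestOpcode(training);
        System.out.println("fast path on training data:   "
                + fastPathHits(training, profiled) + "/" + training.length);
        System.out.println("fast path on production data: "
                + fastPathHits(production, profiled) + "/" + production.length);
    }
}
```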
As was mentioned, JIT-compiling JVMs do something very similar to profiling, but they do it on the fly. This is where the name 'HotSpot' comes from: the JVM constantly monitors the executing code, looks for hot spots that are executed frequently, and tries to optimize only those parts. At that point it can exploit more knowledge about the code (its context, how it is used by other classes, etc.), so, as mentioned by you and the other answers, it can do better optimizations than a static compiler. It keeps monitoring, and if needed it will do another round of optimization later, this time trying even harder (looking for more, and more expensive, optimizations).
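The counting idea can be sketched as a toy model. This is not how HotSpot is implemented; the threshold, the `HotMethod` class and the slow/fast pair are all assumptions made up for illustration, and the later "try harder" round is not modelled at all.

```java
// Toy model of counter-based hot spot detection; not HotSpot's real design.
class HotMethod {
    interface IntFn {
        int apply(int x);
    }

    private final IntFn slow;      // stands in for interpreted code
    private final IntFn fast;      // stands in for JIT-compiled code
    private final int threshold;   // calls before the method counts as hot
    private int calls = 0;
    private boolean compiled = false;

    HotMethod(IntFn slow, IntFn fast, int threshold) {
        this.slow = slow;
        this.fast = fast;
        this.threshold = threshold;
    }

    int invoke(int x) {
        if (compiled) {
            return fast.apply(x);
        }
        // Keep counting while "interpreting"; switch once the method is hot.
        if (++calls >= threshold) {
            compiled = true;
        }
        return slow.apply(x);
    }

    boolean isCompiled() {
        return compiled;
    }

    // Slow version: sum 1..n with a loop.
    static int sumLoop(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Fast version: the closed form of the same sum.
    static int sumFormula(int n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        HotMethod sum = new HotMethod(HotMethod::sumLoop, HotMethod::sumFormula, 3);
        for (int call = 1; call <= 5; call++) {
            System.out.println("call " + call + ": result=" + sum.invoke(100)
                    + ", compiled=" + sum.isCompiled());
        }
    }
}
```

The point of the sketch is only that cold code never pays the compilation cost, while code that turns out to be hot gets switched to the faster version after a few calls.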
Because it works on real-life data (usage statistics + platform + config), it can avoid the caveats mentioned before.
The price is the additional time it needs to spend on "profiling" and JIT-ing. Most of the time, that time is well spent.
I guess a profile-guided optimizer could still compete with it (or even beat it), but only in some special cases where you can avoid the caveats above, i.e. where your samples, target platform, and configuration closely match real usage.
That will rarely be the case, and I guess that in general a JIT will give you better results, but I have no evidence for it.
Another way to get value from profile-guided optimization is when you target a JVM that can't do JIT optimization (I think most small devices have such a JVM).
BTW, one disadvantage mentioned in other answers would be quite easy to avoid: if static/profile-guided optimization is slow (which is probably the case), then do it only for releases (or RCs going to testers) or during nightly builds (where time does not matter so much).
I think the much bigger problem would be having good sample test cases. Creating and maintaining them is usually not easy and takes a lot of time, especially if you want to be able to execute them automatically, which would be quite essential in this case.
The official Java HotSpot compiler does "adaptive optimisation" at runtime, which is essentially the same as the profile-guided optimisation you mentioned. This has been a feature of at least this particular Java implementation for a long time.
The trade-off in performing more static analysis or optimisation passes up front at compile time is essentially the (ever-diminishing) returns you get from this extra effort against the time it takes for the compiler to run. A compiler like MLton (for Standard ML) is a whole-program optimising compiler with a lot of static checks. It produces very good code, but becomes very, very slow on medium-to-large programs, even on a fast system.
So the Java approach seems to be to use JIT and adaptive optimisation as much as possible, with the initial compilation pass just producing an acceptable, valid binary. The opposite extreme is an approach like that of MLKit, which does a lot of static inference of regions and memory behaviour.