Java super-tuning, a few questions
Before I ask my question, can I please ask that I not get a lecture about optimising for no reason?
Consider the following questions purely academic.
I've been thinking about the efficiency of accesses between root classes in Java (i.e. classes that are used often and access each other often), but this applies to most OO languages/compilers. The fastest way (I'm guessing) that you could access something in Java would be through a static final reference. Theoretically, since that reference is available during loading, a good JIT compiler could skip any reference lookup when accessing the variable and point every access to it straight at a constant address. Perhaps for security reasons it doesn't work that way anyway, but bear with me...
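A minimal sketch of the pattern described above, a `static final` reference assigned while its holder class is initialised. The class names `Renderer`, `Hub` and `StaticFinalDemo` are hypothetical. HotSpot, for example, can treat a static final field of an already initialised class as a constant when compiling callers, which is roughly the effect speculated about here; that is implementation behaviour, not something the Java specification guarantees.

```java
// Hypothetical "root" class that many other classes call frequently.
final class Renderer {
    private int frames;

    void drawFrame() {
        frames++;
    }

    int frames() {
        return frames;
    }
}

// Hypothetical holder: RENDERER is assigned exactly once, while Hub is being
// initialised, and can never be reassigned afterwards.
final class Hub {
    static final Renderer RENDERER = new Renderer();

    private Hub() {}
}

class StaticFinalDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            // Each call reads the static final field; a JIT such as HotSpot's
            // may fold that read into a constant once Hub is initialised.
            Hub.RENDERER.drawFrame();
        }
        System.out.println(Hub.RENDERER.frames()); // prints 3
    }
}
```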
Say I've decided that there are some order-of-operations problems, or some arguments to pass at startup, that mean I can't have a static final reference, even if I were to go to the trouble of having each class construct the other, as is recommended to get Java classes to have static final references to each other. Another reason I might not want to do this would be... oh, say, just for example, that I was providing platform-specific implementations of some of these classes. ;-)
Now I'm left with two obvious choices. I can have my classes know about each other through a static reference (on some system hub class) that is set after all classes have been constructed (during construction I mandate that they cannot access each other yet, which does away with order-of-operations problems, at least during construction). On the other hand, the classes could have instance final references to each other, were I now to decide that sorting out the order of operations was important or could be made the responsibility of the person passing the args, or, more to the point, of whoever provides platform-specific implementations of these classes we want to have referencing each other.
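The two choices above can be sketched side by side. Everything here is hypothetical: `Audio` stands in for a class with platform-specific implementations, `DesktopAudio` for one such implementation, `SystemHub` for the hub class, and the two `Game*` classes for the callers.

```java
// Hypothetical collaborator with platform-specific implementations.
interface Audio {
    String play(String clip);
}

final class DesktopAudio implements Audio {
    @Override
    public String play(String clip) {
        return "desktop:" + clip;
    }
}

// Choice 1: a mutable static reference on a hub class, set once after every
// object has been constructed. Nothing may read SystemHub.audio during construction.
final class SystemHub {
    static Audio audio; // not final, so the implementation can be chosen at startup

    private SystemHub() {}
}

final class GameViaHub {
    String tick() {
        return SystemHub.audio.play("beep");
    }
}

// Choice 2: an instance final reference, so whoever wires the objects together
// is responsible for constructing them in a workable order.
final class GameViaField {
    private final Audio audio;

    GameViaField(Audio audio) {
        this.audio = audio;
    }

    String tick() {
        return audio.play("beep");
    }
}

class WiringDemo {
    public static void main(String[] args) {
        Audio platformAudio = new DesktopAudio(); // e.g. chosen from startup args

        // Choice 1: construct everything first, then publish via the hub.
        GameViaHub viaHub = new GameViaHub();
        SystemHub.audio = platformAudio;
        System.out.println(viaHub.tick()); // prints desktop:beep

        // Choice 2: pass the dependency in through the constructor.
        GameViaField viaField = new GameViaField(platformAudio);
        System.out.println(viaField.tick()); // prints desktop:beep
    }
}
```

The sketch is single-threaded. If other threads read `SystemHub.audio`, they need a happens-before edge with the write, for example by starting those threads after the assignment or by declaring the field `volatile`.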
A static variable means you don't have to look up the location of the variable with respect to the class it belongs to, saving you one operation. A final variable means you don't have to look up the value at all, but it does have to belong to your class, so you save 'one operation'. OK, I know I'm really handwaving now!
Then something else occurred to me: I could have static final stub classes, kind of like a wacky interface, where each call is relegated to an 'impl' that simply extends the stub. The performance hit would then be the double function call required to run each function, and possibly, I guess, that you can't declare your methods final anymore. I hypothesised that perhaps those calls could be inlined if they were declared appropriately, then gave up when I realised I would then have to think about whether the references to the 'impl's could be made static, or final, or...
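One way to read the stub idea above, as a sketch under assumptions: callers always go through a `static final` stub, and the stub forwards each call to an impl installed at startup. `Clock`, `FakeClock` (standing in for a platform-specific impl), `Clocks` and `install` are all hypothetical names. Note that the impl reference itself ends up static but not final, which is exactly the question left open above; HotSpot can often inline through both calls using type profiling, but nothing guarantees it.

```java
// Hypothetical base type: both the stub and every impl extend it,
// so now() cannot be declared final.
abstract class Clock {
    abstract long now();
}

// Hypothetical stand-in for a platform-specific implementation.
final class FakeClock extends Clock {
    private long ticks;

    @Override
    long now() {
        return ++ticks;
    }
}

final class Clocks {
    // Chosen once at startup, so it cannot be final.
    private static Clock impl;

    // The static final stub the rest of the code calls.
    static final Clock STUB = new Clock() {
        @Override
        long now() {
            return impl.now(); // the second call: stub -> impl
        }
    };

    static void install(Clock chosen) {
        if (impl != null) {
            throw new IllegalStateException("impl already installed");
        }
        impl = chosen;
    }

    private Clocks() {}
}

class StubDemo {
    public static void main(String[] args) {
        Clocks.install(new FakeClock());
        System.out.println(Clocks.STUB.now()); // prints 1
        System.out.println(Clocks.STUB.now()); // prints 2
    }
}
```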
So which of the three would turn out fastest? :-)
Any other thoughts on lowering frequent-access overheads or even other ways of hinting performance to the JIT compiler?
UPDATE: After several hours of testing various things and reading http://www.ibm.com/developerworks/java/library/j-jtp02225.html, I've found that most things you would normally look at when tuning, say, C++ go out the window completely with the JIT compiler. I've seen it run 30 seconds of calculations once, twice, and on the third (and subsequent) runs decide "Hey, you aren't reading the result of that calculation, so I'm not running it!".
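The effect described in the update is dead-code elimination: once a JIT can see that a computation's result is never used, it may drop the computation entirely. A minimal hand-rolled sketch of the usual countermeasure, assuming a simple timing loop is acceptable: keep every result live by accumulating it into a "sink" that is printed at the end. `DeadCodeDemo` and `work` are hypothetical names.

```java
class DeadCodeDemo {
    // Some arbitrary work whose result we care about.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long sink = 0; // accumulating results keeps them observably used

        for (int run = 1; run <= 5; run++) {
            long start = System.nanoTime();
            // Calling work(n) and discarding the return value would let the
            // JIT eliminate the loop once work() has been inlined.
            sink += work(n);
            long elapsed = System.nanoTime() - start;
            System.out.printf("run %d: %d us%n", run, elapsed / 1_000);
        }

        // Printing the sink is what makes the results live.
        System.out.println("sink = " + sink);
    }
}
```

For anything beyond a quick check, a harness such as JMH, which provides `Blackhole` for exactly this purpose and manages warm-up, is more reliable than a hand-rolled loop like this one.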
FWIW, you can test data structures this way: using a microbenchmark, I was able to develop an ArrayList-like implementation that performed better for my needs. The access patterns must have been random enough to keep the compiler guessing, but the benchmark still showed how to implement a generic-ified growing array better with my simpler, more tuned code.
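For illustration only, here is a minimal generic growing array of the kind mentioned above. It is not the poster's implementation (which isn't shown); `GrowArray` and its methods are hypothetical. It trades some of `ArrayList`'s safety for simplicity: there is no modification counting for fail-fast iterators, and `get` does not check the index against `size`.

```java
import java.util.Arrays;

final class GrowArray<T> {
    private Object[] items;
    private int size;

    GrowArray(int initialCapacity) {
        items = new Object[Math.max(1, initialCapacity)];
    }

    void add(T item) {
        if (size == items.length) {
            // Double the capacity when full so appends stay amortised O(1).
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size++] = item;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        // Only the backing array's own bounds check applies here: an index
        // past size but inside capacity returns null instead of throwing.
        return (T) items[index];
    }

    int size() {
        return size;
    }
}

class GrowArrayDemo {
    public static void main(String[] args) {
        GrowArray<String> names = new GrowArray<>(2);
        names.add("a");
        names.add("b");
        names.add("c"); // triggers growth from 2 to 4 slots
        System.out.println(names.size() + " " + names.get(2)); // prints 3 c
    }
}
```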
As far as the test here was concerned, I simply could not get a meaningful benchmark result! My simple test of calling a function and reading a variable through a final vs non-final object reference revealed more about the JIT than about the JVM's access patterns. Unbelievably, calling the same function on the same object at different places in the method changed the time taken by a factor of FOUR!
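A sketch of how such a final vs non-final comparison might be structured, assuming a hand-rolled harness. `Target`, `Holder` and `FinalVsPlain` are hypothetical. It alternates the two variants over several rounds and consumes every result, but as the paragraph above found, the printed timings are likely to reflect JIT decisions (compilation order, inlining, loop hoisting) more than the cost of the field access itself.

```java
final class Target {
    int x = 1;

    int bump() {
        return ++x;
    }
}

final class Holder {
    final Target finalRef = new Target(); // final instance reference
    Target plainRef = new Target();       // non-final instance reference
}

class FinalVsPlain {
    // Calls through the final reference; the total is returned so the
    // work cannot be discarded as dead code.
    static long viaFinal(Holder h, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += h.finalRef.bump();
        }
        return total;
    }

    // Same loop, but through the non-final reference.
    static long viaPlain(Holder h, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += h.plainRef.bump();
        }
        return total;
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        int n = 10_000_000;
        long sink = 0;
        // Alternating the variants means warm-up and compilation affect both,
        // not just whichever happens to run first.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            sink += viaFinal(h, n);
            long t1 = System.nanoTime();
            sink += viaPlain(h, n);
            long t2 = System.nanoTime();
            System.out.printf("round %d: final %d us, plain %d us%n",
                    round, (t1 - t0) / 1_000, (t2 - t1) / 1_000);
        }
        System.out.println("sink = " + sink);
    }
}
```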
As the guy in the IBM article says, the only way to test an optimisation is in-situ.
Thanks to everyone who pointed me along the way.
Comments
It's worth noting that static fields are stored in a special per-class object which contains the static fields for that class. Using static fields instead of object fields is unlikely to be any faster.
See the update: I answered my own question by doing some benchmarking, and found that there are far greater gains in unexpected areas, and that performance for simple operations like referencing members is comparable on most modern systems, where performance is limited more by memory bandwidth than by CPU cycles.
Assuming you find a way to reliably profile your application, keep in mind that it will all go out the window should you switch to another JDK implementation (IBM to Sun to OpenJDK, etc.), or even upgrade the version of your existing JVM.
The reason you are having trouble, and would likely get different results with different JVM implementations, lies in the Java spec: it explicitly states that it does not define optimizations, leaving each implementation free to optimize (or not) in any way, so long as execution behavior is unchanged by the optimization.