Inline assignment as a way to ensure read ordering
In the ForkJoinPool class in Java 7, there is a comment regarding the implementation which states:
Methods signalWork() and scan() are the main bottlenecks so are especially heavily micro-optimized/mangled. There are lots of inline assignments (of form "while ((local = field) != 0)") which are usually the simplest way to ensure the required read orderings (which are sometimes critical)
My question is: how does inline assignment help with read ordering? I'm familiar with the Java Memory Model, and I can't see how inline assignment helps here.
Comments (2)
I think ninjalj is right in that the expression could safely be rewritten as
local = field; while (local != 0) { ...; local = field; }
However, in the actual code the expressions are much more complex, for example:
while ((((e = (int)(c = ctl)) | (u = (int)(c >>> 32))) & (INT_SIGN|SHORT_SIGN)) == (INT_SIGN|SHORT_SIGN) && e >= 0) {
Rewriting that into a series of temporary variable assignments and conditionals would change it from two lines to half a screen of code, and having two copies of such non-trivial code (before the loop and inside the loop body) would be a maintainability and readability nightmare. Code size and the number of temporary local variables in the whole function might also grow, which could impact performance or at least make the optimizer's work harder. The inlined version can be compiled to:
label loop_start; calculate condition; if (!condition) goto after_loop; loop_body; goto loop_start; label after_loop;
whereas I doubt the compiler would always be smart enough to deduplicate, on its own, code in which the loop condition is explicitly calculated twice.
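To make the equivalence above concrete, here is a minimal sketch of the two loop shapes side by side. The class name, the method names and the countdown logic are made up for illustration and are not taken from ForkJoinPool; the demo is single-threaded, so it only shows that both forms do the same work, with the field read written once in the inline form and twice in the expanded one.

```java
// Hypothetical demo class; nothing here is copied from ForkJoinPool.
final class InlineAssignmentLoop {
    // Stand-in for a shared field that other threads could update.
    static volatile int field;

    // Inline form: the read of the field appears once, inside the condition.
    static int sumInline() {
        int local, sum = 0;
        while ((local = field) != 0) {
            sum += local;
            field = local - 1;   // count down so the loop terminates
        }
        return sum;
    }

    // Expanded form: same behaviour, but the read must be written twice,
    // once before the loop and once at the end of the body.
    static int sumExpanded() {
        int sum = 0;
        int local = field;
        while (local != 0) {
            sum += local;
            field = local - 1;
            local = field;
        }
        return sum;
    }

    public static void main(String[] args) {
        field = 4;
        System.out.println(sumInline());    // 4 + 3 + 2 + 1 = 10
        field = 4;
        System.out.println(sumExpanded());  // 10
    }
}
```

With a one-token condition the duplication is harmless; with a condition like the ctl expression quoted above, every copy of the read is another place to get the order wrong.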
In theory, inlining should make no difference to ordering. The compiler is free to reorder your code, as is the JIT compiler and, in some cases, the CPU.
Having read the code in question, you should pay attention to the fact that many of the fields read in those while loops are volatile. Volatile reads and writes cannot be reordered with respect to each other and are subject to happens-before relationships. See this blog post for an excellent explanation of volatile semantics.
By inlining the volatile reads, the rest of the condition is subject to visibility rules and is not eligible for reordering. This may well have been awkward to achieve by other means.
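One way to picture this, as a sketch only: the names ctl, c, e and u mirror the expression quoted in the other answer, but the class, the methods and the contrast case are assumptions for illustration, not the actual ForkJoinPool code. Java evaluates operands left to right, so an inline assignment fixes the exact point in the expression where the single volatile read happens, and everything after it works on the local copy.

```java
// Hypothetical sketch; only the variable names echo the quoted expression.
final class CtlSnapshot {
    static volatile long ctl;

    // Returns {upper, lower} halves taken from ONE volatile read of ctl.
    static int[] snapshotHalves() {
        long c;
        int e, u;
        e = (int) (c = ctl);     // the only read of the volatile field
        u = (int) (c >>> 32);    // derived from the local copy c, not from ctl
        return new int[] {u, e};
    }

    // Contrast: two separate volatile reads. Another thread could update
    // ctl between them, so the halves may come from different values.
    static int[] twoReads() {
        int e = (int) ctl;
        int u = (int) (ctl >>> 32);
        return new int[] {u, e};
    }

    public static void main(String[] args) {
        ctl = (7L << 32) | 3L;
        int[] h = snapshotHalves();
        System.out.println(h[0] + " " + h[1]);  // prints "7 3"
    }
}
```

In a single-threaded run both methods return the same halves; the difference only matters when another thread writes ctl concurrently.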