Is accessing a final local variable faster than accessing a class variable in Java?
I've been looking at some of the Java primitive collections (Trove, fastutil, HPPC) and I've noticed a pattern: class variables are sometimes copied into final local variables. For example:
    public void forEach(IntIntProcedure p) {
        final boolean[] used = this.used;
        final int[] key = this.key;
        final int[] value = this.value;
        for (int i = 0; i < used.length; i++) {
            if (used[i]) {
                p.apply(key[i], value[i]);
            }
        }
    }
I've done some benchmarking, and it appears to be slightly faster when doing this, but why is this the case? I'm trying to understand what Java would do differently if the first three lines of the function were commented out.
Note: This seems similar to this question, but that was for C++ and doesn't address why they are declared final.
Comments (5)
Accessing a local variable or parameter is a single-step operation: take the variable located at offset N on the stack. If your function has 2 arguments (simplified), offset 0 holds this, followed by the two arguments and then the local variables.
So when you access a local variable, you have one memory access at a fixed offset (N is known at compilation time). In bytecode, reading the first method argument (an int) is a single load instruction.
However, when you access a field, you are actually performing an extra step. First you read the "local variable" this just to determine the current object address. Then you load a field (getfield), which sits at a fixed offset from this. So you perform two memory operations instead of one (that is, one extra); in bytecode, a load of this followed by getfield.
So technically, accessing local variables and parameters is faster than accessing object fields. In practice, many other factors may affect performance (including the various levels of CPU cache and JVM optimizations).
final is a different story. It is basically a hint for the compiler/JIT that this reference won't change, so it can make some heavier optimizations. But this is much harder to track down; as a rule of thumb, use final whenever possible.
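The difference between a local-variable read and a field read can be seen by compiling a tiny class and disassembling it with javap -c. The sketch below is only an illustration: the class Counter, its field total, and both method names are hypothetical and do not come from the answer. The bytecode in the comments is what javac typically emits for these two methods; the constant-pool index (written #n here) varies by compiler.

```java
// Hypothetical class for illustration; none of these names come from the answer.
class Counter {
    private int total = 42;

    // Reads a parameter: one load from local variable slot 1 (slot 0 is `this`).
    // javap -c typically shows:
    //   iload_1
    //   ireturn
    int readArg(int a, int b) {
        return a;
    }

    // Reads a field: load `this` from slot 0, then getfield relative to it.
    // javap -c typically shows:
    //   aload_0
    //   getfield #n   // Field total:I  (#n is a constant-pool index)
    //   ireturn
    int readField() {
        return total;
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        System.out.println(c.readArg(7, 9)); // prints 7
        System.out.println(c.readField());   // prints 42
    }
}
```

This only shows the bytecode level. What the JIT compiler generates from it may look quite different, which is why the answer adds that other factors affect performance in practice.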
The final keyword is a red herring here. The performance difference arises because the two versions say two different things.
The version with the local copy is saying, "fetch a boolean array, and for each element of that array do something."
Without final boolean[] used, the function is saying, "while the index is less than the length of the current value of the used field of the current object, fetch the current value of the used field of the current object and do something with the element at index i."
With the local copy, the JIT might have a much easier time proving loop-bound invariants, eliminating excess bounds checks and so on, because it can much more easily determine what could cause the value of used to change. Even ignoring multiple threads, if p.apply could change the value of used, then the JIT can't eliminate bounds checks or do other useful optimizations.
In the generated bytecode, local variables live in the frame's local variable slots and reach the operand stack with a single load instruction, while field references must be moved to the stack via an instruction that retrieves the value through the object reference. I imagine the JIT can turn local variable references into register references more easily.
It tells the runtime (JIT) that, in the context of that method call, those 3 values will never change, so the runtime does not need to continually load the values from the member variables. This may give a slight speed improvement.
Of course, as the JIT gets smarter and can figure out these things on its own, these conventions become less useful.
Note: I didn't make it clear that the speedup comes more from using a local variable than from the final part.
Such simple optimizations are already included in the JVM runtime. If the JVM accessed instance variables naively, our Java applications would be turtle slow.
Such manual tuning is probably worthwhile for simpler JVMs though, e.g. Android.
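Whether the manual copy still pays off on a particular JVM can only be settled by measuring. The following is a rough timing sketch, not a rigorous benchmark: all names in it (HoistCheck, sumViaField, sumViaLocal) are hypothetical, the loop counts are arbitrary, and hand-rolled timings like this are easily distorted by JIT warm-up and dead-code elimination. A dedicated harness such as JMH gives more trustworthy numbers.

```java
// Rough timing sketch; every name here is hypothetical.
class HoistCheck {
    // Deliberately non-final, like the resizable arrays in a primitive collection.
    private int[] value = new int[1 << 16];

    HoistCheck() {
        for (int i = 0; i < value.length; i++) {
            value[i] = i & 7;
        }
    }

    // Reads the field on every iteration, unless the JIT hoists the load itself.
    long sumViaField() {
        long sum = 0;
        for (int i = 0; i < this.value.length; i++) {
            sum += this.value[i];
        }
        return sum;
    }

    // Copies the field into a final local once, as in the question.
    long sumViaLocal() {
        final int[] value = this.value;
        long sum = 0;
        for (int i = 0; i < value.length; i++) {
            sum += value[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        HoistCheck h = new HoistCheck();
        long sink = 0;
        // Warm up both methods so the JIT has a chance to compile them.
        for (int r = 0; r < 2_000; r++) {
            sink += h.sumViaField() + h.sumViaLocal();
        }
        long t0 = System.nanoTime();
        for (int r = 0; r < 2_000; r++) sink += h.sumViaField();
        long t1 = System.nanoTime();
        for (int r = 0; r < 2_000; r++) sink += h.sumViaLocal();
        long t2 = System.nanoTime();
        System.out.println("field: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("local: " + (t2 - t1) / 1_000_000 + " ms");
        // Printing the accumulated sum keeps the loops from being discarded as dead code.
        System.out.println("checksum: " + sink);
    }
}
```

If the two timings are indistinguishable, the JIT on that JVM is probably already hoisting the field load for a loop this simple. A loop that calls out to an arbitrary callback, as in the question, may behave differently.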