Is accessing a final local variable faster than accessing a class variable in Java?

Posted 2024-11-18 16:43:35

I've been looking at some of the Java primitive collections (trove, fastutil, hppc), and I've noticed a pattern where class variables (fields) are sometimes copied into final local variables. For example:

public void forEach(IntIntProcedure p) {
    final boolean[] used = this.used;
    final int[] key = this.key;
    final int[] value = this.value;
    for (int i = 0; i < used.length; i++) {
        if (used[i]) {
            p.apply(key[i], value[i]);
        }
    }
}

I've done some benchmarking, and it appears that it is slightly faster when doing this, but why is this the case? I'm trying to understand what Java would do differently if the first three lines of the function were commented out.

Note: This seems similar to this question, but that was for C++ and doesn't address why they are declared final.

后知后觉 2024-11-25 16:43:35

Accessing a local variable or parameter is a single-step operation: take the variable located at offset N on the stack. If your function has 2 arguments (simplified):

N = 0 - this
N = 1 - first argument
N = 2 - second argument
N = 3 - first local variable
N = 4 - second local variable
...

So when you access a local variable, you have one memory access at a fixed offset (N is known at compilation time). This is the bytecode for accessing the first method argument (an int):

iload 1  //N = 1

However, when you access a field, you are actually performing an extra step. First you read the "local variable" this just to determine the current object's address. Then you load a field (getfield), which sits at a fixed offset from this. So you perform two memory operations instead of one (that is, one extra). Bytecode:

aload 0  //N = 0: this reference
getfield total I  //int total

So technically, accessing local variables and parameters is faster than accessing object fields. In practice, many other factors may affect performance (including the various levels of CPU cache and JVM optimizations).
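To see the two access patterns side by side, here is a minimal sketch. The class FieldVsLocal and its method names are made up for illustration; only the field name total is borrowed from the bytecode listing above. It is meant for inspecting the generated bytecode, not as a benchmark.

```java
// Hypothetical class for illustration; only the field name `total`
// comes from the bytecode listing in the answer.
class FieldVsLocal {
    private int total = 42;

    // Field read: javac emits aload_0 (load `this`) followed by getfield.
    int readField() {
        return total;
    }

    // Parameter read: a single iload from slot 1 (slot 0 holds `this`).
    int readParam(int first) {
        return first;
    }

    // Copies the field into a local once, then only touches the local.
    int sumWithLocalCopy(int times) {
        final int t = total;   // one aload_0 + getfield, stored into a local slot
        int sum = 0;
        for (int i = 0; i < times; i++) {
            sum += t;          // plain local-slot load, no getfield in the loop body
        }
        return sum;
    }

    public static void main(String[] args) {
        FieldVsLocal f = new FieldVsLocal();
        System.out.println(f.readField());         // 42
        System.out.println(f.readParam(7));        // 7
        System.out.println(f.sumWithLocalCopy(3)); // 126
    }
}
```

Compiling this and running javap -c FieldVsLocal should show the aload_0/getfield pair in readField and only local-slot loads inside the loop of sumWithLocalCopy. Keep in mind this is the bytecode view only; what the JIT eventually produces can look quite different.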

final is a different story. It is basically a hint to the compiler/JIT that this reference won't change, so it can make some heavier optimizations. The effect of that is much harder to track down, though; as a rule of thumb, use final whenever possible.

要走干脆点 2024-11-25 16:43:35

The final keyword is a red herring here.
The performance difference comes because they are saying two different things.

public void forEach(IntIntProcedure p) {
  final boolean[] used = this.used;
  for (int i = 0; i < used.length; i++) {
    ...
  }
}

is saying, "fetch a boolean array, and for each element of that array do something."

Without final boolean[] used, the function is saying "while the index is less than the length of the current value of the used field of the current object, fetch the current value of the used field of the current object and do something with the element at index i."

The JIT might have a much easier time proving loop-bound invariants and eliminating excess bounds checks, because it can much more easily determine what could cause the value of used to change. Even ignoring multiple threads, if p.apply could change the value of used, then the JIT can't eliminate bounds checks or do other useful optimizations.
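The "two different things" point can be made observable with a small sketch. Everything here is a stand-in I made up for illustration: SnapshotDemo, data and java.util.function.IntConsumer play the roles of the collection class, used and IntIntProcedure. The callback reassigns the field in the middle of the loop, which is exactly the case the JIT has to allow for when the field is re-read on every iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical stand-in for the collection class in the question.
class SnapshotDemo {
    int[] data = {1, 2, 3, 4};

    // Re-reads this.data on every access, so a reassignment made by the
    // callback changes both the loop bound and the elements seen afterwards.
    void forEachViaField(IntConsumer p) {
        for (int i = 0; i < this.data.length; i++) {
            p.accept(this.data[i]);
        }
    }

    // Reads the field once; the loop stays bound to that one array no
    // matter what the callback does to this.data.
    void forEachViaLocal(IntConsumer p) {
        final int[] data = this.data;
        for (int i = 0; i < data.length; i++) {
            p.accept(data[i]);
        }
    }

    // Runs one of the two loops with a callback that swaps the array.
    static List<Integer> collect(boolean useLocalCopy) {
        SnapshotDemo d = new SnapshotDemo();
        List<Integer> seen = new ArrayList<>();
        IntConsumer swapper = v -> {
            seen.add(v);
            d.data = new int[] {9};   // callback replaces the field mid-loop
        };
        if (useLocalCopy) {
            d.forEachViaLocal(swapper);
        } else {
            d.forEachViaField(swapper);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collect(false)); // [1]
        System.out.println(collect(true));  // [1, 2, 3, 4]
    }
}
```

The field version stops after one element because the bound is re-evaluated against the new one-element array, while the local-copy version walks the original array to the end. Note that dropping the final keyword from forEachViaLocal would not change the output; the snapshot comes from the local copy, not from final.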
