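Why is comparing long slower than comparing double?
I wrote a small program to compute the first 18 triples (x, y, z) with x < y < z that satisfy x^3 + y^3 = z^3 + 1.
While optimizing the total run time, I found that using double for the cube values and for both sides of the equation is faster than using long. On my machine the difference is about 3 seconds.
Now I wonder why that is. My guess is that it lies somewhere in the internal handling of long when two long variables are compared, because that is the only thing that changes inside the calculation loops.
Here is my code: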
class Threes {
    public static void main(String[] args) {
        System.out.println("Threes --- Java");
        int Z_MAX = 60000, Y_MAX = Z_MAX-1, X_MAX = Y_MAX-1;
        double[] powers = new double[Z_MAX+1];
        for (int i = 0; i <= Z_MAX; i++) {
            powers[i] = Math.pow(i, 3);
        }
        System.out.println("Powers calculated");
        int x, y, z;
        double right, left;
        int[][] sets = new int[18][3];
        int foundCount = 0;
        long loopCount = 0;
        long start, end;
        start = System.currentTimeMillis();
        for (x = 1 ; x < X_MAX; x++) {
            for (y = x + 1; y < Y_MAX; y++) {
                right = powers[x] + powers[y];
                for (z = y + 1; z < Z_MAX; z++) {
                    left = powers[z] + 1;
                    if (right < left) {
                        z = Z_MAX;
                    } else if (right == left) {
                        sets[foundCount][0] = x;
                        sets[foundCount][1] = y;
                        sets[foundCount][2] = z;
                        foundCount++;
                        end = System.currentTimeMillis();
                        System.out.println("found " + foundCount + ". set:\t" + x + "\t" + y + "\t" + z + "\t" + ((end - start) / 1000.0));
                        if (foundCount == 18) {
                            x = X_MAX;
                            y = Y_MAX;
                            z = Z_MAX;
                        }
                    }
                    loopCount++;
                }
            }
        }
        System.out.println("finished: " + loopCount);
    }
}
The lines I changed are:
    double[] powers = new double[Z_MAX+1];
which became
    long[] powers = new long[Z_MAX+1];
then
    powers[i] = Math.pow(i, 3);
which became
    powers[i] = (long)Math.pow(i, 3);
and
    double right, left;
which became
    long right, left;
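"Bonus question": what other options do I have to optimize the whole code in terms of total run time? I know that leaving out loopCount saves me a few milliseconds. I'm sure I have to reduce the number of loop iterations significantly. But how?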
Comments (3)
If you are using a 32-bit operating system, performance for long variables could be worse because long is a 64-bit type. For example, on a 64-bit OS Java could do the comparison with a single machine instruction, but in a 32-bit environment it has to use several, since it can only handle 32 bits at a time.
For double this is not necessarily the case, since 32-bit systems have machine instructions for 64-bit floating-point numbers even when they don't have them for 64-bit integers.
Also, in the line
    powers[i] = (long)Math.pow(i, 3);
there are two unnecessary conversions: first i (an int) is converted to double (that's what Math.pow takes), and then the return value is converted back to a 64-bit integer (long).
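One way to avoid both conversions is to cube with plain integer multiplication. The following is only a minimal sketch, not code from the question: the class name Cubes and the helper cubes are made up for illustration, and it assumes max is small enough that max^3 fits in a long (which holds for 60000).

```java
class Cubes {
    // Hypothetical helper: builds the table of cubes using integer arithmetic only.
    static long[] cubes(int max) {
        long[] powers = new long[max + 1];
        for (int i = 0; i <= max; i++) {
            long n = i;             // widen once so the products are computed in 64 bits
            powers[i] = n * n * n;  // no int -> double -> long round trip
        }
        return powers;
    }

    public static void main(String[] args) {
        long[] powers = cubes(60000);
        System.out.println(powers[60000]); // prints 216000000000000
    }
}
```

The table is filled only once, so this mostly tidies the setup code; it is unlikely to change the time spent in the nested loops.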
It's probably fair to say that your code spends most of its time in the innermost loop over z, computing left and comparing it with right.
Most of the time it takes the same branch out of the conditional. So once your code has reached a steady state (i.e. once the CPU's branch predictor has adapted), the run time is dominated by the computation itself: dependencies between instructions are minimal, so the latency of the instruction pipeline doesn't matter.
On a 32-bit machine, addition and comparison on 64-bit integer types take more instructions than the equivalent operations on doubles. A double calculation takes more cycles to complete, but that doesn't matter: we're dominated by instruction throughput, not latency. So the overall run time of the long version will be longer.
In terms of further optimization, you could move the +1 out of the inner loop by calculating
    right = powers[x] + powers[y] - 1
once per (x, y) pair and comparing it with powers[z] directly. But it's possible the optimizer has already spotted that.
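To make the hoisting concrete, here is a minimal sketch under a few assumptions: the class HoistedLoop and the method countTriples are hypothetical names, the cubes are stored as long, the loop bounds mirror the question's X_MAX, Y_MAX and Z_MAX, and break stands in for the question's z = Z_MAX. It counts matches instead of printing them.

```java
class HoistedLoop {
    // Counts triples x < y < z < zMax with x^3 + y^3 == z^3 + 1.
    static int countTriples(int zMax) {
        long[] powers = new long[zMax + 1];
        for (int i = 0; i <= zMax; i++) {
            powers[i] = (long) i * i * i;
        }
        int found = 0;
        for (int x = 1; x < zMax - 2; x++) {
            for (int y = x + 1; y < zMax - 1; y++) {
                // The "+ 1" is folded in here, once per (x, y) pair.
                long right = powers[x] + powers[y] - 1;
                for (int z = y + 1; z < zMax; z++) {
                    if (right < powers[z]) {
                        break;              // larger z can no longer match
                    } else if (right == powers[z]) {
                        found++;
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(countTriples(13)); // prints 1, for the triple (9, 10, 12)
    }
}
```

The inner loop now does one array load and one comparison per iteration; whether that is measurably faster depends on what the JIT compiler was already doing.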
Your biggest "bonus" optimization will be to replace the z loop with a direct calculation of the candidate z, and then check whether
    z > y && left == powers[(int)z] + 1
holds. (Here left stands for the sum powers[x] + powers[y], which the question's code calls right.)
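The calculation itself is not preserved in this copy of the answer, so the following is only a sketch of one way it could look. It assumes a rounded cube root via Math.cbrt, uses long arithmetic for the exact check, and introduces the hypothetical names DirectZ and solveZ. It also assumes x and y are small enough that the sum of cubes stays below 2^53, so the conversion to double is exact.

```java
class DirectZ {
    // Returns the z > y with x^3 + y^3 == z^3 + 1, or -1 if there is none.
    static int solveZ(int x, int y) {
        long sum = (long) x * x * x + (long) y * y * y;  // the x^3 + y^3 side
        // The only possible integer candidate is the rounded cube root of sum - 1.
        long z = Math.round(Math.cbrt((double) (sum - 1)));
        if (z > y && z * z * z + 1 == sum) {
            return (int) z;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(solveZ(9, 10)); // prints 12
        System.out.println(solveZ(9, 11)); // prints -1
    }
}
```

This turns the innermost loop into a constant amount of work per (x, y) pair, at the cost of one cube root.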
Other improvements, if you want to find all triples within your limits:
- Start x at 2 instead of 1 (x = 1 would force y = z).
- Replace z = Z_MAX; with break; to exit the loop early.
- Compute X_MAX as Math.pow((powers[Z_MAX] + 1)/2, 1./3) ~= Z_MAX * Math.pow(0.5, 1./3), since if x is bigger than that, z will exceed Z_MAX.
- Compute Y_MAX for each x as Math.pow(powers[Z_MAX] - powers[x] + 1, 1./3), since y^3 = z^3 + 1 - x^3 can be at most powers[Z_MAX] + 1 - powers[x].
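The two bounds can be sketched with exact integer arithmetic as follows. This is an illustration under assumptions, not the answerer's code: the names Bounds, floorCbrt, xMax and yMax are hypothetical, both bounds are inclusive, and they rely only on x < y and on x^3 + y^3 = z^3 + 1 <= zMax^3 + 1.

```java
class Bounds {
    // Largest r with r^3 <= n, for n >= 0 (floating-point estimate, then exact fix-up).
    static int floorCbrt(long n) {
        long r = Math.round(Math.cbrt((double) n));
        while (r * r * r > n) r--;
        while ((r + 1) * (r + 1) * (r + 1) <= n) r++;
        return (int) r;
    }

    // Since x < y, 2 * x^3 < x^3 + y^3 <= zMax^3 + 1.
    static int xMax(int zMax) {
        long zCube = (long) zMax * zMax * zMax;
        return floorCbrt((zCube + 1) / 2);
    }

    // For a fixed x, y^3 = z^3 + 1 - x^3 <= zMax^3 + 1 - x^3.
    static int yMax(int x, int zMax) {
        long zCube = (long) zMax * zMax * zMax;
        long xCube = (long) x * x * x;
        return floorCbrt(zCube + 1 - xCube);
    }

    public static void main(String[] args) {
        System.out.println(xMax(12));    // prints 9
        System.out.println(yMax(9, 12)); // prints 10
    }
}
```

With zMax = 12 the bounds still admit the triple (9, 10, 12), which is a quick sanity check that they are not too tight.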
BTW, a more common way to order the triples would be to use z as the primary sort key, which may give a different first 18 than ordering by x first. To change that, you'd make your outer loop iterate over z, which would be simpler anyway.
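The answer's own z-first loop is not preserved here, so the following is a sketch of how such a search might look, under stated assumptions: the names ThreesByZ and firstTriples are hypothetical, y is computed directly from z and x with a rounded Math.cbrt, and x starts at 2. Triples come out ordered by z, then by x.

```java
import java.util.ArrayList;
import java.util.List;

class ThreesByZ {
    // Returns the first `wanted` triples {x, y, z} with 1 < x < y < z and
    // x^3 + y^3 == z^3 + 1, in order of increasing z.
    static List<int[]> firstTriples(int wanted) {
        List<int[]> found = new ArrayList<>();
        for (long z = 3; found.size() < wanted; z++) {
            long target = z * z * z + 1;               // x^3 + y^3 must equal this
            // x < y implies 2 * x^3 < target, which bounds the x loop.
            for (long x = 2; 2 * x * x * x < target; x++) {
                long yCube = target - x * x * x;
                long y = Math.round(Math.cbrt((double) yCube));
                if (y > x && y < z && y * y * y == yCube) {
                    found.add(new int[] {(int) x, (int) y, (int) z});
                    if (found.size() == wanted) break;
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (int[] t : firstTriples(3)) {
            System.out.println(t[0] + "\t" + t[1] + "\t" + t[2]);
        }
    }
}
```

Because the outer loop simply grows z until enough triples are found, no Z_MAX has to be guessed in advance.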