Why is comparing long slower than comparing double?

Posted 2024-11-15 16:26:25

I wrote a little program to calculate the first 18 triples (x,y,z) with x<y<z, which satisfy x^3+y^3=z^3+1.

While experimenting to optimise the total runtime, I discovered that using double for the cubes and for the two sides of the equation is faster than using long. On my machine the difference is about 3 seconds.

Now I wonder why exactly this is the case. My guess is that it lies somewhere in how long is handled internally when two long variables are compared, as that is the only thing that changes inside the calculation loops.

Here is my code:

class Threes {
  public static void main(String[] args) {
    System.out.println("Threes --- Java");
    int Z_MAX = 60000, Y_MAX = Z_MAX-1, X_MAX = Y_MAX-1;
    double[] powers = new double[Z_MAX+1];
    for (int i = 0; i <= Z_MAX; i++) {
      powers[i] = Math.pow(i, 3);
    }
    System.out.println("Powers calculated");
    int x, y, z;
    double right, left;
    int[][] sets = new int[18][3];
    int foundCount = 0;
    long loopCount = 0;
    long start, end;
    start = System.currentTimeMillis();

    for (x = 1 ; x < X_MAX; x++) {
      for (y = x + 1; y < Y_MAX; y++) {
        right = powers[x] + powers[y];
        for (z = y + 1; z < Z_MAX; z++) {
          left = powers[z] + 1;
          if (right < left) {
            z = Z_MAX;
          } else if (right == left) {
            sets[foundCount][0] = x;
            sets[foundCount][1] = y;
            sets[foundCount][2] = z;
            foundCount++;
            end = System.currentTimeMillis();
            System.out.println("found " + foundCount + ". set:\t" + x + "\t" + y + "\t" + z + "\t" + ((end - start) / 1000.0));
            if (foundCount == 18) {
              x = X_MAX;
              y = Y_MAX;
              z = Z_MAX;
            }
          }
          loopCount++;
        }
      }
    }
    System.out.println("finished: " + loopCount);
  }
}

The lines I changed are:

double[] powers = new double[Z_MAX+1];

becomes

long[] powers = new long[Z_MAX+1];

and

powers[i] = Math.pow(i, 3);

becomes

powers[i] = (long)Math.pow(i, 3);

and

double right, left;

becomes

long right, left;

"Bonus question": What other options do I have for optimising the total runtime of the whole program? I know that leaving out loopCount saves a few milliseconds. I'm sure I have to reduce the number of loop iterations significantly, but how?

Comments (3)

海的爱人是光 2024-11-22 16:26:25

If you are using a 32-bit operating system, performance for long variables could be worse because long is a 64-bit type. For example, on a 64-bit OS Java could do the comparison with just one machine instruction, but in a 32-bit environment it has to use multiple machine instructions, since it can only handle 32 bits at a time.

But for double this is not necessarily the case, since 32-bit systems have machine instructions for 64-bit floating-point numbers even when they don't have them for 64-bit integers.
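To make the "multiple machine instructions" point concrete, here is a rough model of how a signed 64-bit comparison breaks down into 32-bit steps. `SplitCompare` and `compare` are hypothetical names invented for this illustration; a real 32-bit JIT emits machine code, not Java like this, so read it only as a picture of the extra work involved.

```java
final class SplitCompare {
    // Hypothetical helper: orders two longs using only 32-bit int comparisons.
    // Returns a negative, zero, or positive value, like Long.compare.
    static int compare(long a, long b) {
        int aHi = (int) (a >> 32);   // upper 32 bits, which carry the sign
        int bHi = (int) (b >> 32);
        if (aHi != bHi) {
            return aHi < bHi ? -1 : 1;   // signed compare of the high words
        }
        // High words are equal, so the low words decide, compared as unsigned values.
        return Integer.compareUnsigned((int) a, (int) b);
    }
}
```

One 64-bit comparison becomes up to two comparisons plus a branch, which is the kind of overhead this answer is describing.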

Also, with code:

powers[i] = (long)Math.pow(i, 3);

there are two unnecessary conversions: first i (an int) is converted to double (that's what Math.pow takes), and then the return value is converted back to a 64-bit integer (long).
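A minimal sketch of building the cube table without that round trip, using long multiplication only. `CubeTable` and `build` are illustrative names, not part of the original program. Note that this only touches the setup loop, so on its own it is unlikely to account for the 3-second difference.

```java
final class CubeTable {
    // Builds cubes[i] = i^3 with 64-bit integer arithmetic, no double involved.
    static long[] build(int max) {
        long[] cubes = new long[max + 1];
        for (int i = 0; i <= max; i++) {
            long n = i;            // widen once so both products are computed in 64 bits
            cubes[i] = n * n * n;
        }
        return cubes;
    }
}
```

For the question's Z_MAX of 60000 the largest entry is 2.16e14, comfortably inside the range of long.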

嗼ふ静 2024-11-22 16:26:25

It's probably fair to say that your code spends most of its time in this section:

for (z = y + 1; z < Z_MAX; z++) {
    left = powers[z] + 1;
    if (right < left) {
        z = Z_MAX;
    }

And most of the time it will take the same branch of the conditional. So once your code has reached a steady state (i.e. once the CPU's branch predictor is set up), the run-time will be dominated by the computation itself: dependencies are minimised, so the latency of the instruction pipeline doesn't matter.

On a 32-bit machine, doing addition and comparison on 64-bit integer types takes more instructions than doing the equivalent on doubles. A double calculation will take more cycles to complete, but that doesn't matter: we're dominated by instruction throughput, not latency. So the overall run-time of the long version will be longer.

In terms of further optimization, you could move the +1 outside the inner loop, by calculating right = powers[x] + powers[y] - 1. But it's possible the optimizer has already spotted that.
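As a sketch of that hoisting, the search below computes the target once per (x, y) pair and compares it against the raw cubes. The class name `HoistedSearch`, the method `find`, the `zMax` parameter, and the returned list are assumptions made for this example; it also uses `break` where the question sets `z = Z_MAX`, and long cubes built by multiplication.

```java
import java.util.ArrayList;
import java.util.List;

final class HoistedSearch {
    // Returns all triples x < y < z < zMax with x^3 + y^3 = z^3 + 1.
    static List<int[]> find(int zMax) {
        long[] powers = new long[zMax + 1];
        for (int i = 0; i <= zMax; i++) {
            powers[i] = (long) i * i * i;
        }
        List<int[]> found = new ArrayList<>();
        for (int x = 1; x < zMax - 2; x++) {
            for (int y = x + 1; y < zMax - 1; y++) {
                // The "+ 1" from the inner loop becomes a single "- 1" here.
                long target = powers[x] + powers[y] - 1;
                for (int z = y + 1; z < zMax; z++) {
                    if (powers[z] > target) {
                        break;   // cubes only grow from here, so no match is possible
                    }
                    if (powers[z] == target) {
                        found.add(new int[] {x, y, z});
                    }
                }
            }
        }
        return found;
    }
}
```

Calling `HoistedSearch.find(200)` should report, among others, the triple (9, 10, 12), since 729 + 1000 = 1728 + 1.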

一身软味 2024-11-22 16:26:25

Your biggest "bonus" optimization will be to replace the z loop with a calculation like:

z = Math.round(Math.pow(right - 1, 1./3));

and check if z > y && right == powers[(int)z] + 1, where right is the question's variable holding powers[x] + powers[y].
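A minimal sketch of the cube-root shortcut as a complete search. The names `CubeRootSearch`, `find`, and `zMax` are hypothetical. It swaps in `Math.cbrt` for `Math.pow(..., 1./3)` and uses long cubes, and it verifies each candidate with exact integer arithmetic so that rounding in the cube root cannot produce a false match.

```java
import java.util.ArrayList;
import java.util.List;

final class CubeRootSearch {
    // Returns all triples x < y < z < zMax with x^3 + y^3 = z^3 + 1,
    // computing the only possible z directly instead of looping over it.
    static List<int[]> find(int zMax) {
        List<int[]> found = new ArrayList<>();
        for (int x = 1; x < zMax - 2; x++) {
            long x3 = (long) x * x * x;
            for (int y = x + 1; y < zMax - 1; y++) {
                long target = x3 + (long) y * y * y - 1;         // must equal z^3
                long z = Math.round(Math.cbrt((double) target)); // the single candidate
                if (z >= zMax) {
                    break;   // a larger y only pushes the candidate further out
                }
                if (z > y && z * z * z == target) {              // exact check
                    found.add(new int[] {x, y, (int) z});
                }
            }
        }
        return found;
    }
}
```

This turns the triple loop into a double loop, which is where most of the saving would come from.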

Other improvements if you wanted to find all triples within your limits:

  • start x at 2 instead of 1
  • replace z = Z_MAX; with break; to exit the loop early
  • compute X_MAX as Math.pow((powers[Z_MAX] + 1)/2, 1./3) ~= Z_MAX * Math.pow(0.5, 1./3) since if x is bigger than that, z will exceed Z_MAX
  • recompute Y_MAX for each x as Math.pow(powers[Z_MAX] - powers[x] + 1, 1./3)/2
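The X_MAX bound in the list above can be derived in one line from x < y; this is a worked sketch of the reasoning, writing Z_max for Z_MAX:

```latex
x < y
\;\Longrightarrow\;
2x^3 < x^3 + y^3 = z^3 + 1 \le Z_{\max}^3 + 1
\;\Longrightarrow\;
x < \sqrt[3]{\frac{Z_{\max}^3 + 1}{2}} \approx 2^{-1/3}\, Z_{\max} \approx 0.794\, Z_{\max}
```

So roughly the top fifth of the x range can never produce a triple and can be skipped.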

BTW, a more common way to order the triples would be using z as the primary sort key, which may result in a different first 18 than you get ordering by x first. To change that, you'd make your outer loop iterate over z, which would be simpler anyway:

for (z = 1; z < Z_MAX; z++) {
    for (y = 1; y < z - 1; y++) {
        // x^3 + y^3 = z^3 + 1, so x^3 = z^3 + 1 - y^3
        zy = powers[z] + 1 - powers[y];
        x = Math.round(Math.pow(zy, 1./3));
        if (x < y && zy == powers[(int)x]) {
            // report triple (x, y, z) found
        }
    }
}
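A self-contained sketch of the z-first ordering, under a few assumptions: the names `ByZSearch`, `find`, and `zMax` are made up for the example, cubes are computed as long products instead of being read from a powers table, `Math.cbrt` stands in for `Math.pow(..., 1./3)`, and y is allowed to run all the way up to z - 1.

```java
import java.util.ArrayList;
import java.util.List;

final class ByZSearch {
    // Returns triples x < y < z < zMax with x^3 + y^3 = z^3 + 1, ordered by z, then y.
    static List<int[]> find(int zMax) {
        List<int[]> found = new ArrayList<>();
        for (int z = 3; z < zMax; z++) {
            long z3plus1 = (long) z * z * z + 1;
            for (int y = 2; y < z; y++) {
                long rest = z3plus1 - (long) y * y * y;        // must equal x^3
                long x = Math.round(Math.cbrt((double) rest)); // the single candidate
                if (x < y && x * x * x == rest) {              // exact check
                    found.add(new int[] {(int) x, y, z});
                }
            }
        }
        return found;
    }
}
```

To get the first 18 triples in this ordering, the caller could stop as soon as the list reaches 18 entries instead of scanning up to a fixed limit.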