嵌套循环的效率

发布于 2024-08-27 17:40:22 字数 875 浏览 4 评论 0原文

请参阅以下代码片段:

    Long first_begin = System.currentTimeMillis();

    // first nested loops
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 1000000; j++) {
            // do some stuff
        }
    }
    System.out.println(System.currentTimeMillis() - first_begin);
    // second nested loops
    Long seconde_begin = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        for (int j = 0; j < 10; j++) {
            // do some stuff
        }
    }
    System.out.println(System.currentTimeMillis() - seconde_begin);

我想知道为什么第一个嵌套循环比第二个嵌套循环运行得慢?

问候!

重要提示!:很抱歉,在第一次提出这个问题时,我不小心将变量 j 设置为以 1 开头,我已进行更正。

更新:循环中没有任何具体的逻辑,我只是​​在做一些测试,实际上这是面试时提出的问题,面试官提示我改变循环的顺序以获得更好的性能。顺便说一句,我使用的是 JDK1.5。经过一些测试后,我现在更困惑了,因为程序的结果不一致——有时第一个循环运行得比第二个循环快,但大多数时候它运行得比第二个循环慢。

See the following snippet:

    Long first_begin = System.currentTimeMillis();

    // first nested loops
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 1000000; j++) {
            // do some stuff
        }
    }
    System.out.println(System.currentTimeMillis() - first_begin);
    // second nested loops
    Long seconde_begin = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        for (int j = 0; j < 10; j++) {
            // do some stuff
        }
    }
    System.out.println(System.currentTimeMillis() - seconde_begin);

I am wondering why the first nested loops is running slower than the second one?

Regards!

Important Note!: I am sorry that I made the variable j beginning with 1 accidentally when this question is first asked, I have made the correction.

Update:there is not any specific logic within the loops, I am just doing some test, actually this is a question asked during an interview and the interviewer hint me to change the order of loops to achieve better performance. BTW, I am using JDK1.5. after some test I am more confused now, because the result of program is not consistent---sometime the first loop running faster than the second one, but most of the time it's running slower than second one.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

只想待在家 2024-09-03 17:40:22

这个答案适用于更新的问题:

  • 如果您正在访问二维数组,例如 int[][],则内部循环中具有较大值的数组应该会更慢。虽然不是很多,但仍然如此。要稍微了解这个问题,请阅读 Joel 的一篇博客文章中有关街头画家 Shlemiel 的内容。
  • 您获得不一致结果的原因是您没有执行任何 JVM 预热。 JVM 不断分析运行的字节码并对其进行优化,通常只有在 30 到 50 次迭代之后才以最佳速度运行。是的,这意味着您需要先运行代码几十次,然后根据另外几十次运行的平均值对其进行基准测试,因为垃圾收集器会减慢几次运行的速度。
  • 一般注意,使用 Long 对象而不是 long 原语只是愚蠢的,JVM 很可能通过将其替换为原语来优化它,如果可以的话,如果不能的话,使用它肯定会出现一些(尽管非常轻微)持续的速度减慢。

This answer is for the updated question:

  • If you're accessing two dimensional array such as int[][], the one with the larger value in the inner loop should be slower. Not by much but still. To somewhat understand the problem, read about Shlemiel the street painter in one of Joel's blog posts.
  • The reason you're getting inconsistent results is that you're not performing any JVM warmup. JVM constantly analyzes the bytecode that is run and optimizes it, usually only after 30 to 50 iterations it runs at optimal speed. Yes, this means you need to run the code first a couple of dozen times and then benchmark it from an average of another couple dozen runs because of Garbage Collector which will slow couple of runs.
  • General note, using Long object instead of long primitive is just dumb, JVM most likely optimizes it by replacing it with the primitive one if it can and if it can't, there's bound to be some (albeit extremely minor) constant slowdown from using it.
满身野味 2024-09-03 17:40:22

编辑:原始答案如下。现在您已经修复了该示例,以便所有循环变量都从 0 开始,但我们又回到了没有足够信息的情况。看起来可能是缓存一致性/引用位置问题 - 但我们只是猜测。如果您可以提供一个简短但完整的程序来演示该问题,那将会有所帮助......就像告诉我们我们正在谈论哪种语言/平台一样!


第一个循环有 10 * 999999 = 9999990 次迭代。第二个循环有 1000000 * 9 = 9000000 次迭代。因此,我预计(所有其他条件相同)第一个循环需要更长的时间。

但是,您尚未表明您正在做什么工作或该工作在什么平台上进行。有很多因素可能会影响事情:

  • 第二个循环可能会更好地命中缓存
  • 如果您使用的是 JIT 编译平台,JIT 可能会选择更大力地优化第二个循环。
  • 您正在执行的操作本身可能有缓存或类似的东西
  • 如果您正在执行少量工作但它首先需要加载和初始化一堆类型,这可能会导致第一个循环变慢

EDIT: Original answer is below. Now that you've fixed the example so that all loop variables start at 0, we're back to simply not having enough information. It seems likely that it's a cache coherency / locality of reference issue - but we're just guessing. If you could provide a short but complete program which demonstrates the problem, that would help... as would telling us which language/platform we're talking about to start with!


The first loop has 10 * 999999 = 9999990 iterations. The second loop has 1000000 * 9 = 9000000 iterations. I would therefore expect (all other things being equal) the first loop to take longer.

However, you haven't indicated what work you're doing or what platform this is on. There are many things which could affect things:

  • The second loop may hit a cache better
  • If you're using a JIT-compiled platform, the JIT may have chosen to optimise the second loop more heavily.
  • The operations you're performing may themselves have caching or something like that
  • If you're performing a small amount of work but it first needs to load and initialize a bunch of types, that could cause the first loop to be slower
柒七 2024-09-03 17:40:22

问题转移了。这些不是您想要的机器人...

因为您在第一个示例中做了约 1000000 倍的工作。 ;-)

The question shifted. These are not the droids you seek...

Because you are doing ~1000000 times more work in the first example. ;-)

凯凯我们等你回来 2024-09-03 17:40:22

如果您查看生成的字节码,就会发现这两个循环几乎相同。除了当它执行 10 循环的 while 条件时,Java 从指令中获取 10 作为立即值,但是当它执行 1000000 循环的 while 条件时,Java 从变量加载 1000000。我没有任何关于执行每条指令需要多长时间的信息,但立即加载似乎比从变量加载更快。

请注意,在第一个循环中,必须与 1000000 进行比较 1000 万次,而在第二个循环中只需要进行 100 万次。当然,在第二个循环中与 10 进行比较的频率要高得多,但如果变量加载比立即加载慢得多,那就可以解释您所看到的结果。

If you look at the generated byte code, the two loops are almost identical. EXCEPT that when it does the while-condition for the 10 loop, Java gets the 10 as an immediate value from within the instruction, but when it does the while-condition for the 1000000 loop, Java loads the 1000000 from a variable. I don't have any info on how long it takes to execute each instruction, but it seems likely that an immediate load will be faster than a load from a variable.

Note, then, that in the first loop, the compare against 1000000 must be done 10 million times while in the second loop it is only done 1 million times. Of course the compare against 10 is done much more often in the second loop, but if the variable load is much slower than the immediate load, that would explain the results you are seeing.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文