Cost of Thread.sleep(1) differs depending on the busy-loop implementation

Posted on 2025-02-10 00:26:25

Suppose we execute Thread.sleep(1) in a loop that iterates n times (Java 11 here and below):

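import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(jvmArgsAppend = {"-Xms1g", "-Xmx1g"})
public class ThreadSleep1Benchmark {
  // Number of Thread.sleep(1) calls per benchmark invocation.
  @Param({"5", "10", "50"})
  long delay;

  @Benchmark
  public int sleep() throws Exception {
    // Sleep for 1 ms exactly `delay` times.
    for (int i = 0; i < delay; i++) {
      Thread.sleep(1);
    }
    // Return a value so the method body is not treated as dead code.
    return hashCode();
  }
}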

This benchmark demonstrates the following results:

Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep1Benchmark.sleep        5  avgt   50   6,552 ± 0,071  ms/op
ThreadSleep1Benchmark.sleep       10  avgt   50  13,343 ± 0,227  ms/op
ThreadSleep1Benchmark.sleep       50  avgt   50  68,059 ± 1,441  ms/op

Here the sleep() method takes more than n milliseconds, whereas intuitively we would expect about n, since the current thread sleeps for 1 ms on each iteration. This example demonstrates the cost of putting a thread to sleep and waking it up.
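The same effect can be checked roughly without JMH. The following is only a minimal sketch, not a substitute for the benchmark above: the class name SleepOverhead and the method averageSleepMillis are made up for illustration, and the timing relies on System.nanoTime(), so the numbers will vary with the OS timer and system load.

```java
class SleepOverhead {
    /** Average wall-clock duration of one Thread.sleep(1) call, in milliseconds. */
    static double averageSleepMillis(int iterations) {
        long start = System.nanoTime();
        try {
            for (int i = 0; i < iterations; i++) {
                Thread.sleep(1);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and give up on the measurement.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        averageSleepMillis(50); // crude warm-up, result discarded
        for (int n : new int[] {5, 10, 50}) {
            System.out.printf("n=%d: %.3f ms per sleep%n", n, averageSleepMillis(n));
        }
    }
}
```

Anything above 1 ms per call is time spent outside the requested sleep, which is the overhead the benchmark is showing.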

Let's now modify the benchmark:

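import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(jvmArgsAppend = {"-Xms1g", "-Xmx1g"})
public class ThreadSleep2Benchmark {
  private final ExecutorService executor = Executors.newFixedThreadPool(1);
  // Written by the background task, polled by the benchmark thread.
  volatile boolean flag;

  // How long the background task waits before clearing the flag, in ms.
  @Param({"5", "10", "50"})
  long delay;

  // Runs before every single call of sleep(): raise the flag, start the timer task.
  @Setup(Level.Invocation)
  public void setUp() {
    flag = true;
    startThread();
  }

  @TearDown(Level.Trial)
  public void tearDown() {
    executor.shutdown();
  }

  @Benchmark
  public int sleep() throws Exception {
    // Sleep in 1 ms steps until the background task clears the flag.
    while (flag) {
      Thread.sleep(1);
    }
    return hashCode();
  }

  private void startThread() {
    executor.submit(() -> {
      try {
        Thread.sleep(delay);
        flag = false;
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    });
  }
}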

Here we run a background thread that waits for n milliseconds and then clears the flag, while the sleep() method iterates over the while (flag) loop. Since the flag is cleared after a delay of n milliseconds, we expect the while loop to iterate approximately n times.
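Whether the loop really iterates about n times can be counted directly. This is a rough sketch outside JMH, under my own assumptions: FlagLoopCounter and countIterations are hypothetical names, an AtomicBoolean stands in for the volatile field, and a plain daemon thread replaces the executor.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class FlagLoopCounter {
    /** Counts how many Thread.sleep(1) calls complete before a background thread clears the flag. */
    static int countIterations(long delayMillis) {
        AtomicBoolean flag = new AtomicBoolean(true);
        Thread background = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                flag.set(false); // always release the polling loop
            }
        });
        background.setDaemon(true);
        background.start();

        int iterations = 0;
        try {
            while (flag.get()) {
                Thread.sleep(1);
                iterations++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling", e);
        }
        return iterations;
    }

    public static void main(String[] args) {
        for (long delay : new long[] {5, 10, 50}) {
            System.out.printf("delay=%d ms -> %d iterations%n", delay, countIterations(delay));
        }
    }
}
```

If each Thread.sleep(1) takes noticeably longer than 1 ms, the printed count should come out below the delay, which would mean the loop body runs fewer than n times in this variant.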

Again we see the cost of Thread.sleep(1), but it appears to be almost the same as before for delays of 5 and 10, and significantly lower when the delay is 50. Note that the difference is not linear: it is ~0,1 ms for 5, ~1,2 ms for 10 and ~13 ms for 50.

Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep2Benchmark.sleep        5  avgt   50   6,760 ± 0,070  ms/op
ThreadSleep2Benchmark.sleep       10  avgt   50  12,496 ± 0,050  ms/op
ThreadSleep2Benchmark.sleep       50  avgt   50  54,727 ± 0,599  ms/op
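To make the comparison concrete, the Java 11 scores reported above can be put through some simple arithmetic. This is just a sketch: the class and method names are made up, and the iteration estimate assumes that one sleep(1) call costs the same in both benchmarks, which the measurements do not prove.

```java
class SleepCostArithmetic {
    /** Average cost of one iteration when the loop runs exactly `delay` times. */
    static double perIterationMillis(double scoreMillis, int delay) {
        return scoreMillis / delay;
    }

    /** Time spent beyond the nominal delay. */
    static double overshootMillis(double scoreMillis, int delay) {
        return scoreMillis - delay;
    }

    /** Estimated flag-loop iterations, ASSUMING the same per-sleep cost as the fixed-count loop. */
    static double estimatedIterations(double flagScore, double fixedScore, int delay) {
        return flagScore / perIterationMillis(fixedScore, delay);
    }

    public static void main(String[] args) {
        int[] delays = {5, 10, 50};
        double[] fixedCount = {6.552, 13.343, 68.059}; // ThreadSleep1Benchmark, Java 11
        double[] flagLoop = {6.760, 12.496, 54.727};   // ThreadSleep2Benchmark, Java 11
        for (int i = 0; i < delays.length; i++) {
            System.out.printf(
                "delay=%d: %.3f ms per sleep, overshoot %.3f ms (fixed count) vs %.3f ms (flag loop), ~%.1f flag-loop iterations%n",
                delays[i],
                perIterationMillis(fixedCount[i], delays[i]),
                overshootMillis(fixedCount[i], delays[i]),
                overshootMillis(flagLoop[i], delays[i]),
                estimatedIterations(flagLoop[i], fixedCount[i], delays[i]));
        }
    }
}
```

With these inputs the fixed-count loop overshoots its nominal delay by 1.552, 3.343 and 18.059 ms, while the flag loop overshoots by 1.760, 2.496 and 4.727 ms. One possible reading, offered as an assumption rather than a confirmed explanation, is that the flag loop is bounded by the background thread's single sleep(delay), whereas the fixed-count loop pays the per-call overhead n times.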

On Java 18 the results are similar:

Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep1Benchmark.sleep        5  avgt   50   6,609 ± 0,105  ms/op
ThreadSleep1Benchmark.sleep       10  avgt   50  13,233 ± 0,148  ms/op
ThreadSleep1Benchmark.sleep       50  avgt   50  66,017 ± 0,714  ms/op

ThreadSleep2Benchmark.sleep        5  avgt   50   6,740 ± 0,067  ms/op
ThreadSleep2Benchmark.sleep       10  avgt   50  12,400 ± 0,112  ms/op
ThreadSleep2Benchmark.sleep       50  avgt   50  53,836 ± 0,250  ms/op

So my question is: is the cost reduction in ThreadSleep2Benchmark an achievement of the compiler (loop unrolling etc.), or is it due to how I iterate over the loops?

UPD

On Linux I got the following results:

Java 11

Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep1Benchmark.sleep        5  avgt   50   5.597 ± 0.038  ms/op
ThreadSleep1Benchmark.sleep       10  avgt   50  11.263 ± 0.069  ms/op
ThreadSleep1Benchmark.sleep       50  avgt   50  56.079 ± 0.267  ms/op

Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep2Benchmark.sleep        5  avgt   50   5.600 ± 0.032  ms/op
ThreadSleep2Benchmark.sleep       10  avgt   50  10.558 ± 0.052  ms/op
ThreadSleep2Benchmark.sleep       50  avgt   50  50.625 ± 0.049  ms/op

Java 18

Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep1Benchmark.sleep        5  avgt   50   5.581 ± 0.041  ms/op
ThreadSleep1Benchmark.sleep       10  avgt   50  11.069 ± 0.067  ms/op
ThreadSleep1Benchmark.sleep       50  avgt   50  55.719 ± 0.602  ms/op

Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
ThreadSleep2Benchmark.sleep        5  avgt   50   5.574 ± 0.035  ms/op
ThreadSleep2Benchmark.sleep       10  avgt   50  10.918 ± 0.035  ms/op
ThreadSleep2Benchmark.sleep       50  avgt   50  50.823 ± 0.055  ms/op

彡翼 2025-02-17 00:26:25

If you want more control over pausing a Java thread, take a look at LockSupport.parkNanos. On Linux you get 50 µs resolution by default. For more information, and for how to tune it, see https://hazelcast.com/blog/locksupport-parknanos-under-under-the-hood-and-and-the-cuil--cuil-casuious-case-of-parking/
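A minimal sketch of pausing with LockSupport.parkNanos could look like this. The class ParkNanosPause and the method pauseNanos are hypothetical names used only for illustration; the loop is there because parkNanos is allowed to return early, and the actual resolution depends on the OS and its configuration.

```java
import java.util.concurrent.locks.LockSupport;

class ParkNanosPause {
    /**
     * Parks the current thread until at least `nanos` nanoseconds have elapsed,
     * or until the thread is interrupted. Returns the elapsed time in nanoseconds.
     */
    static long pauseNanos(long nanos) {
        long start = System.nanoTime();
        long deadline = start + nanos;
        long remaining = nanos;
        while (remaining > 0) {
            LockSupport.parkNanos(remaining); // may return before the timeout
            if (Thread.currentThread().isInterrupted()) {
                break; // leave the interrupt flag set for the caller
            }
            remaining = deadline - System.nanoTime();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (long micros : new long[] {100, 500, 1_000}) {
            long elapsed = pauseNanos(micros * 1_000);
            System.out.printf("requested %d us, elapsed %d us%n", micros, elapsed / 1_000);
        }
    }
}
```

Comparing the requested and elapsed values gives a rough idea of the wake-up granularity on a given machine.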
