Costs of Thread.sleep(1) are different depending on busy loop implementation
Suppose we execute Thread.sleep(1) within a loop iterating n times (here and below it's Java 11):
    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Fork(jvmArgsAppend = {"-Xms1g", "-Xmx1g"})
    public class ThreadSleep1Benchmark {
        @Param({"5", "10", "50"})
        long delay;

        @Benchmark
        public int sleep() throws Exception {
            // Sleep for 1 ms, `delay` times in a row.
            for (int i = 0; i < delay; i++) {
                Thread.sleep(1);
            }
            return hashCode();
        }
    }
This benchmark demonstrates the following results:
    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep1Benchmark.sleep        5  avgt   50   6,552 ± 0,071  ms/op
    ThreadSleep1Benchmark.sleep       10  avgt   50  13,343 ± 0,227  ms/op
    ThreadSleep1Benchmark.sleep       50  avgt   50  68,059 ± 1,441  ms/op
Here we see that the sleep() method takes more than n milliseconds, while intuitively we would expect about n, since the current thread sleeps for 1 ms on each iteration. This example demonstrates the cost of putting a thread to sleep and waking it up.
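The per-call cost can also be observed outside JMH with a plain timing loop. The following is only a rough sketch: `SleepCostDemo` and `averageSleepNanos` are hypothetical names that do not appear in the benchmark, and without JMH's warm-up and forking the figures will be noisier.

```java
// Hypothetical helper, not part of the original benchmark.
class SleepCostDemo {

    /** Returns the average wall-clock duration of one Thread.sleep(1) call, in nanoseconds. */
    static long averageSleepNanos(int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop measuring.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while sleeping", e);
            }
        }
        return (System.nanoTime() - start) / calls;
    }

    public static void main(String[] args) {
        long avg = averageSleepNanos(100);
        System.out.printf("average Thread.sleep(1): %.3f ms%n", avg / 1_000_000.0);
    }
}
```

Any average above 1 ms is the per-call overhead that the benchmark multiplies by n.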
Let's now modify the benchmark:
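    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Fork(jvmArgsAppend = {"-Xms1g", "-Xmx1g"})
    public class ThreadSleep2Benchmark {
        private final ExecutorService executor = Executors.newFixedThreadPool(1);
        // Written by the background thread, read by the benchmark thread.
        volatile boolean flag;

        @Param({"5", "10", "50"})
        long delay;

        // Runs before every call of sleep() and is not included in the score.
        @Setup(Level.Invocation)
        public void setUp() {
            flag = true;
            startThread();
        }

        @TearDown(Level.Trial)
        public void tearDown() {
            executor.shutdown();
        }

        @Benchmark
        public int sleep() throws Exception {
            // Sleep in 1 ms steps until the background thread clears the flag.
            while (flag) {
                Thread.sleep(1);
            }
            return hashCode();
        }

        private void startThread() {
            // Clear the flag after `delay` milliseconds.
            executor.submit(() -> {
                try {
                    Thread.sleep(delay);
                    flag = false;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                }
            });
        }
    }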
Here we run a background thread that waits for n milliseconds and then clears the flag, while the sleep() method iterates over the while (flag) loop. Since the flag is cleared after a delay of n milliseconds, we expect the while loop to iterate approximately n times.
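Whether the loop really runs about n times can be checked by counting iterations directly. This is a minimal sketch under stated assumptions: `FlagLoopDemo` and `run` are hypothetical names, and unlike the JMH version the measured time here also includes creating the executor thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, not part of the original benchmark.
class FlagLoopDemo {
    static volatile boolean flag;

    /** Runs the while (flag) loop once and returns {iterations, elapsedNanos}. */
    static long[] run(long delayMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            flag = true;
            long start = System.nanoTime();
            executor.submit(() -> {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    flag = false; // always release the waiting loop
                }
            });
            long iterations = 0;
            while (flag) {
                sleepOneMilli();
                iterations++;
            }
            return new long[] {iterations, System.nanoTime() - start};
        } finally {
            executor.shutdown();
        }
    }

    private static void sleepOneMilli() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sleeping", e);
        }
    }

    public static void main(String[] args) {
        for (long delay : new long[] {5, 10, 50}) {
            long[] r = run(delay);
            System.out.printf("delay=%d ms: %d iterations, %.3f ms elapsed%n",
                    delay, r[0], r[1] / 1_000_000.0);
        }
    }
}
```

If the count comes out below n, that would suggest the total time is bounded by the background thread's delay plus at most one extra sleep, rather than by n full sleeps.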
Again we see the cost of Thread.sleep(1), but compared with the first benchmark it is almost the same for a delay of 5 and 10, and significantly lower when delay is 50. Note that the difference is not linear: it is ~0.1 ms for 5, ~1.2 ms for 10 and ~13 ms for 50.
    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep2Benchmark.sleep        5  avgt   50   6,760 ± 0,070  ms/op
    ThreadSleep2Benchmark.sleep       10  avgt   50  12,496 ± 0,050  ms/op
    ThreadSleep2Benchmark.sleep       50  avgt   50  54,727 ± 0,599  ms/op
On Java 18 the results are similar:
    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep1Benchmark.sleep        5  avgt   50   6,609 ± 0,105  ms/op
    ThreadSleep1Benchmark.sleep       10  avgt   50  13,233 ± 0,148  ms/op
    ThreadSleep1Benchmark.sleep       50  avgt   50  66,017 ± 0,714  ms/op
    ThreadSleep2Benchmark.sleep        5  avgt   50   6,740 ± 0,067  ms/op
    ThreadSleep2Benchmark.sleep       10  avgt   50  12,400 ± 0,112  ms/op
    ThreadSleep2Benchmark.sleep       50  avgt   50  53,836 ± 0,250  ms/op
So my question is: is the reduced cost in ThreadSleep2Benchmark the compiler's achievement (loop unrolling etc.), or is it down to how I iterate over the loop?
UPD
On Linux I got the following results:
Java 11
    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep1Benchmark.sleep        5  avgt   50   5.597 ± 0.038  ms/op
    ThreadSleep1Benchmark.sleep       10  avgt   50  11.263 ± 0.069  ms/op
    ThreadSleep1Benchmark.sleep       50  avgt   50  56.079 ± 0.267  ms/op

    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep2Benchmark.sleep        5  avgt   50   5.600 ± 0.032  ms/op
    ThreadSleep2Benchmark.sleep       10  avgt   50  10.558 ± 0.052  ms/op
    ThreadSleep2Benchmark.sleep       50  avgt   50  50.625 ± 0.049  ms/op
Java 18
    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep1Benchmark.sleep        5  avgt   50   5.581 ± 0.041  ms/op
    ThreadSleep1Benchmark.sleep       10  avgt   50  11.069 ± 0.067  ms/op
    ThreadSleep1Benchmark.sleep       50  avgt   50  55.719 ± 0.602  ms/op

    Benchmark                    (delay)  Mode  Cnt   Score   Error  Units
    ThreadSleep2Benchmark.sleep        5  avgt   50   5.574 ± 0.035  ms/op
    ThreadSleep2Benchmark.sleep       10  avgt   50  10.918 ± 0.035  ms/op
    ThreadSleep2Benchmark.sleep       50  avgt   50  50.823 ± 0.055  ms/op
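One way to read the Linux figures is to normalise them: divide the ThreadSleep1Benchmark score by the number of iterations, and subtract the delay from the ThreadSleep2Benchmark score. The sketch below does that arithmetic for the Java 11 rows. `SleepOverheadMath` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper that only post-processes the scores reported above.
class SleepOverheadMath {

    /** Average cost of one Thread.sleep(1) call when the loop runs `iterations` times. */
    static double perCallMillis(double scoreMillis, int iterations) {
        return scoreMillis / iterations;
    }

    /** Time spent beyond the background thread's delay in the flag-based loop. */
    static double excessMillis(double scoreMillis, int delayMillis) {
        return scoreMillis - delayMillis;
    }

    public static void main(String[] args) {
        int[] delays = {5, 10, 50};
        double[] sleep1 = {5.597, 11.263, 56.079}; // ThreadSleep1Benchmark, Linux, Java 11
        double[] sleep2 = {5.600, 10.558, 50.625}; // ThreadSleep2Benchmark, Linux, Java 11
        for (int i = 0; i < delays.length; i++) {
            System.out.printf("delay=%d: %.3f ms per sleep, %.3f ms over the delay%n",
                    delays[i],
                    perCallMillis(sleep1[i], delays[i]),
                    excessMillis(sleep2[i], delays[i]));
        }
    }
}
```

On these figures the first benchmark costs roughly 1.12 ms per call at every delay, while the second stays about 0.56 to 0.63 ms above the delay regardless of n. That would be consistent with the flag-based loop ending shortly after the flag flips rather than after n full sleeps, though the numbers alone do not prove it.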
Comments (1)
If you want more control over pausing a Java thread, have a look at LockSupport.parkNanos. On Linux you get a resolution of about 50 µs by default. For more information and how to tune it, see https://hazelcast.com/blog/locksupport-parknanos-under-the-hood-and-the-curious-case-of-parking/
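As a rough illustration of that suggestion, a sub-millisecond pause built on LockSupport.parkNanos might look like the sketch below. `ParkDemo` and `pauseNanos` are hypothetical names, and the resolution actually achieved depends on the operating system and its timer settings.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper, shown only to illustrate the comment above.
class ParkDemo {

    /**
     * Pauses the current thread for at least `nanos` nanoseconds and
     * returns the time that actually elapsed, in nanoseconds.
     */
    static long pauseNanos(long nanos) {
        long start = System.nanoTime();
        long deadline = start + nanos;
        long remaining = nanos;
        while (remaining > 0) {
            // parkNanos may return early (spuriously, or immediately if the
            // thread is interrupted), so re-check the deadline every time.
            LockSupport.parkNanos(remaining);
            remaining = deadline - System.nanoTime();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long elapsed = pauseNanos(200_000L); // request 200 µs
        System.out.printf("requested 200 us, paused %.1f us%n", elapsed / 1_000.0);
    }
}
```

The loop around parkNanos is a design choice: the method is allowed to return before the timeout, so the deadline is the only reliable exit condition.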