What really "warms up"? Threads in multithreading?
I’m dealing with multithreading in Java and, as someone pointed out to me, I noticed that threads warm up, that is, they get faster as they are repeatedly executed. I would like to understand why this happens and whether it is specific to Java or common behavior of every multithreaded program.
The code (by Peter Lawrey) that exemplifies it is the following:
for (int i = 0; i < 20; i++) {
    ExecutorService es = Executors.newFixedThreadPool(1);
    final double[] d = new double[4 * 1024];
    Arrays.fill(d, 1);
    final double[] d2 = new double[4 * 1024];
    // Submit an empty task first so the pool's thread is already started
    // before timing begins.
    es.submit(new Runnable() {
        @Override
        public void run() {
            // nothing.
        }
    }).get();
    long start = System.nanoTime();
    es.submit(new Runnable() {
        @Override
        public void run() {
            synchronized (d) {
                System.arraycopy(d, 0, d2, 0, d.length);
            }
        }
    });
    es.shutdown();
    es.awaitTermination(10, TimeUnit.SECONDS);
    // read the values in d2.
    for (double x : d2) ;
    long time = System.nanoTime() - start;
    System.out.printf("Time to pass %,d doubles to another thread and back was %,d ns.%n", d.length, time);
}
Results:
Time to pass 4,096 doubles to another thread and back was 1,098,045 ns.
Time to pass 4,096 doubles to another thread and back was 171,949 ns.
... deleted ...
Time to pass 4,096 doubles to another thread and back was 50,566 ns.
Time to pass 4,096 doubles to another thread and back was 49,937 ns.
I.e. it gets faster and stabilises around 50,000 ns (about 50 µs). Why is that?
If I run this code (20 repetitions), then execute something else (let's say postprocessing of the previous results and preparation for another multithreading round), and later execute the same Runnable on the same ThreadPool for another 20 repetitions, will it still be warmed up?
In my program, I execute the Runnable in just one thread (actually one per processing core I have; it's a CPU-intensive program), alternating many times with other serial processing. It doesn't seem to get faster as the program goes on. Maybe I could find a way to warm it up…
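One way such a warm-up phase could look is sketched below. This is not from the original post: `TASK` is a hypothetical stand-in for the real per-core Runnable, and the iteration count is based on HotSpot's default compile threshold of roughly 10,000 invocations (tunable via -XX:CompileThreshold).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WarmUp {
    static volatile double sink;

    // Hypothetical stand-in for the real per-core CPU-bound Runnable.
    static final Runnable TASK = () -> {
        double sum = 0;
        for (int i = 0; i < 1_000; i++) sum += Math.sqrt(i);
        sink = sum; // keep the JIT from eliminating the loop as dead code
    };

    /** Runs TASK many times untimed, then times one more execution. */
    static long warmUpAndTime() {
        ExecutorService es = Executors.newFixedThreadPool(1);
        // HotSpot compiles a method after roughly 10,000 invocations by
        // default, so run well past that before measuring.
        for (int i = 0; i < 20_000; i++) es.submit(TASK);
        try {
            // The pool is single-threaded and FIFO, so waiting on this
            // empty marker task means all warm-up tasks have finished.
            es.submit(() -> { }).get();
            long start = System.nanoTime();
            es.submit(TASK);
            es.shutdown();
            es.awaitTermination(10, TimeUnit.SECONDS);
            return System.nanoTime() - start;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("Warmed-up run took %,d ns.%n", warmUpAndTime());
    }
}
```

Whether this helps depends on what actually got slow again between rounds (see the answers below), so treat it as an experiment, not a guaranteed fix.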
2 Answers
It isn't the threads that are warming up so much as the JVM.
The JVM has what's called JIT (Just In Time) compiling. As the program is running, it analyzes what's happening in the program and optimizes it on the fly. It does this by taking the byte code that the JVM runs and converting it to native code that runs faster. It can do this in a way that is optimal for your current situation, as it does this by analyzing the actual runtime behavior. This can (not always) result in great optimization. Even more so than some programs that are compiled to native code without such knowledge.
You can read a bit more at http://en.wikipedia.org/wiki/Just-in-time_compilation
You could get a similar effect on any program as code is loaded into the CPU caches, but I believe this will be a smaller difference.
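To watch the JIT at work directly, a minimal sketch (not from the answer above) is to time the same pure-Java method in a loop; the later rounds are typically much faster once HotSpot has compiled it. Running the JVM with the real -XX:+PrintCompilation flag prints a line whenever a method is compiled.

```java
public class JitDemo {
    // A small pure-Java workload: summing an array is byte code that
    // HotSpot will JIT-compile to native code once it becomes "hot".
    static double sum(double[] a) {
        double s = 0;
        for (double x : a) s += x;
        return s;
    }

    public static void main(String[] args) {
        double[] data = new double[4 * 1024];
        java.util.Arrays.fill(data, 1);
        // Time the same call repeatedly; timings usually drop sharply
        // once the JIT has compiled sum() (and its containing loop).
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            double s = sum(data);
            long time = System.nanoTime() - start;
            System.out.printf("round %d: sum=%.0f in %,d ns%n", round, s, time);
        }
        // Run as: java -XX:+PrintCompilation JitDemo
        // to see HotSpot's compile events interleaved with the output.
    }
}
```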
The only reasons I see that a thread execution can end up being faster are:
- The memory manager can reuse already allocated object space (e.g., letting heap allocations fill up the available memory until the maximum, set by the -Xmx option, is reached)
- The working set is available in the hardware cache
- Repeating operations may produce patterns the compiler can more easily reorder to optimize execution
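The second point (the working set sitting in the hardware cache) can be illustrated with a sketch like the following, assuming the same 32 KiB arrays as the question's code. In practice the first copy also pays JIT and class-loading costs, so the effects overlap; this only isolates the idea, not a clean measurement.

```java
public class CacheDemo {
    /** Times one array copy and returns the elapsed nanoseconds. */
    static long timeCopy(double[] src, double[] dst) {
        long start = System.nanoTime();
        System.arraycopy(src, 0, dst, 0, src.length);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        double[] src = new double[4 * 1024]; // 32 KiB: fits easily in cache
        double[] dst = new double[src.length];
        java.util.Arrays.fill(src, 1);
        // The first copy typically pays for cache misses; later copies
        // tend to find both arrays resident in the hardware cache.
        for (int i = 0; i < 5; i++) {
            System.out.printf("copy %d: %,d ns%n", i, timeCopy(src, dst));
        }
    }
}
```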