Is it possible to use multithreading without creating threads over and over again?
First, and once more, thanks to everyone who has already answered my question. I am not a very experienced programmer, and this is my first experience with multithreading.
I have an example that behaves much like my problem. I hope it makes our case easier to discuss.
import java.util.ArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class ThreadMeasuring {
    private static final int TASK_TIME = 1; // milliseconds

    private static class Batch implements Runnable {
        CountDownLatch countDown;

        public Batch(CountDownLatch countDown) {
            this.countDown = countDown;
        }

        @Override
        public void run() {
            // Busy-wait for TASK_TIME milliseconds to simulate a calculation.
            long t0 = System.nanoTime();
            long t = 0;
            while (t < TASK_TIME * 1e6) { t = System.nanoTime() - t0; }
            if (countDown != null) countDown.countDown();
        }
    }

    public static void main(String[] args) {
        ThreadFactory threadFactory = new ThreadFactory() {
            int counter = 1;

            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, "Executor thread " + (counter++));
            }
        };
        // The total work to be divided into tasks is fixed (problem dependent).
        // Increasing ntasks decreases the task time proportionally.
        // 4 is an arbitrary example.
        // These tasks will be executed thousands of times, inside a loop alternating
        // with serial processing that needs their results and prepares the next ones.
        int ntasks = 4;
        int nthreads = 2;
        int ncores = Runtime.getRuntime().availableProcessors();
        if (nthreads < ncores) ncores = nthreads;

        // Time a single task on the main thread as the serial reference.
        Batch serial = new Batch(null);
        long serialTime = System.nanoTime();
        serial.run();
        serialTime = System.nanoTime() - serialTime;

        ExecutorService executor = Executors.newFixedThreadPool(nthreads, threadFactory);
        CountDownLatch countDown = new CountDownLatch(ntasks);
        ArrayList<Batch> batches = new ArrayList<Batch>();
        for (int i = 0; i < ntasks; i++) {
            batches.add(new Batch(countDown));
        }
        long start = System.nanoTime();
        for (Batch r : batches) {
            executor.execute(r);
        }
        // Wait for all tasks to finish.
        try {
            countDown.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            e.printStackTrace();
        }
        long tmeasured = (System.nanoTime() - start);

        System.out.println("Task time= " + TASK_TIME + " ms");
        System.out.println("Number of tasks= " + ntasks);
        System.out.println("Number of threads= " + nthreads);
        System.out.println("Number of cores= " + ncores);
        System.out.println("Measured time= " + tmeasured + " ns");
        System.out.println("Theoretical serial time= " + TASK_TIME * 1000000 * ntasks + " ns");
        System.out.println("Theoretical parallel time= " + (TASK_TIME * 1000000 * ntasks) / ncores + " ns");
        System.out.println("Speedup= " + (serialTime * ntasks) / (double) tmeasured);
        executor.shutdown();
    }
}
Instead of doing the calculations, each batch just waits for a given time. The program calculates the speedup, which would always be 2 in theory but can drop below 1 (an actual slowdown) if 'TASK_TIME' is small.
My calculations take at most 1 ms and are commonly faster. For 1 ms I find a small speedup of around 30%, but in practice, with my program, I notice a slowdown.
The structure of this code is very similar to my program, so if you could help me to optimise the thread handling I would be very grateful.
Kind regards.
Below, the original question:
Hi.
I would like to use multithreading in my program, since I believe it could increase its efficiency considerably. Most of its running time is spent on independent calculations.
My program has thousands of independent calculations (several linear systems to solve), but they only occur together in small groups of a dozen or so. Each of these groups takes a few milliseconds to run. After one of these groups of calculations, the program has to run sequentially for a little while, and then I have to solve the linear systems again.
In other words, these independent linear systems sit inside a loop that iterates thousands of times, alternating with sequential calculations that depend on the previous results. My idea for speeding up the program is to compute these independent calculations in parallel threads, by dividing each group into as many batches of independent calculations as I have processors available. So, in principle, there is no queuing at all.
I tried using the FixedThreadPool and CachedThreadPool, and it got even slower than serial processing. It seems to take too much time creating new threads each time I need to solve the batches.
Is there a better way to handle this problem? The pools I've used seem suited to cases where each thread takes more time, rather than thousands of smaller threads...
Thanks!
Best Regards!
Comments (6)
Thread pools don't create new threads over and over. That's why they're pools.
How many threads were you using and how many CPUs/cores do you have? What is the system load like (normally, when you execute them serially, and when you execute with the pool)? Is synchronization or any kind of locking involved?
Is the algorithm for parallel execution exactly the same as the serial one? Your description seems to suggest that the serial version was reusing some results from the previous iteration.
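To illustrate the point about pools reusing their threads, here is a minimal sketch of the loop described in the question: the pool is created once, outside the loop, and every iteration hands its batches to the same pool with `invokeAll`. The class name `PoolReuseSketch`, the `solveBatch` stand-in, and the `threadsSeen` set are hypothetical and exist only for illustration; the real calculation would replace `solveBatch`. Note that a fixed pool starts its worker threads lazily on the first submissions, so the very first round still pays for thread start-up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolReuseSketch {
    // Names of the worker threads that actually ran a batch (illustration only).
    static final Set<String> threadsSeen = ConcurrentHashMap.newKeySet();

    // Hypothetical stand-in for one batch of independent calculations.
    static double solveBatch(int batch, double input) {
        threadsSeen.add(Thread.currentThread().getName());
        return input + batch;
    }

    // Runs `iterations` rounds. Each round submits `nBatches` tasks to the SAME pool,
    // waits for all of them, then performs a sequential step that uses their results.
    static double run(int iterations, int nBatches, int nThreads)
            throws InterruptedException, ExecutionException {
        threadsSeen.clear();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads); // created once
        try {
            double state = 0.0;
            for (int it = 0; it < iterations; it++) {
                final double input = state;
                List<Callable<Double>> tasks = new ArrayList<>();
                for (int b = 0; b < nBatches; b++) {
                    final int batch = b;
                    tasks.add(() -> solveBatch(batch, input));
                }
                double sum = 0.0;
                // invokeAll blocks until every task of this round has completed.
                for (Future<Double> f : pool.invokeAll(tasks)) {
                    sum += f.get();
                }
                state = sum / nBatches; // sequential step depending on the results
            }
            return state;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("result = " + run(1000, 4, 2));
        System.out.println("distinct worker threads = " + threadsSeen.size());
    }
}
```

Even after thousands of rounds, the number of distinct worker threads stays at the pool size, so any remaining slowdown comes from task hand-off and wake-up latency rather than from thread creation.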
From what I've read ("thousands of independent calculations... happen at the same time... would take some milliseconds to run"), it seems to me that your problem is perfect for GPU programming.
And I think that answers your question. GPU programming is becoming more and more popular. There are Java bindings for CUDA and OpenCL. If it is possible for you to use it, I say go for it.
I'm not sure how you perform the calculations, but if you're breaking them up into small groups, then your application might be ripe for the Producer/Consumer pattern.
Additionally, you might be interested in using a BlockingQueue. The calculation consumers will block until there is something in the queue, and the block occurs on the take() call. Sorry if there are any syntax errors; I've been chomping away at C# code and sometimes I forget the proper Java syntax, but the general idea is there.
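The code from this answer did not survive on this page, so the following is only a minimal sketch of the producer/consumer idea under my own assumptions, not the answerer's original code. Long-lived consumer threads block in `take()` until a job arrives; the class name `QueueWorkersSketch`, the `POISON` sentinel used for shutdown, and the squaring "calculation" are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueWorkersSketch {
    // Hypothetical sentinel value telling a consumer to stop.
    private static final int POISON = -1;

    // Pushes jobs 1..nJobs through nConsumers worker threads and returns the
    // sum of the per-job results (here: the square of each job number).
    static long run(int nJobs, int nConsumers) throws InterruptedException {
        BlockingQueue<Integer> jobs = new LinkedBlockingQueue<>();
        BlockingQueue<Long> results = new LinkedBlockingQueue<>();

        Thread[] consumers = new Thread[nConsumers];
        for (int i = 0; i < nConsumers; i++) {
            consumers[i] = new Thread(() -> {
                try {
                    while (true) {
                        int job = jobs.take(); // blocks until a job is available
                        if (job == POISON) return;
                        results.put((long) job * job); // stand-in for the real calculation
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumers[i].start();
        }

        // Producer side: enqueue the jobs, then collect exactly one result per job.
        for (int j = 1; j <= nJobs; j++) jobs.put(j);
        long sum = 0;
        for (int j = 0; j < nJobs; j++) sum += results.take();

        // Shut the consumers down, one sentinel per consumer.
        for (int i = 0; i < nConsumers; i++) jobs.put(POISON);
        for (Thread t : consumers) t.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10, 2));
    }
}
```

In the question's setting the consumers would be started once, before the outer loop, and each iteration would only enqueue jobs and collect results.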
If you have a problem which does not scale to multiple cores, you need to change your program or you have a problem which is not as parallel as you think. I suspect you have some other type of bug, but cannot say based on the information given.
This test code might help.
code
EDIT: Say you have a loop which serially does this.
You might assume that changing to a loop like this would be faster, but the problem is that the overhead could be greater than the gain.
So you need to create batches of work (at least one per thread) so there are enough tasks to keep all the threads busy, but not so many tasks that your threads are spending time in overhead.
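The code from this answer is missing on this page, so here is a minimal sketch of the batching idea under my own assumptions: instead of submitting one task per small calculation, the work is cut into one contiguous chunk per thread, so the per-task overhead is paid only a handful of times per round. The class name `ChunkingSketch` and the sum-of-squares workload are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkingSketch {
    // Sums the squares of `data` using at most nThreads tasks, one chunk per thread.
    static long sumOfSquares(int[] data, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            // Chunk size rounded up so that all elements are covered.
            int chunk = (data.length + nThreads - 1) / nThreads;
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, data.length);
                // One task processes a whole chunk, not a single element.
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = from; i < to; i++) s += (long) data[i] * data[i];
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sumOfSquares(data, 4));
    }
}
```

For brevity the pool is created inside the method; in a loop that runs thousands of times it would be created once and passed in.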
EDIT2: Running a test which copied data between threads.
It starts badly but warms up to ~50 us.
Hmm, CachedThreadPool seems to be created just for your case. It does not recreate threads if you reuse them soon enough, and if you spend a whole minute before you use a new thread, the overhead of thread creation is comparatively negligible. But you can't expect parallel execution to speed up your calculations unless you can also access data in parallel. If you employ extensive locking, many synchronized methods, etc., you'll spend more on overhead than you gain from parallel processing. Check that your data can be efficiently processed in parallel and that you don't have non-obvious synchronization lurking in the code.
Also, CPUs process data efficiently if the data fits fully into cache. If each thread's data set is bigger than half the cache, two threads will compete for cache and issue many RAM reads, while a single thread on a single core may perform better because it avoids RAM reads in the tight loop it executes. Check this, too.
Here's a pseudo outline of what I'm thinking
Another option, if you're calling external programs: don't put them in a loop that runs them one at a time, or they won't run in parallel. You can put them in a loop that PROCESSES them one at a time, but not one that execs them one at a time.