是否可以使用多线程而不需要一遍又一遍地创建线程？

发布于 2024-10-17 19:08:57 字数 3482 浏览 4 评论 0原文

首先，再次感谢所有已经回答我问题的人。我不是一个经验丰富的程序员，这是我第一次体验多线程。

我有一个与我的问题非常相似的例子。我希望这可以缓解我们的情况。

public class ThreadMeasuring {
private static final int TASK_TIME = 1; //microseconds
private static class Batch implements Runnable {
    CountDownLatch countDown;
    public Batch(CountDownLatch countDown) {
        this.countDown = countDown;
    }

    @Override
    public void run() {         
        long t0 =System.nanoTime();
        long t = 0;
        while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }

        if(countDown!=null) countDown.countDown();
    }
}

public static void main(String[] args) {
    ThreadFactory threadFactory = new ThreadFactory() {
        int counter = 1;
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "Executor thread " + (counter++));
            return t;
        }
    };

  // the total duty to be divided in tasks is fixed (problem dependent). 
  // Increase ntasks will mean decrease the task time proportionally. 
  // 4 Is an arbitrary example.
  // This tasks will be executed thousands of times, inside a loop alternating 
  // with serial processing that needs their result and prepare the next ones.
    int ntasks = 4; 
    int nthreads = 2;
    int ncores = Runtime.getRuntime().availableProcessors();
    if (nthreads<ncores) ncores = nthreads;     

    Batch serial = new Batch(null);
    long serialTime = System.nanoTime();
    serial.run();
    serialTime = System.nanoTime() - serialTime;

    ExecutorService executor = Executors.newFixedThreadPool( nthreads, threadFactory );
    CountDownLatch countDown = new CountDownLatch(ntasks);

    ArrayList<Batch> batches = new ArrayList<Batch>();
    for (int i = 0; i < ntasks; i++) {
        batches.add(new Batch(countDown));
    }

    long start = System.nanoTime();
    for (Batch r : batches){
        executor.execute(r);
    }

    // wait for all threads to finish their task
    try {
        countDown.await();
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    long tmeasured = (System.nanoTime() - start);

    System.out.println("Task time= " + TASK_TIME + " ms");
    System.out.println("Number of tasks= " + ntasks);
    System.out.println("Number of threads= " + nthreads);
    System.out.println("Number of cores= " + ncores);
    System.out.println("Measured time= " + tmeasured);
    System.out.println("Theoretical serial time= " + TASK_TIME*1000000*ntasks);
    System.out.println("Theoretical parallel time= " + (TASK_TIME*1000000*ntasks)/ncores);
    System.out.println("Speedup= " + (serialTime*ntasks)/(double)tmeasured);

    executor.shutdown();
}
 }

每个批次不进行计算，而是等待某个给定时间。该程序计算加速，理论上总是2，但如果“TASK_TIME”较小，则可能会小于1（实际上速度下降）。

我的计算最多需要 1 毫秒，而且通常更快。对于 1 毫秒，我发现速度提高了 30% 左右，但实际上，在我的程序中，我注意到速度有所下降。

这段代码的结构与我的程序非常相似，所以如果你能帮助我优化线程处理，我将非常感激。

亲切的问候。

下面是原来的问题：

嗨。

我想在我的程序上使用多线程，因为我相信它可以大大提高效率。它的运行时间大部分是由于独立计算造成的。

我的程序有数千个独立的计算（需要解决几个线性系统），但它们只是由几十个左右的小团体同时发生。每个组都需要几毫秒才能运行。在完成一组计算之后，程序必须按顺序运行一段时间，然后我必须再次求解线性系统。

实际上，可以将其视为要求解的这些独立线性系统位于迭代数千次的循环内，与取决于先前结果的顺序计算交替进行。我加速程序的想法是在并行线程中计算这些独立计算，通过将每个组划分为（我可用的处理器数量）独立计算批次。所以原则上根本不用排队。

我尝试使用FixedThreadPool和CachedThreadPool，它甚至比串行处理还要慢。每次我需要解决批次问题时，似乎都需要花费太多时间来创建新的踏板。

有没有更好的方法来处理这个问题？我使用的这些池似乎适合每个线程花费更多时间而不是数千个较小线程的情况...

谢谢！此致！

原文

First and once more, thanks to all that already answered my question. I am not a very experienced programmer and it is my first experience with multithreading.

I got an example that is working quite like my problem. I hope it could ease our case here.

public class ThreadMeasuring {
private static final int TASK_TIME = 1; //microseconds
private static class Batch implements Runnable {
    CountDownLatch countDown;
    public Batch(CountDownLatch countDown) {
        this.countDown = countDown;
    }

    @Override
    public void run() {         
        long t0 =System.nanoTime();
        long t = 0;
        while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }

        if(countDown!=null) countDown.countDown();
    }
}

public static void main(String[] args) {
    ThreadFactory threadFactory = new ThreadFactory() {
        int counter = 1;
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "Executor thread " + (counter++));
            return t;
        }
    };

  // the total duty to be divided in tasks is fixed (problem dependent). 
  // Increase ntasks will mean decrease the task time proportionally. 
  // 4 Is an arbitrary example.
  // This tasks will be executed thousands of times, inside a loop alternating 
  // with serial processing that needs their result and prepare the next ones.
    int ntasks = 4; 
    int nthreads = 2;
    int ncores = Runtime.getRuntime().availableProcessors();
    if (nthreads<ncores) ncores = nthreads;     

    Batch serial = new Batch(null);
    long serialTime = System.nanoTime();
    serial.run();
    serialTime = System.nanoTime() - serialTime;

    ExecutorService executor = Executors.newFixedThreadPool( nthreads, threadFactory );
    CountDownLatch countDown = new CountDownLatch(ntasks);

    ArrayList<Batch> batches = new ArrayList<Batch>();
    for (int i = 0; i < ntasks; i++) {
        batches.add(new Batch(countDown));
    }

    long start = System.nanoTime();
    for (Batch r : batches){
        executor.execute(r);
    }

    // wait for all threads to finish their task
    try {
        countDown.await();
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    long tmeasured = (System.nanoTime() - start);

    System.out.println("Task time= " + TASK_TIME + " ms");
    System.out.println("Number of tasks= " + ntasks);
    System.out.println("Number of threads= " + nthreads);
    System.out.println("Number of cores= " + ncores);
    System.out.println("Measured time= " + tmeasured);
    System.out.println("Theoretical serial time= " + TASK_TIME*1000000*ntasks);
    System.out.println("Theoretical parallel time= " + (TASK_TIME*1000000*ntasks)/ncores);
    System.out.println("Speedup= " + (serialTime*ntasks)/(double)tmeasured);

    executor.shutdown();
}
 }

Instead of doing the calculations, each batch just waits for some given time. The program calculates the speedup, that would allways be 2 in theory but can get less than 1 (actually a speed down) if the 'TASK_TIME' is small.

My calculations take at the top 1 ms and are commonly faster. For 1 ms I find a little speedup of around 30%, but in practice, with my program, I notice a speed down.

The structure of this code is very similar to my program, so if you could help me to optimise the thread handling I would be very grateful.

Kind regards.

Below, the original question:

Hi.

I would like to use multithreading on my program, since it could increase its efficiency considerably, I believe. Most of its running time is due to independent calculations.

My program has thousands of independent calculations (several linear systems to solve), but they just happen at the same time by minor groups of dozens or so. Each of this groups would take some miliseconds to run. After one of these groups of calculations, the program has to run sequentially for a little while and then I have to solve the linear systems again.

Actually, it can be seen as these independent linear systems to solve are inside a loop that iterates thousands of times, alternating with sequential calculations that depends on the previous results. My idea to speed up the program is to compute these independent calculations in parallel threads, by dividing each group into (the number of processors I have available) batches of independent calculation. So, in principle, there isn't queuing at all.

I tried using the FixedThreadPool and CachedThreadPool and it got even slower than serial processing. It seems to takes too much time creating new Treads each time I need to solve the batches.

Is there a better way to handle this problem? These pools I've used seem to be proper for cases when each thread takes more time instead of thousands of smaller threads...

Thanks!
Best Regards!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

淡笑忘祈一世凡恋 2024-10-24 19:08:57

线程池不会一遍又一遍地创建新线程。这就是为什么它们是游泳池。

您使用了多少个线程以及有多少个 CPU/核心？系统负载是什么样的（通常，当您串行执行它们时，以及当您使用池执行时）？是否涉及同步或任何类型的锁定？

并行执行的算法是否与串行执行的算法完全相同（您的描述似乎表明串行正在重用上一次迭代的一些结果）。

回复收藏 0 原文

一袭白衣梦中忆 2024-10-24 19:08:57

从我读到的内容来看：“数千个独立计算...同时发生...需要几毫秒才能运行”，在我看来，您的问题非常适合 GPU 编程。

我认为它回答了你的问题。 GPU 编程变得越来越流行。有针对 CUDA 和 Java 的绑定。 OpenCL。如果您可以使用它，我建议您使用它。

回复收藏 0 原文

尘世孤行 2024-10-24 19:08:57

我不确定您如何执行计算，但如果您将它们分成小组，那么您的应用程序可能适合生产者/消费者模式。

此外，您可能有兴趣使用阻塞队列。计算使用者将阻塞，直到队列中有内容并且阻塞发生在 take() 调用上。

private static class Batch implements Runnable {
    CountDownLatch countDown;
    public Batch(CountDownLatch countDown) {
        this.countDown = countDown;
    }

    CountDownLatch getLatch(){
        return countDown;
    }

    @Override
    public void run() {         
        long t0 =System.nanoTime();
        long t = 0;
        while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }

        if(countDown!=null) countDown.countDown();
    }
}

class CalcProducer implements Runnable {
    private final BlockingQueue queue;
    CalcProducer(BlockingQueue q) { queue = q; }
    public void run() {
        try {
            while(true) { 
                CountDownLatch latch = new CountDownLatch(ntasks);
                for(int i = 0; i < ntasks; i++) {
                    queue.put(produce(latch)); 
                }
                // don't need to wait for the latch, only consumers wait
            }
        } catch (InterruptedException ex) { ... handle ...}
    }

    CalcGroup produce(CountDownLatch latch) {
        return new Batch(latch);
    }
}

class CalcConsumer implements Runnable {
    private final BlockingQueue queue;

    CalcConsumer(BlockingQueue q) { queue = q; }

    public void run() {
        try {
            while(true) { consume(queue.take()); }
        } catch (InterruptedException ex) { ... handle ...}
    }

    void consume(Batch batch) { 
        batch.Run();
        batch.getLatch().await();
    }
}

class Setup {
    void main() {
        BlockingQueue<Batch> q = new LinkedBlockingQueue<Batch>();
        int numConsumers = 4;

        CalcProducer p = new CalcProducer(q);
        Thread producerThread = new Thread(p);
        producerThread.start();

        Thread[] consumerThreads = new Thread[numConsumers];

        for(int i = 0; i < numConsumers; i++)
        {
            consumerThreads[i] = new Thread(new CalcConsumer(q));
            consumerThreads[i].start();
        }
    }
}

抱歉，如果有任何语法错误，我一直在咀嚼 C# 代码，有时我忘记了正确的 java 语法，但总体思路是存在的。

I'm not sure how you perform the calculations, but if you're breaking them up into small groups, then your application might be ripe for the Producer/Consumer pattern.

Additionally, you might be interested in using a BlockingQueue. The calculation consumers will block until there is something in the queue and the block occurs on the take() call.

private static class Batch implements Runnable {
    CountDownLatch countDown;
    public Batch(CountDownLatch countDown) {
        this.countDown = countDown;
    }

    CountDownLatch getLatch(){
        return countDown;
    }

    @Override
    public void run() {         
        long t0 =System.nanoTime();
        long t = 0;
        while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }

        if(countDown!=null) countDown.countDown();
    }
}

class CalcProducer implements Runnable {
    private final BlockingQueue queue;
    CalcProducer(BlockingQueue q) { queue = q; }
    public void run() {
        try {
            while(true) { 
                CountDownLatch latch = new CountDownLatch(ntasks);
                for(int i = 0; i < ntasks; i++) {
                    queue.put(produce(latch)); 
                }
                // don't need to wait for the latch, only consumers wait
            }
        } catch (InterruptedException ex) { ... handle ...}
    }

    CalcGroup produce(CountDownLatch latch) {
        return new Batch(latch);
    }
}

class CalcConsumer implements Runnable {
    private final BlockingQueue queue;

    CalcConsumer(BlockingQueue q) { queue = q; }

    public void run() {
        try {
            while(true) { consume(queue.take()); }
        } catch (InterruptedException ex) { ... handle ...}
    }

    void consume(Batch batch) { 
        batch.Run();
        batch.getLatch().await();
    }
}

class Setup {
    void main() {
        BlockingQueue<Batch> q = new LinkedBlockingQueue<Batch>();
        int numConsumers = 4;

        CalcProducer p = new CalcProducer(q);
        Thread producerThread = new Thread(p);
        producerThread.start();

        Thread[] consumerThreads = new Thread[numConsumers];

        for(int i = 0; i < numConsumers; i++)
        {
            consumerThreads[i] = new Thread(new CalcConsumer(q));
            consumerThreads[i].start();
        }
    }
}

Sorry if there are any syntax errors, I've been chomping away at C# code and sometimes I forget the proper java syntax, but the general idea is there.

回复收藏 0 原文

客…行舟 2024-10-24 19:08:57

如果您遇到的问题无法扩展到多核，则需要更改程序，或者您遇到的问题并不像您想象的那么并行。我怀疑您有其他类型的错误，但根据所提供的信息无法确定。

此测试代码可能会有所帮助。

Time per million tasks 765 ms

代码

ExecutorService es = Executors.newFixedThreadPool(4);
Runnable task = new Runnable() {
    @Override
    public void run() {
        // do nothing.
    }
};
long start = System.nanoTime();
for(int i=0;i<1000*1000;i++) {
    es.submit(task);
}
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
long time = System.nanoTime() - start;
System.out.println("Time per million tasks "+time/1000/1000+" ms");

编辑：假设您有一个连续执行此操作的循环。

for(int i=0;i<1000*1000;i++)
    doWork(i);

您可能会认为更改为这样的循环会更快，但问题是开销可能大于收益。

for(int i=0;i<1000*1000;i++) {
    final int i2 = i;
    ex.execute(new Runnable() {
        public void run() {
            doWork(i2);
        }
    }
}

因此，您需要创建一批工作（每个线程至少一个），以便有足够的任务来保持所有线程忙碌，但任务又不能太多，以免线程花费时间在开销上。

final int batchSize = 10*1000;
for(int i=0;i<1000*1000;i+=batchSize) {
    final int i2 = i;
    ex.execute(new Runnable() {
        public void run() {
            for(int i3=i2;i3<i2+batchSize;i3++)
               doWork(i3);
        }
    }
}

EDIT2：运行在线程之间复制数据的测试。

for (int i = 0; i < 20; i++) {
    ExecutorService es = Executors.newFixedThreadPool(1);
    final double[] d = new double[4 * 1024];
    Arrays.fill(d, 1);
    final double[] d2 = new double[4 * 1024];
    es.submit(new Runnable() {
        @Override
        public void run() {
            // nothing.
        }
    }).get();
    long start = System.nanoTime();
    es.submit(new Runnable() {
        @Override
        public void run() {
            synchronized (d) {
                System.arraycopy(d, 0, d2, 0, d.length);
            }
        }
    });
    es.shutdown();
    es.awaitTermination(10, TimeUnit.SECONDS);
    // get a the values in d2.
    for (double x : d2) ;
    long time = System.nanoTime() - start;
    System.out.printf("Time to pass %,d doubles to another thread and back was %,d ns.%n", d.length, time);
}

开始时很糟糕，但升温到约 50 秒。

Time to pass 4,096 doubles to another thread and back was 1,098,045 ns.
Time to pass 4,096 doubles to another thread and back was 171,949 ns.
 ... deleted ...
Time to pass 4,096 doubles to another thread and back was 50,566 ns.
Time to pass 4,096 doubles to another thread and back was 49,937 ns.

If you have a problem which does not scale to multiple cores, you need to change your program or you have a problem which is not as parallel as you think. I suspect you have some other type of bug, but cannot say based on the information given.

This test code might help.

Time per million tasks 765 ms

code

ExecutorService es = Executors.newFixedThreadPool(4);
Runnable task = new Runnable() {
    @Override
    public void run() {
        // do nothing.
    }
};
long start = System.nanoTime();
for(int i=0;i<1000*1000;i++) {
    es.submit(task);
}
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
long time = System.nanoTime() - start;
System.out.println("Time per million tasks "+time/1000/1000+" ms");

EDIT: Say you have a loop which serially does this.

for(int i=0;i<1000*1000;i++)
    doWork(i);

You might assume that changing to loop like this would be faster, but the problem is that the overhead could be greater than the gain.

for(int i=0;i<1000*1000;i++) {
    final int i2 = i;
    ex.execute(new Runnable() {
        public void run() {
            doWork(i2);
        }
    }
}

So you need to create batches of work (at least one per thread) so there are enough tasks to keep all the threads busy, but not so many tasks that your threads are spending time in overhead.

final int batchSize = 10*1000;
for(int i=0;i<1000*1000;i+=batchSize) {
    final int i2 = i;
    ex.execute(new Runnable() {
        public void run() {
            for(int i3=i2;i3<i2+batchSize;i3++)
               doWork(i3);
        }
    }
}

EDIT2: RUnning atest which copied data between threads.

for (int i = 0; i < 20; i++) {
    ExecutorService es = Executors.newFixedThreadPool(1);
    final double[] d = new double[4 * 1024];
    Arrays.fill(d, 1);
    final double[] d2 = new double[4 * 1024];
    es.submit(new Runnable() {
        @Override
        public void run() {
            // nothing.
        }
    }).get();
    long start = System.nanoTime();
    es.submit(new Runnable() {
        @Override
        public void run() {
            synchronized (d) {
                System.arraycopy(d, 0, d2, 0, d.length);
            }
        }
    });
    es.shutdown();
    es.awaitTermination(10, TimeUnit.SECONDS);
    // get a the values in d2.
    for (double x : d2) ;
    long time = System.nanoTime() - start;
    System.out.printf("Time to pass %,d doubles to another thread and back was %,d ns.%n", d.length, time);
}

starts badly but warms up to ~50 us.

Time to pass 4,096 doubles to another thread and back was 1,098,045 ns.
Time to pass 4,096 doubles to another thread and back was 171,949 ns.
 ... deleted ...
Time to pass 4,096 doubles to another thread and back was 50,566 ns.
Time to pass 4,096 doubles to another thread and back was 49,937 ns.

回复收藏 0 原文

腹黑女流氓 2024-10-24 19:08:57

嗯， CachedThreadPool 似乎是专门为您的情况创建的。如果您足够快地重用线程，它不会重新创建线程，并且如果您在使用新线程之前花了整整一分钟，则线程创建的开销相对可以忽略不计。

但是，除非您也可以并行访问数据，否则您不能指望并行执行能够加快计算速度。如果您使用大量的锁定、许多同步方法等，您将花费更多的开销而不是并行处理的收益。检查您的数据是否可以有效地并行处理，并且代码中没有隐藏不明显的同步。

此外，如果数据完全适合缓存，CPU 就能有效地处理数据。如果每个线程的数据集大于缓存的一半，两个线程将竞争缓存并发出许多 RAM 读取，而一个线程如果仅使用一个核心，可能会执行得更好，因为它避免了在其执行的紧密循环中的 RAM 读取。也检查一下这个。

回复收藏 0 原文

第七度阳光i 2024-10-24 19:08:57

这是我的想法的伪概述

class WorkerThread extends Thread {

    Queue<Calculation> calcs;
    MainCalculator mainCalc;

    public void run() {
        while(true) {
            while(calcs.isEmpty()) sleep(500); // busy waiting? Context switching probably won't be so bad.
            Calculation calc = calcs.pop(); // is it pop to get and remove? you'll have to look
            CalculationResult result = calc.calc();
            mainCalc.returnResultFor(calc,result);      
        }
    }


}

如果您要调用外部程序，则另一种选择。不要将它们放入一次执行一个的循环中，否则它们将不会并行运行。您可以将它们放入循环中，一次处理一个，但不能一次执行一个。

Process calc1 = Runtime.getRuntime.exec("myCalc paramA1 paramA2 paramA3");
Process calc2 = Runtime.getRuntime.exec("myCalc paramB1 paramB2 paramB3");
Process calc3 = Runtime.getRuntime.exec("myCalc paramC1 paramC2 paramC3");
Process calc4 = Runtime.getRuntime.exec("myCalc paramD1 paramD2 paramD3");

calc1.waitFor();
calc2.waitFor();
calc3.waitFor();
calc4.waitFor();

InputStream is1 = calc1.getInputStream();
InputStreamReader isr1 = new InputStreamReader(is1);
BufferedReader br1 = new BufferedReader(isr1);
String resultStr1 = br1.nextLine();

InputStream is2 = calc2.getInputStream();
InputStreamReader isr2 = new InputStreamReader(is2);
BufferedReader br2 = new BufferedReader(isr2);
String resultStr2 = br2.nextLine();

InputStream is3 = calc3.getInputStream();
InputStreamReader isr3 = new InputStreamReader(is3);
BufferedReader br3 = new BufferedReader(isr3);
String resultStr3 = br3.nextLine();

InputStream is4 = calc4.getInputStream();
InputStreamReader isr4 = new InputStreamReader(is4);
BufferedReader br4 = new BufferedReader(isr4);
String resultStr4 = br4.nextLine();

Here's a psuedo outline of what I'm thinking

class WorkerThread extends Thread {

    Queue<Calculation> calcs;
    MainCalculator mainCalc;

    public void run() {
        while(true) {
            while(calcs.isEmpty()) sleep(500); // busy waiting? Context switching probably won't be so bad.
            Calculation calc = calcs.pop(); // is it pop to get and remove? you'll have to look
            CalculationResult result = calc.calc();
            mainCalc.returnResultFor(calc,result);      
        }
    }


}

Another option, if you're calling external programs. Don't put them in a loop that does them one at a time or they won't run in parallel. You can put them in a loop that PROCESSES them one at a time, but not that execs them one at a time.

Process calc1 = Runtime.getRuntime.exec("myCalc paramA1 paramA2 paramA3");
Process calc2 = Runtime.getRuntime.exec("myCalc paramB1 paramB2 paramB3");
Process calc3 = Runtime.getRuntime.exec("myCalc paramC1 paramC2 paramC3");
Process calc4 = Runtime.getRuntime.exec("myCalc paramD1 paramD2 paramD3");

calc1.waitFor();
calc2.waitFor();
calc3.waitFor();
calc4.waitFor();

InputStream is1 = calc1.getInputStream();
InputStreamReader isr1 = new InputStreamReader(is1);
BufferedReader br1 = new BufferedReader(isr1);
String resultStr1 = br1.nextLine();

InputStream is2 = calc2.getInputStream();
InputStreamReader isr2 = new InputStreamReader(is2);
BufferedReader br2 = new BufferedReader(isr2);
String resultStr2 = br2.nextLine();

InputStream is3 = calc3.getInputStream();
InputStreamReader isr3 = new InputStreamReader(is3);
BufferedReader br3 = new BufferedReader(isr3);
String resultStr3 = br3.nextLine();

InputStream is4 = calc4.getInputStream();
InputStreamReader isr4 = new InputStreamReader(is4);
BufferedReader br4 = new BufferedReader(isr4);
String resultStr4 = br4.nextLine();

回复收藏 0 原文

~没有更多了~