Multithreaded C++ application: threads block each other when (de)allocating memory

Posted on 2025-01-14 23:14:56


World,

I am trying to run a C++ application (compiled in VS as an .exe) with multiple threads, using QThread or OpenMP parallelization. Each thread performs multiple allocations/deallocations of memory for large matrix computations before solving the equation systems built from these matrices with umfpack. Now, when I use too many threads, I lose performance because the threads block each other while doing this. I have read that memory (de)allocation is possible for only one thread at a time (like a mutex condition).

What I have tried already:

  • decrease large reallocations as best I could
  • use different parallelization methods (Qt vs. OpenMP)
  • randomly change the reserved and committed stack/heap size
  • make the umfpack arrays threadprivate

In my setup, I am able to use ~4 threads (each thread uses ~1.5 GB RAM) before performance decreases. Interestingly - but something I couldn't wrap my head around yet - the performance is reduced only after a couple of threads have finished and new ones are taking over. Note also that the threads do not depend on each other, there are no other blocking conditions, and each thread runs for roughly the same amount of time (~2 min).

Is there an "easy way" - e.g. setting up heap/stack in a certain way - to solve this issue?

Here are some code snippets:

// Loop to start threads

forever
{
    if (sem.tryAcquire(1)) {
        QThread *t = new QThread();
        connect(t, SIGNAL(started()), aktBer, SLOT(doWork()));
        connect(aktBer, SIGNAL(workFinished()), t, SLOT(quit()));
        connect(t, SIGNAL(finished()), t, SLOT(deleteLater()));
        aktBer->moveToThread(t);
        t->start();
        sleep(1);
    }
    else {
        //... wait for threads to end before starting new ones
        //... eventually break
    }
    qApp->processEvents();
}

void doWork() {
    // Do initial matrix stuff...

    // Initializing array pointers for the umfpack library
    // (Ax, x, b are double* as required by umfpack_di_solve)
    static int    *Ap = 0;
    static int    *Ai = 0;
    static double *Ax = 0;
    static double *x  = 0;
    static double *b  = 0;

    // Private static variables per thread
    #pragma omp threadprivate(Ap, Ai, Acol, Arow)

    // Solving -> this is the part where the threads block each other.
    // Note that there are other functions with matrix operations,
    // which also (de-)allocate a lot.
    status = umfpack_di_solve (UMFPACK_A, Ap,Ai,Ax,x,b, /*...*/);

    emit workFinished();
}


Comments (1)

十六岁半 2025-01-21 23:14:56


For those who are interested in my solution: I included another allocator in my app (as @Ben Voigt suggested). In my case, I chose mimalloc, as it seems to get regular maintenance (even by Microsoft itself) and can be included pretty easily.
See here: mimalloc
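As a sketch of how dropping in such an allocator can look (assuming mimalloc's documented CMake package; `myapp` is a placeholder target name):

```cmake
# Hypothetical CMakeLists.txt fragment: link the app against mimalloc
# so its malloc/free (and operator new/delete) replace the default
# allocator's versions.
find_package(mimalloc REQUIRED)
target_link_libraries(myapp PRIVATE mimalloc)
```

On Windows, mimalloc's docs describe an additional override mechanism (the redirection DLL) for replacing the CRT allocator in an existing .exe.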
