Signal handling in OpenMP parallel programs

Posted 2024-12-15 12:34:27

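I have a program that uses a POSIX timer (timer_create()). Essentially, the program sets a timer and starts performing some lengthy (potentially infinite) computation. When the timer expires and the signal handler is called, the handler prints the best result computed so far and quits the program.

I am considering doing the computation in parallel with OpenMP, because it should speed it up.

In pthreads there are special functions for, for example, setting signal masks for my threads. Does OpenMP provide such control, or do I have to accept the fact that the signal can be delivered to any of the threads OpenMP creates?

Also, if I am currently in a parallel section of my code and my handler is called, can it still safely kill the application (exit(0);) and do things like locking OpenMP locks?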

诗笺 2024-12-22 12:34:27

This is a bit late, but hopefully this example code will help others in a similar position!

As osgx mentioned, OpenMP is silent on the issue of signals, but as OpenMP is often implemented with pthreads on POSIX systems we can use a pthread signal approach.

For heavy computations using OpenMP, it is likely that there are only a few locations where the computation can actually be halted safely. Therefore, when you want to obtain early (partial) results, synchronous signal handling is a safe way to do it. An additional advantage is that it lets us accept the signal on a specific OpenMP thread (in the example code below, we choose the master thread). On catching the signal, we simply set a flag indicating that computation should stop. Each thread should then periodically check this flag when convenient, and wrap up its share of the workload once it is set.

With this synchronous approach the computation exits gracefully, and the algorithm needs only minimal changes. The asynchronous signal handler approach the question asks for, on the other hand, may not be appropriate, as it would likely be difficult to collate the current working state of each thread into a coherent result. One disadvantage of the synchronous approach is that the computation can take a noticeable amount of time to come to a stop.

The signal checking apparatus consists of three parts:

1. Blocking the relevant signals. This should be done outside of the omp parallel region so that each OpenMP thread (pthread) inherits the same blocking behaviour.
2. Polling for the desired signals from the master thread. One can use sigtimedwait for this, but some systems (e.g. macOS) don't support it. More portably, we can use sigpending to poll for any blocked signals and then double check that the pending signals are the ones we're expecting before accepting them synchronously using sigwait (which should return immediately here, unless some other part of the program is creating a race condition). Finally, we set the relevant flag.
3. Removing our signal mask at the end (optionally with one final check for signals).
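For systems that do provide sigtimedwait, the polling step can be collapsed into a single call with a zero timeout. The following is only a minimal sketch under that assumption; the helper name poll_signal is hypothetical and not part of the example further down.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Non-blocking check for a pending signal from `set`.
 * The signals in `set` must already be blocked in the calling thread.
 * Returns the signal number if one was pending (and consumes it),
 * 0 if none was pending, or -1 on any other error.
 * poll_signal is a hypothetical helper name. */
int poll_signal(const sigset_t *set)
{
    const struct timespec no_wait = {0, 0};   /* zero timeout: do not block */
    int sig;

    do {
        sig = sigtimedwait(set, NULL, &no_wait);
    } while (sig == -1 && errno == EINTR);    /* retry if interrupted */

    if (sig > 0)
        return sig;
    return (errno == EAGAIN) ? 0 : -1;        /* EAGAIN: nothing pending */
}
```

The master thread would call poll_signal(&newmask) at each check and set the stop flag when the result is positive.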

There are some important performance considerations and caveats:

- Assuming that each inner loop iteration is small, executing the signal checking syscalls is expensive. In the example code, we check for signals only every 10 million (per-thread) iterations, corresponding to perhaps a couple of seconds of wall time.
- omp for loops cannot be broken out of¹, and so you must either spin for the remainder of the iterations or rewrite the loop using more basic OpenMP primitives. Regular loops (such as inner loops of an outer parallel loop) can be broken out of just fine.
- If only the master thread can check for signals, then this may create an issue in programs where the master thread finishes well before the other threads. In this scenario, these other threads will be uninterruptible. To address this, you could 'pass the baton' of signal checking as each thread completes its workload, or the master thread could be forced to keep running and polling until all other threads complete².
- On some architectures such as NUMA HPCs, checking the 'global' signalled flag may be quite expensive, so take care when deciding when and where to check or manipulate the flag. For the spin loop section, for example, one may wish to locally cache the flag when it becomes true.
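The last point, caching the flag locally, could look roughly like the sketch below. It is an illustration only: run_steps, stop_flag and check_every are hypothetical names, the "work" is a counter increment, and check_every is assumed to be at least 1. Each thread reads the shared flag with an atomic read only every check_every iterations and afterwards consults its private copy.

```c
#include <stddef.h>

/* Counts how many loop iterations did real work before a stop was seen.
 * stop_flag points to a shared flag that some other code sets to nonzero;
 * check_every (assumed >= 1) is how often each thread re-reads it. */
size_t run_steps(const int *stop_flag, size_t n_iters, size_t check_every)
{
    size_t done = 0;

    #pragma omp parallel reduction(+:done)
    {
        int local_stop = 0;       /* thread-private cached copy of the flag */
        size_t since_check = 0;   /* iterations since the last shared read */

        #pragma omp for
        for (size_t i = 0; i < n_iters; i++) {
            if (local_stop)
                continue;         /* spin through the remaining iterations */

            if (since_check == 0) {
                int seen;
                #pragma omp atomic read
                seen = *stop_flag;
                if (seen) {
                    local_stop = 1;
                    continue;
                }
            }
            since_check = (since_check + 1) % check_every;

            done++;               /* stand-in for one unit of real work */
        }
    }
    return done;
}
```

Once a thread has seen the flag set, the spin through the rest of its iterations touches only thread-private data.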

Here is the example code:

#include <omp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

void calculate() {
    _Bool signalled = false;
    int sigcaught;
    size_t steps_tot = 0;

    // block signals of interest (SIGINT and SIGTERM here)
    sigset_t oldmask, newmask, sigpend;
    sigemptyset(&newmask);
    sigaddset(&newmask, SIGINT);
    sigaddset(&newmask, SIGTERM);
    sigprocmask(SIG_BLOCK, &newmask, &oldmask);

    #pragma omp parallel
    {
        int rank = omp_get_thread_num();
        size_t steps = 0;

        // keep improving result forever, unless signalled
        while (!signalled) {
            #pragma omp for
            for (size_t i = 0; i < 10000; i++) {
                // we can't break from an omp for loop...
                // instead, spin away the rest of the iterations
                if (signalled) continue;

                for (size_t j = 0; j < 1000000; j++, steps++) {
                    // ***
                    // heavy computation...
                    // ***

                    // check for signal every 10 million steps
                    if (steps % 10000000 == 0) {

                        // master thread; poll for signal
                        if (rank == 0) {
                            sigpending(&sigpend);
                            if (sigismember(&sigpend, SIGINT) || sigismember(&sigpend, SIGTERM)) {
                                if (sigwait(&newmask, &sigcaught) == 0) {
                                    printf("Interrupted by %d...\n", sigcaught);
                                    signalled = true;
                                }
                            }
                        }

                        // all threads; stop computing
                        if (signalled) break;
                    }
                }
            }
        }

        #pragma omp atomic
        steps_tot += steps;
    }

    printf("The result is ... after %zu steps\n", steps_tot);

    // optional cleanup
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
}

If using C++, you may find the following class useful...

#include <signal.h>
#include <vector>

class Unterminable {
    sigset_t oldmask, newmask;
    std::vector<int> signals;

public:
    Unterminable(std::vector<int> signals) : signals(signals) {
        sigemptyset(&newmask);
        for (int signal : signals)
            sigaddset(&newmask, signal);
        sigprocmask(SIG_BLOCK, &newmask, &oldmask);
    }

    Unterminable() : Unterminable({SIGINT, SIGTERM}) {}

    // this can be made more efficient by using sigandset,
    // but sigandset is not particularly portable
    int poll() {
        sigset_t sigpend;
        sigpending(&sigpend);
        for (int signal : signals) {
            if (sigismember(&sigpend, signal)) {
                int sigret;
                if (sigwait(&newmask, &sigret) == 0)
                    return sigret;
                break;
            }
        }
        return -1;
    }

    ~Unterminable() {
        sigprocmask(SIG_SETMASK, &oldmask, NULL);
    }
};

The blocking part of calculate() can then be replaced by Unterminable unterm; (without parentheses, since Unterminable unterm(); would declare a function rather than an object), and the signal checking part by if ((sigcaught = unterm.poll()) > 0) {...}. Unblocking the signals is automatically performed when unterm goes out of scope.

¹ This is not strictly true. OpenMP provides limited support for performing a 'parallel break' in the form of cancellation points. If you choose to use cancellation points in your parallel loops, make sure you know exactly where the implicit cancellation points are, so that your computation data is coherent upon cancellation.

² Personally, I keep a count of how many threads have completed the for loop and, if the master thread completes the loop without catching a signal, it keeps polling for signals until either it catches a signal or all threads complete the loop. To do this, make sure to mark the for loop nowait.
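As a rough illustration of the cancellation support mentioned in footnote ¹, the sketch below uses the cancel and cancellation point directives (OpenMP 4.0 and later) to leave a parallel loop early. The function contains is a made-up example, not taken from the answer. Cancellation only takes effect when the OMP_CANCELLATION environment variable is set to true; otherwise the directives are ignored and the loop simply runs to completion, which gives the same result here.

```c
#include <stddef.h>

/* Returns 1 if `target` occurs in a[0..n), 0 otherwise.
 * Hypothetical example of leaving an omp for loop early. */
int contains(const int *a, size_t n, int target)
{
    int found = 0;

    #pragma omp parallel
    {
        #pragma omp for
        for (size_t i = 0; i < n; i++) {
            if (a[i] == target) {
                #pragma omp atomic write
                found = 1;
                /* request that all threads abandon the loop */
                #pragma omp cancel for
            }
            /* other threads notice a pending cancellation here */
            #pragma omp cancellation point for
        }
    }
    return found;
}
```

The only shared state written before cancellation is the found flag, so nothing is left half-updated when threads leave the loop.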


如果没有 2024-12-22 12:34:27

The OpenMP 3.1 standard says nothing about signals.

As far as I know, every popular OpenMP implementation on Linux/UNIX is based on pthreads, so an OpenMP thread is a pthread, and the generic rules for pthreads and signals apply.

Does OpenMP provide such control

There is no OpenMP-specific control, but you can try using the pthread controls. The only problem is knowing how many OpenMP threads are in use and where to place the controlling statements.
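One possible placement, sketched under the assumption that the OpenMP runtime creates its worker threads after the mask is set (so that they inherit it): block the signal before the parallel region and unblock it only in the master thread. The function names below are hypothetical, and a runtime that already has a thread pool from an earlier parallel region would not pick up the new mask.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Returns 1 if signo is blocked in the calling thread, 0 if not, -1 on error. */
int is_blocked_here(int signo)
{
    sigset_t cur;
    if (pthread_sigmask(SIG_BLOCK, NULL, &cur) != 0)   /* query only */
        return -1;
    return sigismember(&cur, signo);
}

/* Runs a parallel region in which only the master thread accepts signo.
 * Returns what the master saw inside the region (0 = unblocked there),
 * or -1 on error. Hypothetical sketch, not an OpenMP facility. */
int compute_with_signal_on_master(int signo)
{
    sigset_t set, old;
    int master_sees_blocked = -1;

    sigemptyset(&set);
    sigaddset(&set, signo);
    /* Threads created after this point inherit the blocked signal. */
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    #pragma omp parallel
    {
        #pragma omp master
        {
            /* Only the master thread lets the signal through. */
            pthread_sigmask(SIG_UNBLOCK, &set, NULL);
            master_sees_blocked = is_blocked_here(signo);
        }
        /* ... each thread's share of the computation goes here ... */
    }

    /* Restore the caller's original mask. */
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return master_sees_blocked;
}
```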

the signal can be delivered to any of the threads OpenMP creates?

By default, yes, it will be delivered to any thread.

my handler is called,

The usual rules about signal handlers still apply. The functions allowed in a signal handler are listed at http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html (at the end of the page).

printf is not allowed (write is). You can use printf only if you know that no thread is inside printf at the moment the signal arrives (e.g. you have no printf in the parallel region).

can it still safely kill the application (exit(0);)

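Yes, provided the handler calls _exit() (or abort()) rather than exit(): both are async-signal-safe, whereas exit() runs atexit() handlers and flushes stdio buffers, which is not safe inside a handler.

Linux/Unix terminates all threads of the process when any thread calls exit(), _exit() or abort().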
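A handler that sticks to these rules might look like the sketch below. It is an assumption-laden illustration: best_result, format_ulong, on_timer and install_on_timer are hypothetical names, and the result is assumed to be a small non-negative integer that fits in sig_atomic_t. The number is formatted by hand because printf is not async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Best result so far; hypothetical global updated by the computation. */
static volatile sig_atomic_t best_result = 0;

/* Writes v in decimal into buf (at least 20 bytes) without using stdio.
 * Returns the number of characters written; no terminating NUL is added. */
size_t format_ulong(unsigned long v, char *buf)
{
    char tmp[20];
    size_t n = 0;

    do {                              /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++)    /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    return n;
}

/* Handler: calls only write() and _exit(), both async-signal-safe. */
void on_timer(int signo)
{
    char line[32] = "best: ";
    size_t len = 6;
    ssize_t written;

    (void)signo;
    len += format_ulong((unsigned long)best_result, line + len);
    line[len++] = '\n';
    written = write(STDOUT_FILENO, line, len);
    (void)written;
    _exit(0);                         /* not exit(): no atexit handlers, no stdio flush */
}

/* Installs on_timer for signo; returns 0 on success, -1 on failure. */
int install_on_timer(int signo)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}
```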

and do things like locking OpenMP locks?

You should not, but if you know that the lock will not be held at the time the signal handler runs, you can try.

UPDATE

There is an example of adapting signal handling to OpenMP in http://www.cs.colostate.edu/~cs675/OpenMPvsThreads.pdf ("OpenMP versus Threading in C/C++"). In short: set a flag in the handler and have every thread check this flag every Nth loop iteration.

Adapting a signal based exception mechanism to a parallel region

Something that occurs more with C/C++ applications than with Fortran applications is that the program uses a sophisticated user interface. Genehunter is a simple example where the user may interrupt the computation of one family tree by pressing control-C so that it can go on to the next family tree in a clinical database about the disease. The premature termination is handled in the serial version by a C++ like exception mechanism involving a signal handler, setjump, and longjump. OpenMP does not permit unstructured control flow to cross a parallel construct boundary. We modified the exception handling in the OpenMP version by changing the interrupt handler into a polling mechanism. The thread that catches the control-C signal sets a shared flag. All threads check the flag at the beginning of the loop by calling the routine has_hit_interrupt( ) and skip the iteration if it is set. When the loop ends, the master checks the flag and can easily execute the longjump to complete the exceptional exit (See Figure 1.)
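The polling mechanism described in the excerpt can be sketched as follows. Only the name has_hit_interrupt comes from the paper; its body, the handler, install_interrupt_flag and process_items are guesses made for illustration. The flag uses volatile sig_atomic_t, the type C guarantees for communication with a handler; that other threads see the update promptly is an assumption that holds on typical implementations rather than something the OpenMP specification promises.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Shared flag set by the handler (hypothetical reconstruction). */
static volatile sig_atomic_t interrupted = 0;

/* Handler does nothing except set the flag. */
static void on_interrupt(int signo)
{
    (void)signo;
    interrupted = 1;
}

/* Name taken from the excerpt; this body is an assumption. */
int has_hit_interrupt(void)
{
    return interrupted != 0;
}

/* Installs the flag-setting handler for signo; returns 0 on success. */
int install_interrupt_flag(int signo)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Processes n_items in parallel, skipping iterations once interrupted.
 * Returns how many items were actually processed. */
size_t process_items(size_t n_items)
{
    size_t processed = 0;

    #pragma omp parallel for reduction(+:processed)
    for (size_t i = 0; i < n_items; i++) {
        if (has_hit_interrupt())
            continue;          /* skip the iteration, as in the excerpt */
        processed++;           /* stand-in for the real per-item work */
    }
    return processed;
}
```

After the loop, the calling thread can test has_hit_interrupt() once more and take whatever exceptional exit the serial version used.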