Signalling all threads in a process

Posted 2024-10-02 18:30:29

Without keeping a list of current threads, I'm trying to make sure that a realtime signal gets delivered to every thread in my process. My idea is to go about it like this:

1. Initially the signal handler is installed and the signal is unblocked in all threads.
2. When one thread wants to send the 'broadcast' signal, it acquires a mutex and sets a global flag indicating that a broadcast is taking place.
3. The sender blocks the signal for itself (using pthread_sigmask) and enters a loop, repeatedly calling raise(sig) until sigpending indicates that the signal is pending (that is, no thread remains with the signal unblocked).
4. As threads receive the signal, they act on it but wait in the signal handler for the broadcast flag to be cleared, so that the signal remains masked.
5. The sender finishes the loop by unblocking the signal (in order to get its own delivery).
6. When the sender handles its own signal, it clears the global flag so that all the other threads can continue with their business.
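For reference, the sender's side of this scheme (block the signal, raise it, check sigpending, then unblock) can be sketched in C. This is a minimal single-threaded sketch, not the test program from the post; the names `on_signal`, `handler_runs`, and `pending_until_unblocked` are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Counts handler invocations; sig_atomic_t may be written from a handler. */
static volatile sig_atomic_t handler_runs = 0;

static void on_signal(int sig) {
    (void)sig;
    handler_runs++;
}

/*
 * Returns 1 if `sig` stayed pending (handler not run) while it was blocked
 * and was then delivered on unblocking; 0 otherwise; -1 on setup failure.
 */
int pending_until_unblocked(int sig) {
    struct sigaction sa;
    sigset_t set, pending;
    int was_pending;

    sa.sa_handler = on_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0) return -1;

    sigemptyset(&set);
    sigaddset(&set, sig);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0) return -1;

    handler_runs = 0;
    raise(sig);                 /* sent to the calling thread */
    sigpending(&pending);
    was_pending = sigismember(&pending, sig) == 1 && handler_runs == 0;

    /* A pending signal is delivered before this call returns. */
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);
    return was_pending && handler_runs == 1;
}
```

Calling `pending_until_unblocked(SIGRTMIN)` from a program's main thread should return 1 on a POSIX system.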

The problem I'm running into is that pthread_sigmask is not being respected. Everything works right if I run the test program under strace (presumably due to different scheduling timing), but as soon as I run it alone, the sender receives its own signal (despite having blocked it..?) and none of the other threads ever get scheduled.

Any ideas what might be wrong? I've tried using sigqueue instead of raise, probing the signal mask, adding sleep all over the place to make sure the threads are patiently waiting for their signals, etc. and now I'm at a loss.
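One thing that may be worth checking, offered as a possibility rather than a diagnosis: POSIX specifies that in a multithreaded program raise(sig) behaves like pthread_kill(pthread_self(), sig), so the signal is directed at the calling thread, not at the process. With the signal blocked in the sender it would become pending immediately, and no other thread would ever see it. The sketch below contrasts raise with kill(getpid(), sig); `compare_raise_and_kill` and the other names are hypothetical helpers, and the 50 ms wait is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

static atomic_int handled = 0;  /* handler invocations so far */
static atomic_int stop = 0;     /* tells the idle worker to exit */

static void on_signal(int sig) {
    (void)sig;
    atomic_fetch_add(&handled, 1);
}

static void *idle_worker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop)) sched_yield();
    return NULL;
}

/*
 * The caller blocks `sig`; one idle worker keeps it unblocked.
 *   *after_raise: handler runs seen after raise(sig) and a short wait
 *   *after_kill:  handler runs seen after kill(getpid(), sig)
 * Returns 0 on success, -1 on setup failure.
 */
int compare_raise_and_kill(int sig, int *after_raise, int *after_kill) {
    struct sigaction sa;
    sigset_t set;
    pthread_t worker;
    struct timespec wait_time = {0, 50 * 1000 * 1000};  /* 50 ms */

    sa.sa_handler = on_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0) return -1;

    /* The worker inherits the creator's mask, so unblock before creating. */
    sigemptyset(&set);
    sigaddset(&set, sig);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);
    if (pthread_create(&worker, NULL, idle_worker, NULL) != 0) return -1;
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    raise(sig);                  /* thread-directed: pends on the caller */
    nanosleep(&wait_time, NULL);
    *after_raise = atomic_load(&handled);

    kill(getpid(), sig);         /* process-directed: the worker can take it */
    while (atomic_load(&handled) == *after_raise) sched_yield();
    *after_kill = atomic_load(&handled);

    atomic_store(&stop, 1);
    pthread_join(worker, NULL);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);  /* drains the raise()d signal */
    return 0;
}
```

If the description above applies, the worker never runs the handler after raise, but does after kill.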

Edit: Thanks to psmears' answer, I think I understand the problem. Here's a potential solution. Feedback would be great:

1. At any given time, I can know the number of threads running, and I can prevent all thread creation and exiting during the broadcast signal if I need to.
2. The thread that wants to do the broadcast signal acquires a lock (so no other thread can do it at the same time), then blocks the signal for itself, and sends num_threads signals to the process, then unblocks the signal for itself.
3. The signal handler atomically increments a counter, and each instance of the signal handler waits until that counter is equal to num_threads to return.
4. The thread that did the broadcast also waits for the counter to reach num_threads, then it releases the lock.
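The counter-based scheme can be sketched roughly as follows. This is only an illustration under several assumptions: a fixed set of idle worker threads, a single broadcast, and a realtime signal such as SIGRTMIN so that each queued instance is kept. All names (`broadcast_once`, `arrived`, `MAX_WORKERS`, and so on) are invented here, and a sigqueue failure simply aborts the broadcast instead of retrying.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <unistd.h>

static pthread_mutex_t broadcast_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int num_threads = 0;  /* participants, sender included */
static atomic_int arrived = 0;      /* handlers that have started */
static atomic_int stop = 0;

/* Each handler checks in, then spins until every thread has checked in. */
static void on_broadcast(int sig) {
    (void)sig;
    atomic_fetch_add(&arrived, 1);
    while (atomic_load(&arrived) < atomic_load(&num_threads)) { /* spin */ }
}

static void *worker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop)) sched_yield();
    return NULL;
}

/* Starts `nworkers` idle threads, broadcasts once, returns handler count. */
int broadcast_once(int sig, int nworkers) {
    enum { MAX_WORKERS = 16 };
    pthread_t tids[MAX_WORKERS];
    struct sigaction sa;
    sigset_t set;
    union sigval none;
    int total = nworkers + 1, started = 0, ok = 1;

    if (nworkers < 0 || nworkers > MAX_WORKERS) return -1;
    none.sival_int = 0;
    sa.sa_handler = on_broadcast;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0) return -1;
    sigemptyset(&set);
    sigaddset(&set, sig);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);  /* workers inherit this mask */

    atomic_store(&arrived, 0);
    atomic_store(&stop, 0);
    atomic_store(&num_threads, total);
    while (started < nworkers &&
           pthread_create(&tids[started], NULL, worker, NULL) == 0)
        started++;
    ok = (started == nworkers);

    if (ok) {
        pthread_mutex_lock(&broadcast_lock);     /* one broadcast at a time */
        pthread_sigmask(SIG_BLOCK, &set, NULL);  /* sender takes its copy last */
        for (int i = 0; ok && i < total; i++)
            if (sigqueue(getpid(), sig, none) != 0) ok = 0;
        if (!ok) atomic_store(&num_threads, 0);  /* release waiting handlers */
        pthread_sigmask(SIG_UNBLOCK, &set, NULL);  /* sender's own delivery */
        while (ok && atomic_load(&arrived) < total) sched_yield();
        pthread_mutex_unlock(&broadcast_lock);
    }

    atomic_store(&stop, 1);
    for (int i = 0; i < started; i++) pthread_join(tids[i], NULL);
    return ok ? atomic_load(&arrived) : -1;
}
```

Because the handler runs with the signal masked and does not return until everyone has checked in, each thread can consume at most one of the queued instances, which is what makes the count come out to num_threads.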

One possible concern is that the signals will not get queued if the kernel is out of memory (Linux seems to have that issue). Do you know if sigqueue reliably informs the caller when it's unable to queue the signal (in which case I would loop until it succeeds), or could signals possibly be silently lost?

Edit 2: It seems to be working now. According to the documentation for sigqueue, it fails with EAGAIN if it cannot queue the signal. For robustness, though, I decided to just keep calling sigqueue until num_threads-1 signal handlers are running, interleaving calls to sched_yield once I've sent num_threads-1 signals.
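A retry loop around sigqueue might look like the sketch below. It assumes that EAGAIN is the only error worth retrying and that yielding gives pending signals a chance to drain; `sigqueue_retry` and `queue_and_count` are hypothetical names, and the second function exists only to show that queued realtime signals are each delivered.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Queues `sig` to process `pid`, retrying while sigqueue() fails with EAGAIN.
 * Returns 0 once queued, or -1 with errno set for any other failure.
 */
int sigqueue_retry(pid_t pid, int sig, int payload) {
    union sigval value;
    value.sival_int = payload;
    while (sigqueue(pid, sig, value) != 0) {
        if (errno != EAGAIN) return -1;
        sched_yield();  /* let other threads run before trying again */
    }
    return 0;
}

static volatile sig_atomic_t delivered = 0;

static void count_delivery(int sig) {
    (void)sig;
    delivered++;
}

/* Queues `n` copies of `sig` while it is blocked; returns how many arrive. */
int queue_and_count(int sig, int n) {
    struct sigaction sa;
    sigset_t set;

    sa.sa_handler = count_delivery;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0) return -1;

    sigemptyset(&set);
    sigaddset(&set, sig);
    pthread_sigmask(SIG_BLOCK, &set, NULL);
    delivered = 0;
    for (int i = 0; i < n; i++)
        if (sigqueue_retry(getpid(), sig, i) != 0) return -1;
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);

    /* Wait until every queued instance has run the handler. */
    while (delivered < n) sched_yield();
    return delivered;
}
```

For a realtime signal such as SIGRTMIN, `queue_and_count(SIGRTMIN, 3)` should report three deliveries, since each queued instance is kept rather than merged.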

There was a race condition in counting new threads at thread-creation time, but I solved it with a strange (ab)use of read-write locks. Thread creation is "reading" and the broadcast signal is "writing", so unless there's a thread trying to broadcast, it doesn't create any contention at thread creation.
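The read-write lock trick might look roughly like this. It is a sketch of the idea as described, not the actual code: `spawn_counted`, `leave_counted`, `broadcast_begin`, and `broadcast_end` are invented names, and it only covers the counting. In particular, it does not show a departing thread blocking the signal before it stops being counted.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static pthread_rwlock_t registry = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int num_threads = 1;  /* the initial thread */
static atomic_int release = 0;      /* demo only: lets the workers finish */

/* Thread creation is a "reader": many creators may run concurrently. */
int spawn_counted(pthread_t *tid, void *(*fn)(void *), void *arg) {
    int rc;
    pthread_rwlock_rdlock(&registry);
    rc = pthread_create(tid, NULL, fn, arg);
    if (rc == 0) atomic_fetch_add(&num_threads, 1);
    pthread_rwlock_unlock(&registry);
    return rc;
}

/* A thread calls this just before exiting; also a "reader". */
void leave_counted(void) {
    pthread_rwlock_rdlock(&registry);
    atomic_fetch_sub(&num_threads, 1);
    pthread_rwlock_unlock(&registry);
}

/* The broadcaster is the "writer": the count cannot change while held. */
int broadcast_begin(void) {
    pthread_rwlock_wrlock(&registry);
    return atomic_load(&num_threads);
}

void broadcast_end(void) {
    pthread_rwlock_unlock(&registry);
}

static void *demo_worker(void *arg) {
    (void)arg;
    while (!atomic_load(&release)) sched_yield();
    leave_counted();
    return NULL;
}

/* Reports the count seen while two workers are alive and after they exit. */
int demo_counts(int *during, int *after) {
    pthread_t a, b;
    atomic_store(&release, 0);
    if (spawn_counted(&a, demo_worker, NULL) != 0) return -1;
    if (spawn_counted(&b, demo_worker, NULL) != 0) {
        atomic_store(&release, 1);
        pthread_join(a, NULL);
        return -1;
    }
    *during = broadcast_begin();
    broadcast_end();
    atomic_store(&release, 1);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    *after = broadcast_begin();
    broadcast_end();
    return 0;
}
```

Readers never block each other, so ordinary thread creation stays uncontended; only a broadcaster holding the write lock makes creators and exiting threads wait.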
