Signalling all threads in a process
Without keeping a list of current threads, I'm trying to see that a realtime signal gets delivered to all threads in my process. My idea is to go about it like this:
- Initially the signal handler is installed and the signal is unblocked in all threads.
- When one thread wants to send the 'broadcast' signal, it acquires a mutex and sets a global flag that the broadcast is taking place.
- The sender blocks the signal (using `pthread_sigmask`) for itself, and enters a loop repeatedly calling `raise(sig)` until `sigpending` indicates that the signal is pending (no threads remained with the signal unblocked).
- As threads receive the signal, they act on it but wait in the signal handler for the broadcast flag to be cleared, so that the signal will remain masked.
- The sender finishes the loop by unblocking the signal (in order to get its own delivery).
- When the sender handles its own signal, it clears the global flag so that all the other threads can continue with their business.
The problem I'm running into is that `pthread_sigmask` is not being respected. Everything works right if I run the test program under `strace` (presumably due to different scheduling timing), but as soon as I run it alone, the sender receives its own signal (despite having blocked it..?) and none of the other threads ever get scheduled.
Any ideas what might be wrong? I've tried using `sigqueue` instead of `raise`, probing the signal mask, adding `sleep` all over the place to make sure the threads are patiently waiting for their signals, etc. and now I'm at a loss.
Edit: Thanks to psmears' answer, I think I understand the problem. Here's a potential solution. Feedback would be great:
- At any given time, I can know the number of threads running, and I can prevent all thread creation and exiting during the broadcast signal if I need to.
- The thread that wants to do the broadcast signal acquires a lock (so no other thread can do it at the same time), then blocks the signal for itself, and sends `num_threads` signals to the process, then unblocks the signal for itself.
- The signal handler atomically increments a counter, and each instance of the signal handler waits until that counter is equal to `num_threads` to return.
- The thread that did the broadcast also waits for the counter to reach `num_threads`, then it releases the lock.
One possible concern is that the signals will not get queued if the kernel is out of memory (Linux seems to have that issue). Do you know if `sigqueue` reliably informs the caller when it's unable to queue the signal (in which case I would loop until it succeeds), or could signals possibly be silently lost?
Edit 2: It seems to be working now. According to the documentation for `sigqueue`, it returns `EAGAIN` if it fails to queue the signal. But for robustness, I decided to just keep calling `sigqueue` until `num_threads-1` signal handlers are running, interleaving calls to `sched_yield` after I've sent `num_threads-1` signals.
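A minimal sketch of the scheme described in the edits above, written as an assumption about how it could look rather than as the code actually used. The names `on_broadcast`, `worker`, `broadcast` and `broadcast_demo` are hypothetical. It differs from the description in one respect: it sends exactly `others` queued signals and retries only on `EAGAIN`, then yields until every handler has started. It also assumes the real-time signal is unblocked in the caller when the workers are created, handles a single broadcast only, and lets the handler busy-wait for simplicity.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <unistd.h>

static atomic_int arrived;   /* handlers that have started running */
static atomic_int released;  /* set by the sender to let handlers return */
static atomic_int stop;      /* tells the demo workers to exit */

static void on_broadcast(int sig)
{
    (void)sig;
    atomic_fetch_add(&arrived, 1);
    /* Spin inside the handler so the signal stays masked in this thread
       and the next queued instance has to go to a different thread. */
    while (!atomic_load(&released))
        ;
}

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop))
        sched_yield();
    return NULL;
}

/* One-shot broadcast of `sig` to `others` threads (everyone but the caller).
   Returns the number of handlers that ran. Errors other than EAGAIN are
   not handled in this sketch. */
static int broadcast(int sig, int others)
{
    sigset_t set, old;
    union sigval val;

    val.sival_int = 0;
    sigemptyset(&set);
    sigaddset(&set, sig);
    pthread_sigmask(SIG_BLOCK, &set, &old);   /* the sender must not take one */

    for (int i = 0; i < others; i++)
        while (sigqueue(getpid(), sig, val) == -1 && errno == EAGAIN)
            sched_yield();                    /* could not queue: retry */

    while (atomic_load(&arrived) < others)    /* wait for every handler */
        sched_yield();
    atomic_store(&released, 1);               /* let all handlers return */

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return atomic_load(&arrived);
}

/* Hypothetical driver: starts up to 16 workers and broadcasts once. */
int broadcast_demo(int nworkers)
{
    pthread_t tid[16];
    struct sigaction sa = {0};
    int sig = SIGRTMIN, ran;

    if (nworkers < 0 || nworkers > 16)
        return -1;
    sa.sa_handler = on_broadcast;
    sigemptyset(&sa.sa_mask);
    sigaction(sig, &sa, NULL);

    for (int i = 0; i < nworkers; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    ran = broadcast(sig, nworkers);
    atomic_store(&stop, 1);
    for (int i = 0; i < nworkers; i++)
        pthread_join(tid[i], NULL);
    return ran;
}
```

Because each handler keeps the signal masked until the sender sets `released`, no thread can consume two of the queued signals, so `others` signals reach `others` distinct threads.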
There was a race condition at thread creation time, counting new threads, but I solved it with a strange (ab)use of read-write locks. Thread creation is "reading" and the broadcast signal is "writing", so unless there's a thread trying to broadcast, it doesn't create any contention at thread-creation.
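The read-write lock trick might be sketched as follows. This is an illustration under assumptions, not the original code: `threads_lock`, `create_counted_thread`, `begin_broadcast`, `end_broadcast` and `noop` are hypothetical names, and the matching decrement on thread exit is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>

static pthread_rwlock_t threads_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int num_threads = 1;          /* counts the initial thread */

/* Thread creation is the "reader": any number of creators may run at
   once, but none can run while a broadcaster holds the write lock. */
int create_counted_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    int err;

    pthread_rwlock_rdlock(&threads_lock);
    err = pthread_create(tid, NULL, fn, arg);
    if (err == 0)
        atomic_fetch_add(&num_threads, 1);
    pthread_rwlock_unlock(&threads_lock);
    return err;
}

/* The broadcaster is the "writer": it waits for in-flight creations to
   finish and gets a thread count that cannot change until it unlocks. */
int begin_broadcast(void)
{
    pthread_rwlock_wrlock(&threads_lock);
    return atomic_load(&num_threads);
}

void end_broadcast(void)
{
    pthread_rwlock_unlock(&threads_lock);
}

/* Trivial thread body, used only to exercise the functions above. */
static void *noop(void *arg)
{
    return arg;
}
```

A broadcaster would call `begin_broadcast()`, send its signals using the returned count, and then call `end_broadcast()`.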
Comments (4)
`raise()` sends the signal to the current thread (only), so other threads won't receive it. I suspect that the fact that `strace` makes things work is a bug in `strace` (due to the way it works it ends up intercepting all signals sent to the process and re-raising them, so it may be re-raising them in the wrong way...).
You can probably get round that using `kill(getpid(), <signal>)` to send the signal to the current process as a whole.
However, another potential issue you might see is that `sigpending()` can indicate that the signal is pending on the process before all threads have received it - all that means is that there is at least one such signal pending for the process, and no CPU has yet become available to run a thread to deliver it...
Can you describe more details of what you're aiming to achieve? And how portable you want it to be? There's almost certainly a better way of doing it (signals are almost always a major headache, especially when mixed with threads...)
In a multithreaded program, `raise(sig)` is equivalent to `pthread_kill(pthread_self(), sig)`.
Try `kill(getpid(), sig)`.
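The difference can be observed with a small experiment. The sketch below is only an illustration: `on_signal`, `worker` and `raise_vs_kill` are hypothetical names, and it assumes the real-time signal is unblocked in the caller when the worker thread is created.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <unistd.h>

static atomic_int handled;   /* times the handler has run */
static atomic_int done;      /* tells the worker to exit */

static void on_signal(int sig)
{
    (void)sig;
    atomic_fetch_add(&handled, 1);
}

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&done))
        sched_yield();
    return NULL;
}

/* Returns 1 if the raised signal was pending for the calling thread.
   Stores the handler count observed after raise() and after kill(). */
int raise_vs_kill(int *after_raise, int *after_kill)
{
    int sig = SIGRTMIN, got = 0, was_pending;
    struct sigaction sa = {0};
    sigset_t set, old, pend;
    pthread_t tid;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(sig, &sa, NULL);

    /* The worker inherits the caller's mask, with the signal unblocked. */
    pthread_create(&tid, NULL, worker, NULL);

    sigemptyset(&set);
    sigaddset(&set, sig);
    pthread_sigmask(SIG_BLOCK, &set, &old);   /* block in the caller only */

    raise(sig);                      /* directed at the calling thread */
    sigpending(&pend);
    was_pending = sigismember(&pend, sig);
    sigwait(&set, &got);             /* consume it; the worker never saw it */
    *after_raise = atomic_load(&handled);

    kill(getpid(), sig);             /* directed at the process as a whole */
    while (atomic_load(&handled) == 0)
        sched_yield();               /* the worker is the only eligible thread */
    *after_kill = atomic_load(&handled);

    atomic_store(&done, 1);
    pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return was_pending;
}
```

With the signal blocked in the caller, `raise` leaves it pending on that thread while the worker stays untouched; the same signal sent with `kill` reaches the worker.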
Given that you can apparently lock thread creation and destruction, could you not just have the "broadcasting" thread post the required updates to thread-local-state in a per-thread queue, which each thread checks whenever it goes to use the thread-local-state? If there's outstanding update(s), it first applies them.
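One possible shape for such a per-thread queue, as a sketch only: `struct tls_state`, `state_init`, `post_update` and `use_state` are hypothetical names, the state is reduced to a single `int`, and the queue has a small fixed capacity.

```c
#include <pthread.h>

#define MAX_PENDING 8

/* Stand-in for one thread's local state plus its queue of updates. */
struct tls_state {
    pthread_mutex_t lock;
    int pending[MAX_PENDING];   /* updates posted by the broadcaster */
    int npending;
    int value;                  /* the state itself */
};

void state_init(struct tls_state *s, int initial)
{
    pthread_mutex_init(&s->lock, NULL);
    s->npending = 0;
    s->value = initial;
}

/* Called by the broadcasting thread. Returns 0, or -1 if the queue is full. */
int post_update(struct tls_state *s, int new_value)
{
    int rc = -1;

    pthread_mutex_lock(&s->lock);
    if (s->npending < MAX_PENDING) {
        s->pending[s->npending++] = new_value;
        rc = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Called by the owning thread whenever it goes to use its state:
   outstanding updates are applied in order before the value is read. */
int use_state(struct tls_state *s)
{
    int v;

    pthread_mutex_lock(&s->lock);
    for (int i = 0; i < s->npending; i++)
        s->value = s->pending[i];
    s->npending = 0;
    v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

No signal is involved: each thread picks up the change at its next access, at the cost of taking a lock on every use.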
You are trying to synchronize a set of threads.
From a design pattern point of view the pthread native solution for your problem would be a pthread barrier.
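A minimal barrier sketch, assuming a fixed number of cooperating threads; `NTHREADS`, `phase_worker` and `barrier_demo` are hypothetical names chosen for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4

static pthread_barrier_t barrier;
static atomic_int before;        /* threads that finished phase one */
static int seen[NTHREADS];       /* value of `before` seen after the barrier */

static void *phase_worker(void *arg)
{
    int id = *(int *)arg;

    atomic_fetch_add(&before, 1);       /* phase one */
    pthread_barrier_wait(&barrier);     /* nobody passes until all arrive */
    seen[id] = atomic_load(&before);    /* phase two */
    return NULL;
}

/* Returns 1 if every thread saw all NTHREADS arrivals after the barrier. */
int barrier_demo(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS], ok = 1;

    atomic_store(&before, 0);
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&tid[i], NULL, phase_worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&barrier);

    for (int i = 0; i < NTHREADS; i++)
        if (seen[i] != NTHREADS)
            ok = 0;
    return ok;
}
```

Unlike the signal scheme, a barrier only synchronizes threads that call `pthread_barrier_wait` themselves, so it cannot interrupt a thread that is busy elsewhere.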