Python processes stop responding to SIGTERM / SIGINT after being restarted

Posted on 2024-07-26 21:48:35

I'm having a weird problem with some Python processes that run under a watchdog process.

The watchdog is written in Python and is the parent. It has a function called start_child(name), which uses subprocess.Popen to launch a child process. The Popen object is recorded so that the watchdog can monitor the process with poll() and, when needed, end it with terminate(). If a child dies unexpectedly, the watchdog calls start_child(name) again and records the new Popen object.

There are 7 child processes, all of which are also Python. If I run any of the children manually, I can send SIGTERM or SIGINT with kill and get the result I expect (the process ends).

However, when run from the watchdog, a child only ends on the FIRST signal. Once the watchdog restarts it, the new child process no longer responds to SIGTERM or SIGINT. I have no idea what is causing this.

watchdog.py

import subprocess
import threading
import time


class watchdog:
    # <snip> various init stuff

    def start(self):
        self.running = True

        kids = ['app1', 'app2', 'app3', 'app4', 'app5', 'app6', 'app7']
        self.processes = {}

        # The initial children are started from the calling (main) thread
        for kid in kids:
            self.start_child(kid)

        # Restarts, by contrast, happen inside the monitor thread
        self.thread = threading.Thread(target=self._monitor)
        self.thread.start()

        while self.running:
            time.sleep(10)

    def start_child(self, name):
        try:
            proc = subprocess.Popen(name)
            self.processes[name] = proc
        except OSError:
            print("oh no")
        else:
            print("started child ok")

    def _monitor(self):
        while self.running:
            time.sleep(1)
            if self.running:
                # Iterate over a copy, since start_child() writes to the dict
                for kid, proc in list(self.processes.items()):
                    if proc.poll() is not None:  # process ended
                        self.start_child(kid)

So what happens is watchdog.start() launches all 7 processes, and if I send any process SIGTERM, it ends, and the monitor thread starts it again. However, if I then send the new process SIGTERM, it ignores it.

I should be able to keep sending kill -15 to the restarted processes over and over again. Why do they ignore it after being restarted?
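
One way to narrow this down: in the code above, the first seven children are started from the thread that calls start(), while every restart happens inside the _monitor thread. The sketch below, assuming Python 3, moves the restart into the main loop so the two cases can be compared. restart_dead_children is a hypothetical helper, not part of the original watchdog, and EXAMPLE_COMMANDS is a placeholder for the real app1 to app7 commands.

```python
import subprocess
import sys

# Placeholder command for illustration only: a Python child that exits at once.
EXAMPLE_COMMANDS = {"app1": [sys.executable, "-c", "pass"]}


def restart_dead_children(processes, commands):
    """Restart every child that has exited and return the restarted names.

    Meant to be called from the main thread, so each new child inherits
    the main thread's signal mask rather than a worker thread's.
    """
    restarted = []
    # Iterate over a copy because the dict is updated inside the loop
    for name, proc in list(processes.items()):
        if proc.poll() is not None:  # child has exited
            processes[name] = subprocess.Popen(commands[name])
            restarted.append(name)
    return restarted
```

Calling restart_dead_children(self.processes, commands) once a second from the while self.running loop in start() would take over the job of the _monitor thread. If children restarted this way respond to kill -15 again, the thread that spawns them is the likely culprit.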

Comments (2)

鹿! 2024-08-02 21:48:35

As explained here: http://blogs.gentoo.org/agaffney/2005/03/18/python_sucks , when Python creates a new thread, it blocks all signals for that thread (and for any processes that thread spawns).

I fixed this using sigprocmask, called through ctypes. This may or may not be the "correct" way to do it, but it does work.

In the child process, during __init__:

import ctypes

libc = ctypes.cdll.LoadLibrary("libc.so")
mask = b'\x00' * 17  # 16-byte empty signal set (nothing blocked) + null terminator; sized for FreeBSD's sigset_t
libc.sigprocmask(3, mask, None)  # 3 is SIG_SETMASK on FreeBSD; the value differs elsewhere (2 on Linux)
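
On Python 3.3 and later, the signal module exposes the same operation directly, so the mask can be inspected and cleared without ctypes. This is a sketch under that version assumption and was not part of the original fix. blocked_signals and unblock_all_signals are hypothetical helper names.

```python
import signal


def blocked_signals():
    """Return the set of signals currently blocked in the calling thread."""
    # SIG_BLOCK with an empty set changes nothing and returns the current mask
    return signal.pthread_sigmask(signal.SIG_BLOCK, [])


def unblock_all_signals():
    """Clear the calling thread's signal mask and return the previous mask."""
    # SIG_SETMASK with an empty set replaces the mask with "nothing blocked"
    return signal.pthread_sigmask(signal.SIG_SETMASK, [])
```

Calling unblock_all_signals() early in the child's startup would play the role of the sigprocmask call above, and blocked_signals() can be logged first to confirm that SIGTERM and SIGINT really were blocked.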

夜深人未静 2024-08-02 21:48:35

Wouldn't it be better to restore the default signal handlers within Python rather than via ctypes? In your child process, use the signal module:

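import signal

for sig in range(1, signal.NSIG):
    try:
        signal.signal(sig, signal.SIG_DFL)
    except (RuntimeError, OSError):
        # Raised for signals whose handling cannot be changed, such as SIGKILL
        pass

RuntimeError is raised when trying to set signals such as SIGKILL, which can't be caught. Python 3 raises OSError for the same case, so the loop catches both.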
