Why do processes spawned by cron end up defunct?

Posted on 2024-08-07 07:29:49

I have some processes showing up as <defunct> in top (and ps). I've boiled things down from the real scripts and programs.

In my crontab:

* * * * * /tmp/launcher.sh /tmp/tester.sh

The contents of launcher.sh (which is of course marked executable):

#!/bin/bash
# the real script does a little argument processing here
"$@"

The contents of tester.sh (which is of course marked executable):

#!/bin/bash
sleep 27 & # the real script launches a compiled C program in the background

ps shows the following:

user       24257 24256  0 18:32 ?        00:00:00 [launcher.sh] <defunct>
user       24259     1  0 18:32 ?        00:00:00 sleep 27

Note that tester.sh does not appear--it has exited after launching the background job.

Why does launcher.sh stick around, marked <defunct>? It only seems to do this when launched by cron--not when I run it myself.

Additional note: launcher.sh is a common script in the system this runs on, which is not easily modified. The other things (crontab, tester.sh, even the program that I run instead of sleep) can be modified much more easily.

Comments (6)

冷心人i 2024-08-14 07:29:49

Because they haven't been the subject of a wait(2) system call.

Since the parent may still wait for these processes in the future, the kernel can't get rid of them completely: it has to keep each exit status, and the evidence that the process existed, until a wait system call collects it.

When you start one from the shell, your shell is trapping SIGCHLD and doing various wait operations anyway, so nothing stays defunct for long.

But cron isn't in a wait state, it is sleeping, so the defunct child may stick around for a while until cron wakes up.
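
The mechanism can be seen outside cron with a minimal sketch. This is not a reproduction of cron itself: the inner bash starts a short-lived child and then replaces itself with sleep, a program that never calls wait(2), so the exited child should show a Z state in ps until its parent goes away. The temporary file and the variable names are illustrative assumptions, and the sketch assumes bash, GNU sleep and a procps-style ps.

```shell
#!/bin/bash
# Hypothetical demo: a parent that never calls wait() leaves a zombie behind.
pidfile=$(mktemp)

# The inner bash backgrounds "sleep 0.2", records the child's PID in the file
# passed as $0, then execs "sleep 3". The child's parent is now a plain sleep
# process, which never reaps its children.
bash -c 'sleep 0.2 & echo $! > "$0"; exec sleep 3' "$pidfile" &
parent_pid=$!

sleep 1    # give the short-lived child time to exit

child_pid=$(cat "$pidfile")
state=$(ps -o stat= -p "$child_pid" | tr -d ' ')
echo "child $child_pid state: $state"

# Clean up: once the parent is gone, init adopts and reaps the zombie.
kill "$parent_pid" 2>/dev/null
wait "$parent_pid" 2>/dev/null || true
rm -f "$pidfile"
```

The zombie only lasts as long as its non-waiting parent does, which matches the observation that the defunct entry eventually disappears on its own.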


Update: Responding to comment...
Hmm. I did manage to duplicate the issue:

 PPID   PID  PGID  SESS COMMAND
    1  3562  3562  3562 cron
 3562  1629  3562  3562  \_ cron
 1629  1636  1636  1636      \_ sh <defunct>
    1  1639  1636  1636 sleep

So, what happened was, I think:

  • cron forks, and the cron child starts the shell
  • the shell (1636) becomes the leader of session and process group 1636 and starts sleep
  • the shell exits, and SIGCHLD is sent to cron 3562
  • the signal is ignored or mishandled
  • the shell turns into a zombie. Note that sleep is reparented to init, so when sleep exits, init will get the signal and clean up. I'm still trying to figure out when the zombie gets reaped. Probably, with no active children left, cron 1629 figures out it can exit; at that point the zombie is reparented to init and gets reaped. So now we wonder about the missing SIGCHLD that cron should have processed.
    • It isn't necessarily vixie cron's fault. libdaemon installs a SIGCHLD handler during daemon_fork(), and this could interfere with signal delivery on a quick exit by the intermediate process 1629.

      Now, I don't even know if vixie cron on my Ubuntu system is built with libdaemon, but at least I have a new theory. :-)

刘备忘录 2024-08-14 07:29:49

In my opinion it's caused by the process CROND (spawned by crond for every task) waiting for input on stdin, which is piped to the stdout/stderr of the command in the crontab. This is done because cron is able to send the resulting output to the user by mail.

So CROND waits for EOF until the user command and all of its spawned child processes have closed the pipe. Once that happens, CROND continues with the wait statement and the defunct user command disappears.

So I think you have to explicitly disconnect every spawned subprocess in your script from the pipe (e.g. by redirecting it to a file or /dev/null).

So the following line should work in crontab. It uses >/dev/null 2>&1 rather than &>, because cron normally runs commands with /bin/sh, where &> is not portable:

* * * * * ( /tmp/launcher.sh /tmp/tester.sh >/dev/null 2>&1 & )
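
The pipe behaviour this answer relies on can be checked without cron. In the following sketch, cat stands in for the process reading the job's output (an assumption for illustration, not cron's actual code): a reader sees EOF only once every process holding the write end has closed it, including a background child that inherited it.

```shell
#!/bin/bash
# Case 1: the background sleep inherits the pipe as its stdout, so the reader
# (cat) gets EOF only after sleep exits, about 2 seconds later.
start=$(date +%s)
bash -c 'sleep 2 & echo "script done"' | cat >/dev/null
held=$(( $(date +%s) - start ))

# Case 2: the background sleep is redirected away from the pipe, so the reader
# gets EOF as soon as the script itself exits.
start=$(date +%s)
bash -c 'sleep 2 >/dev/null 2>&1 & echo "script done"' | cat >/dev/null
released=$(( $(date +%s) - start ))

echo "inherited pipe: ${held}s, redirected: ${released}s"
```

If this explanation applies to a given cron implementation, redirecting the background program inside tester.sh should have the same effect as redirecting in the crontab line.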
谜泪 2024-08-14 07:29:49

I suspect that cron is waiting for all subprocesses in the session to terminate. See wait(2) with respect to negative pid arguments. You can see the SESS with:

ps faxo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm

Here's what I see (edited):

STAT  EUID  RUID TT       TPGID  SESS  PGRP  PPID   PID %CPU COMMAND
Ss       0     0 ?           -1  3197  3197     1  3197  0.0 cron
S        0     0 ?           -1  3197  3197  3197 18825  0.0  \_ cron
Zs    1000  1000 ?           -1 18832 18832 18825 18832  0.0      \_ sh <defunct>
S     1000  1000 ?           -1 18832 18832     1 18836  0.0 sleep

Notice that the sh and the sleep are in the same SESS.

Use the command setsid(1). Here's tester.sh:

#!/bin/bash
setsid sleep 27 # the real script launches a compiled C program in the background

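Notice there is no & here. setsid forks and returns immediately only when it is itself a process group leader; otherwise it runs the program in the current process, so keep the & if tester.sh must return right away.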
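
To see what setsid changes, here is a small sketch that compares the session ID of a plain background job with one started through setsid. The variable names are illustrative; it assumes a non-interactive bash script (no job control), util-linux setsid and a procps-style ps that supports the sid column.

```shell
#!/bin/bash
# Plain background job: stays in the script's session.
sleep 5 &
plain=$!

# Background job started through setsid: becomes the leader of a new session.
# In a non-interactive script the child is not a process group leader, so
# setsid does not fork and $! is the PID of the sleep itself.
setsid sleep 5 &
detached=$!

sleep 0.5    # let setsid run before inspecting

my_sid=$(ps -o sid= -p $$ | tr -d ' ')
plain_sid=$(ps -o sid= -p "$plain" | tr -d ' ')
detached_sid=$(ps -o sid= -p "$detached" | tr -d ' ')
echo "script session: $my_sid, plain job: $plain_sid, setsid job: $detached_sid"

# Clean up both background jobs.
kill "$plain" "$detached" 2>/dev/null
wait 2>/dev/null || true
```

The plain job should report the script's own session, while the setsid job should report a session ID equal to its own PID.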

桃扇骨 2024-08-14 07:29:49

I’d recommend that you solve the problem by simply not having two separate processes: Have launcher.sh do this on its last line:

exec "$@"

This will eliminate the superfluous process.
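
A quick way to convince yourself that exec replaces the wrapper instead of adding a process is to compare PIDs before and after the exec. This sketch uses inline bash -c and sh -c commands as stand-ins for launcher.sh and the launched command; they are illustrative, not the real scripts.

```shell
#!/bin/bash
# The outer command prints its own PID, then execs a second shell that prints
# its PID. With exec, both lines should show the same number.
out=$(bash -c 'echo $$; exec sh -c "echo \$\$"')

wrapper_pid=$(echo "$out" | sed -n 1p)
target_pid=$(echo "$out" | sed -n 2p)
echo "wrapper PID: $wrapper_pid, target PID after exec: $target_pid"
```

Because the PID is unchanged, cron's child ends up waiting directly on the launched command rather than on an intermediate shell.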

银河中√捞星星 2024-08-14 07:29:49

I found this question while I was looking for a solution to a similar issue. Unfortunately, the answers to this question didn't solve my problem.

Killing the defunct process itself is not an option; you need to find and kill its parent process. I ended up clearing the defunct processes in the following way (note that $3 in ps -ef output is the PPID, so this kills the parents):

ps -ef | grep '<defunct>' | grep -v grep | awk '{print "kill -9 ",$3}' | sh

In the grep '<defunct>' pattern you can narrow the search down to the specific defunct process you are after.
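
Before killing anything, it may help to list the zombies together with their parents. The following is a cautious sketch, not part of the original answer: list_zombies is a hypothetical helper name, and it only prints, it does not kill.

```shell
#!/bin/bash
# Print every process whose state starts with Z, along with its parent PID.
# Each column is requested with its own -o option and an empty header.
list_zombies() {
    ps -e -o stat= -o ppid= -o pid= -o comm= |
        awk '$1 ~ /^Z/ { print "zombie pid " $3 ", parent pid " $2 ", command " $4 }'
}

list_zombies
```

Reviewing the parent PIDs first avoids sending kill -9 to a process such as cron that you would rather keep running.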

暮凉 2024-08-14 07:29:49

I have tested the same problem so many times.
And finally I've got the solution.
Just specify the '/bin/bash' before the bash script as shown below.

* * * * * /bin/bash /tmp/launcher.sh /tmp/tester.sh