Why do processes spawned by cron end up defunct?
I have some processes showing up as <defunct> in top (and ps). I've boiled things down from the real scripts and programs.
In my crontab:
* * * * * /tmp/launcher.sh /tmp/tester.sh
The contents of launcher.sh (which is of course marked executable):
#!/bin/bash
# the real script does a little argument processing here
"$@"
The contents of tester.sh (which is of course marked executable):
#!/bin/bash
sleep 27 & # the real script launches a compiled C program in the background
ps shows the following:
user 24257 24256 0 18:32 ? 00:00:00 [launcher.sh] <defunct>
user 24259 1 0 18:32 ? 00:00:00 sleep 27
Note that tester.sh does not appear; it has exited after launching the background job.
Why does launcher.sh stick around, marked <defunct>? It only seems to do this when launched by cron, not when I run it myself.
Additional note: launcher.sh is a common script in the system this runs on, and it is not easily modified. The other things (crontab, tester.sh, even the program that I run instead of sleep) can be modified much more easily.
Comments (6)
Because they haven't been the subject of a wait(2) system call.
Since someone may wait for these processes in the future, the kernel can't completely get rid of them. If it did, it could not satisfy the wait system call, because it would no longer have the exit status or any evidence that the process existed.
When you start one from the shell, your shell is trapping SIGCHLD and doing various wait operations anyway, so nothing stays defunct for long.
But cron isn't in a wait state, it is sleeping, so the defunct child may stick around for a while until cron wakes up.
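The defunct state described here can be reproduced without cron. What follows is a minimal sketch rather than anything from the original answer: the function name `demo_zombie` and the timings are invented for illustration, and it assumes a `ps` that accepts `-o stat=` and `-p`. A parent starts a child and then replaces itself with a program that never calls wait, so the finished child stays around unreaped until that parent exits.

```shell
#!/bin/bash
# Hypothetical demo: show a child that has exited but has not been waited for.
demo_zombie() {
  local pidfile parent child
  pidfile=$(mktemp)
  # The inner bash starts a short-lived child, records its PID, then uses exec
  # to turn itself into `sleep 3`, a parent that never calls wait().
  bash -c 'sleep 1 & echo $! > "$1"; exec sleep 3' _ "$pidfile" > /dev/null 2>&1 &
  parent=$!
  sleep 2                      # by now the child has exited
  child=$(cat "$pidfile")
  ps -o stat= -p "$child"      # process state of the unreaped child
  wait "$parent"               # after the parent exits, the child is reparented and reaped
  rm -f "$pidfile"
}

demo_zombie
```

On a typical Linux system the printed state should begin with `Z`, the same state that `ps` renders as `<defunct>` in the question.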
Update: Responding to comment...
Hmm. I did manage to duplicate the issue:
So, what happened was, I think:
libdaemon installs a SIGCHLD handler during daemon_fork(), and this could interfere with signal delivery on a quick exit by intermediate 1629.
Now, I don't even know if vixie cron on my Ubuntu system is built with libdaemon, but at least I have a new theory. :-)
In my opinion it's caused by the process CROND (spawned by crond for every task) waiting for input on stdin, which is piped to the stdout/stderr of the command in the crontab. This is done because cron is able to send the resulting output to the user by mail.
So CROND waits for EOF until the user command and all its spawned child processes have closed the pipe. Once that happens, CROND continues to its wait call and the defunct user command disappears.
So I think you have to explicitly disconnect every spawned subprocess in your script from the pipe (e.g. by redirecting its output to a file or /dev/null).
So the following line should work in crontab:
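The crontab line this answer refers to is not preserved in this copy, so it is not reconstructed here. As a stand-in, the sketch below illustrates the pipe behaviour the answer describes, under the assumption that cron reads the job's output through an ordinary pipe. The helper name `seconds_until_eof` is made up. A reader on a pipe only sees end-of-file once every process holding the write end has closed it, including background children.

```shell
#!/bin/bash
# Hypothetical helper: run a command string with its stdout connected to a
# pipe, and print how many whole seconds the reader waits for end-of-file.
seconds_until_eof() {
  local start=$SECONDS
  bash -c "$1" | cat > /dev/null
  echo $(( SECONDS - start ))
}

# The background child inherits the pipe, so the reader waits for it to finish.
seconds_until_eof 'sleep 2 &'
# The child's output is redirected away from the pipe, so the reader sees EOF at once.
seconds_until_eof 'sleep 2 > /dev/null 2>&1 &'

# The same idea applied to the files from the question (an assumption, not the
# answer's original line):
#   in tester.sh:   sleep 27 > /dev/null 2>&1 &
#   or in crontab:  * * * * * /tmp/launcher.sh /tmp/tester.sh > /dev/null 2>&1
```

Redirecting in the crontab entry has the side effect that cron has no output left to mail, so redirecting only the background program inside tester.sh may be the less intrusive choice.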
I suspect that cron is waiting for all subprocesses in the session to terminate. See wait(2) with respect to negative pid arguments. You can see the SESS with:
Here's what I see (edited):
Notice that the sh and the sleep are in the same SESS.
Use the command setsid(1). Here's tester.sh:
Notice you don't need &, setsid puts it in the background.
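The commands from the setsid answer are missing from this copy. The sketch below is therefore only an illustration of how session IDs can be inspected and how setsid(1) changes them, not the answerer's original listing. It assumes a `ps` that supports the `sess` output column (procps does) and the util-linux `setsid` command. The helper name `session_of` is invented.

```shell
#!/bin/bash
# Hypothetical helper: print the session ID of the given PID.
session_of() { ps -o sess= -p "$1" | tr -d ' '; }

# A system-wide overview with parent, PID, process group and session columns
# could look like this (left commented out to keep the output short):
#   ps -e -o ppid,pid,pgid,sess,comm

current=$(session_of $$)
# A command started through setsid reports a different session than this shell.
detached=$(setsid bash -c 'ps -o sess= -p $$' | tr -d ' ')
echo "this shell: session $current, under setsid: session $detached"
```

According to the setsid(1) manual page, setsid only forks when the caller is already a process group leader; otherwise it runs the program in the current process. Inside a non-interactive script it may therefore still be necessary to keep the trailing `&`.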
I'd recommend that you solve the problem by simply not having two separate processes: have launcher.sh do this on its last line:
This will eliminate the superfluous process.
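The line this answer refers to is not preserved here. A likely reading, and it is only an assumption, is that the launcher's final `"$@"` becomes `exec "$@"`, so the shell is replaced by the command instead of forking a child. The sketch below shows the effect of `exec` on process IDs. Both function names are hypothetical.

```shell
#!/bin/bash
# With exec, the second bash takes over the PID of the first one.
pids_with_exec() {
  bash -c 'echo $$; exec bash -c "echo \$\$"'
}

# Without exec, the second bash runs as a separate child process.
# The trailing `true` keeps bash from exec-ing the last command on its own.
pids_without_exec() {
  bash -c 'echo $$; bash -c "echo \$\$"; true'
}

echo "with exec:";    pids_with_exec
echo "without exec:"; pids_without_exec
```

The first pair of numbers is identical and the second pair differs, which is the "superfluous process" the answer wants to remove. Note that the question describes launcher.sh as hard to modify, so this may not be applicable there.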
I found this question while looking for a solution to a similar issue. Unfortunately, the answers here didn't solve my problem.
Killing a defunct process directly is not an option; you need to find and kill its parent process. I ended up killing the defunct processes in the following way:
In "grep ''" you can narrow down the search to the specific defunct process you are after.
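The command used in this answer did not survive in this copy. As a rough sketch of the approach it describes (find the parents of defunct processes), something like the following could serve. The function name `zombie_parents` is made up, the sample rows reuse the PIDs from the question, and the `kill` step is deliberately left as a comment because terminating a parent such as cron is a drastic measure.

```shell
#!/bin/bash
# Hypothetical helper: read "STAT PPID PID COMMAND" rows on stdin and print
# the unique parent PIDs of every process whose state starts with Z.
zombie_parents() {
  awk '$1 ~ /^Z/ { print $2 }' | sort -un
}

# Live usage on a system whose ps supports these columns:
#   ps -e -o stat= -o ppid= -o pid= -o comm= | zombie_parents
# Acting on the result would then be a manual decision, e.g.:
#   kill <parent-pid>

# Sample input modelled on the listing in the question.
sample='Ss 1 24256 cron
Z 24256 24257 launcher.sh
S 1 24259 sleep'
printf '%s\n' "$sample" | zombie_parents   # prints 24256
```

Filtering the `ps` output further (for example by command name) narrows the result to one specific defunct process, which matches the role the answer gives to its `grep`.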
I have tested this same problem many times, and I finally found the solution.
Just specify '/bin/bash' before the bash script, as shown below.