Launching a background process/daemon from a CGI script
I'm trying to launch a background process from a CGI script. Basically, when a form is submitted, the CGI script will indicate to the user that his or her request is being processed, while the background script does the actual processing (because the processing tends to take a long time). The problem I'm facing is that Apache won't send the output of the parent CGI script to the browser until the child script terminates.
I've been told by a colleague that what I want to do is impossible because there is no way to prevent Apache from waiting for the entire process tree of a CGI script to die. However, I've also seen numerous references around the web to a "double fork" trick which is supposed to do the job. The trick is described succinctly in this Stack Overflow answer, but I've seen similar code elsewhere.
Here's a short script I wrote to test the double-fork trick in Python:
import os
import sys

if os.fork():
    print 'Content-type: text/html\n\n Done'
    sys.exit(0)

if os.fork():
    os.setsid()
    sys.exit(0)

# Second child
os.chdir("/")
sys.stdout.close()
sys.stderr.close()
sys.stdin.close()

f = open('/tmp/lol.txt', 'w')

while 1:
    f.write('test\n')
If I run this from the shell, it does exactly what I'd expect: the original script and first descendant die, and the second descendant keeps running until it's killed manually. But if I access it through CGI, the page won't load until I kill the second descendant or Apache kills it because of the CGI timeout. I've also tried replacing the second sys.exit(0) with os._exit(0), but there is no difference.
What am I doing wrong?
Comments (10)
Don't fork - run batch separately
This double-forking approach is something of a hack, which to me is an indication that it shouldn't be done :). For CGI anyway. The general principle is that if something is too hard to accomplish, you are probably approaching it the wrong way.
Luckily you give the background info on what you need: a CGI call to initiate some processing that happens independently and to return to the caller. Well sure, there are UNIX commands that do just that: schedule a command to run at a specific time (at) or whenever the CPU is free (batch). So do this instead:
And there you have it. Keep in mind that if there is some output to stdout/stderr, it will be mailed to the user (which is good for debugging, but otherwise the script should probably keep quiet).
PS. I just remembered that Windows also has a version of at, so with a minor modification of the invocation you can have this work under Apache on Windows too (unlike the fork trick, which won't work on Windows).
PPS. Make sure the user running the CGI process is not listed in /etc/at.deny, which would exclude it from scheduling batch jobs.
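A minimal sketch of the at/batch idea in Python, not the answerer's original code. The function name `schedule_job` and the job path in the usage note are hypothetical. The job text is piped to the scheduler's standard input, so no shell redirection syntax is involved.

```python
import subprocess


def schedule_job(command, scheduler=("at", "now")):
    """Hand `command` to a scheduler that reads the job from standard input.

    ("at", "now") queues the job to run right away; ("batch",) would instead
    wait until the system load allows it. Returns whatever the scheduler
    printed (at reports the job number on stderr, merged in here).
    """
    result = subprocess.run(
        list(scheduler),
        input=command + "\n",       # the job script, fed on stdin
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=True,                 # raise if the scheduler refuses the job
    )
    return result.stdout
```

In a CGI script this would be called after printing the response, for example `schedule_job("/path/to/long_job.sh")` with a placeholder path. Because the job is started by the at daemon rather than by the CGI process, it does not inherit the pipe that Apache is waiting on.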
I wouldn't suggest going about the problem this way. If you need to execute some task asynchronously, why not use a work queue like beanstalkd instead of trying to fork off the tasks from the request? There are client libraries for beanstalkd available for Python.
I think there are two issues: setsid is in the wrong place, and buffered IO operations are being done in one of the transient children.
You've got the original process (grandparent, prints "success"), the middle parent, and the grandchild ("lol.txt").
The os.setsid() call is being performed in the middle parent after the grandchild has been spawned. The middle parent can't influence the grandchild's session after the grandchild has been created. Try this:
This creates a new session before spawning the grandchild. Then the middle parent dies, leaving the session without a process group leader, ensuring that any calls to open a terminal will fail, and making sure there's never any blocking on terminal input or output, or unexpected signals sent to the child.
Note that I've also moved the success to the grandparent; there's no guarantee of which child will run first after calling fork(2), and you run the risk that the child would be spawned, and potentially try to write output to standard out or standard error, before the middle parent has had a chance to write success to the remote client.
In this case, the streams are closed quickly, but still, mixing standard IO streams among multiple processes is bound to give difficulty: keep it all in one process, if you can.
Edit: I've found a strange behavior I can't explain:
The last line, after second fork pid, only appears when the os.sleep(1) call is commented out. When the call is left in place, the last line never appears in the browser. (But otherwise all the content is printed to the browser.)
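The ordering described in this answer (respond from the grandparent, call setsid in the middle parent before the second fork) might look roughly like the sketch below. It is written for Python 3 and is an illustration under assumptions, not the answerer's code: the helper name `spawn_detached` is made up, and pointing descriptors 0 to 2 at /dev/null is an extra step that the answer does not mention.

```python
import os
import sys


def spawn_detached(task):
    """Run task() in a detached grandchild; return in the caller right away."""
    sys.stdout.flush()          # empty the buffer so children cannot re-emit it
    pid = os.fork()
    if pid:
        os.waitpid(pid, 0)      # reap the short-lived middle parent
        return

    # Middle parent: create the new session BEFORE forking the grandchild.
    os.setsid()
    if os.fork():
        os._exit(0)             # middle parent exits, orphaning the grandchild

    # Grandchild: let go of every descriptor inherited from the web server.
    os.chdir("/")
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)    # replaces (and thereby closes) the old descriptor
    if devnull > 2:
        os.close(devnull)
    try:
        task()
    finally:
        os._exit(0)             # never fall back into the caller's code
```

The grandparent would print the complete response first and only then call `spawn_detached(some_function)`, so that only one process ever writes to the stream the client sees.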
I needed to break the stdout as well as the stderr like this:
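One way to read "breaking" stdout and stderr, sketched here as an assumption since this answer's code is not shown: start the background program with all three standard streams pointed at the null device, so it never holds the pipe that the web server is reading. The helper name `start_background` is hypothetical.

```python
import subprocess
import sys


def start_background(argv):
    """Start argv without sharing the CGI script's stdin/stdout/stderr."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,   # child holds no handle on the server's pipe
        stderr=subprocess.DEVNULL,
        start_new_session=True,      # run setsid() in the child (POSIX only)
        close_fds=True,
    )
```

A CGI script would print its response, call `sys.stdout.flush()`, and then call something like `start_background([sys.executable, "/path/to/worker.py"])`, where the worker path is a placeholder.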
OK, I'm adding a simpler solution, for when you don't need to start another script but want to continue in the same one to do the long processing in the background. This lets you give a waiting message that the client sees instantly, and continue your server processing even if the client kills the browser session:
I read half the Internet for a week without success on this one. Finally I tested whether there is a difference between sys.stdout.close() and os.close(sys.stdout.fileno()), and there is a huge one: the first didn't do anything, while the second closed the pipe from the web server and completely disconnected from the client. The fork is only necessary because the web server will kill its processes after a while, and your long process probably needs more time to complete.
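The effect of os.close(sys.stdout.fileno()) can be observed without a web server. In the sketch below, which is only an illustration, a parent process plays the server's role and reads a child's output until end of file. The child source in `CHILD` and the helper `read_until_eof` are made-up names, and the fork that the answer mentions is left out to keep the example short.

```python
import subprocess
import sys
import time

# Stand-in for the CGI script: respond, close the real descriptors, keep working.
CHILD = r'''
import os, sys, time
sys.stdout.write("Content-Type: text/html\n\nProcessing...\n")
sys.stdout.flush()
os.close(sys.stdout.fileno())   # close descriptor 1 itself
os.close(sys.stderr.fileno())   # and descriptor 2
time.sleep(3)                   # the long job; it must not write to stdout now
'''


def read_until_eof(source):
    """Play the web server: return (output, seconds until EOF, child process)."""
    start = time.monotonic()
    proc = subprocess.Popen([sys.executable, "-c", source],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output = proc.stdout.read()      # returns once the child closes descriptor 1
    elapsed = time.monotonic() - start
    return output, elapsed, proc
```

Calling `read_until_eof(CHILD)` should return the response well before the three-second job finishes, while the child process is still alive. That is the behaviour the answer relies on to release the client early.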
As other answers have noted, it is tricky to start a persistent process from your CGI script because the process must cleanly dissociate itself from the CGI program. I have found that a great general-purpose program for this is daemon. It takes care of the messy details involving open file handles, process groups, root directory, etc. for you. So the pattern of such a CGI program is:
The original post describes the case where you want your CGI program to return quickly, while spawning off a background process to finish handling that one request. But there is also the case where your web application depends on a running service which must be kept alive. (Other people have talked about using beanstalkd to handle jobs. But how do you ensure that beanstalkd itself is alive?) One way to do this is to restart the service (if it's down) from within the CGI script. This approach makes sense in an environment where you have limited control over the server and can't rely on things like cron or an init.d mechanism.
There are situations where passing work off to a daemon or cron is not appropriate. Sometimes you really DO need to fork, let the parent exit (to keep Apache happy) and let something slow happen in the child.
What worked for me: When done generating web output, and before the fork:
fflush(stdout), close(0), close(1), close(2); // in the process BEFORE YOU FORK
Then fork() and have the parent immediately exit(0);
The child then AGAIN does
close(0), close(1), close(2);
and also a
setsid();
...and then gets on with whatever it needs to do.
Why you need to close them in the child even though they were closed in the primordial process in advance is confusing to me, but this is what worked. It didn't without the 2nd set of closes. This was on Linux (on a raspberry pi).
I haven't tried using fork, but I have accomplished what you're asking by executing a sys.stdout.flush() after the original message, before calling the background process, i.e.
My head is still hurting from this one. I tried every possible way to use your code with fork and with stdout closed, nulled, or anything else, but nothing worked. Whether the output of an unfinished process is displayed depends on the web server (Apache or other) configuration, and in my case changing it wasn't an option, so attempts with "Transfer-Encoding: chunked;chunk=CRLF" and "sys.stdout.flush()" didn't work either. Here is the solution that finally worked.
In short, use something like:
I use the "X" parameter to distinguish between parent and child because I call the same script for both, but you could make it simpler by calling another script. If a complete example would be useful, please ask.
For those who get "sh: 1: Syntax error: redirection unexpected" with the at/batch solution, try using something like this:
Make sure that the at command is installed and that the user running the application isn't in /etc/at.deny.