Handling ungraceful shutdown when using fork and sockets
I have a server that listens for socket connections and performs different kinds of actions depending on the request. One of them is a long-lived database query, for which the server forks.
The server keeps a log of all active children, and whenever it is asked to shut down, it kills all its children before exiting. A couple of times the server has crashed or been killed ungracefully, which left the child processes orphaned. If I try to bring the server back up, it refuses, saying that the listening socket cannot bind because the address/port is already bound.
I am looking for a way to improve this situation so that the main server process can come back right away. I've tried monitoring the parent's existence from the child and exiting as soon as it is gone, but this has only resulted in zombie processes, and the socket still seems to be bound.
The server is written in Python, but any explanation or suggestion in any language is welcome.
Comments (3)
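Make your server the leader of a process group. The children it forks then belong to the same group, so the whole group can be signalled at once when the leader goes down. Note that the kernel does this on its own only in specific cases (for example, SIGHUP to the foreground group when a session leader with a controlling terminal exits); otherwise something still has to send the signal to the group.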
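A minimal sketch of the process-group idea in Python, assuming a POSIX system. The helper names `become_group_leader` and `terminate_group` are made up for illustration; only `os.setpgid`, `os.getpgrp` and `os.killpg` are real calls.

```python
import os
import signal


def become_group_leader() -> int:
    """Put the calling process into a new process group that it leads.

    Children forked afterwards inherit this group id.
    """
    os.setpgid(0, 0)  # the new group id equals our own pid
    return os.getpgrp()


def terminate_group(pgid: int) -> None:
    """Send SIGTERM to every process in the group.

    If the caller is a member of the group, it receives the signal too.
    """
    os.killpg(pgid, signal.SIGTERM)
```

The server would call `become_group_leader()` once at startup. If it later dies without cleaning up, a supervisor or a restart script can still reach the leftover children through the group id, for example with `kill -TERM -- -<pgid>` from a shell.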
Perhaps when you fork, disown the child so that the server is no longer the parent registered with the OS. Does the parent really need to communicate with the child? If not, this may be an option.
You can still keep track of the child processes, but in a different way: you won't get SIGCHLD events anymore.
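One common way to disown a child is the double fork. This is a sketch of that technique, not necessarily what the answer had in mind; `spawn_detached` and `work` are hypothetical names.

```python
import os


def spawn_detached(work) -> None:
    """Run work() in a grandchild process that the server does not parent."""
    pid = os.fork()
    if pid > 0:
        # Server: reap the short-lived intermediate child, then carry on.
        os.waitpid(pid, 0)
        return
    # Intermediate child: start a new session, fork again, exit immediately.
    os.setsid()
    if os.fork() > 0:
        os._exit(0)
    # Grandchild: once the intermediate child is gone, it is reparented
    # (typically to init), so the server gets no SIGCHLD for it.
    try:
        work()
    finally:
        os._exit(0)
```

Since the server can no longer `wait()` for these processes, it would need another channel to track them, such as a pid file or a pipe the worker writes to when it finishes.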
Use this on your socket before you call listen():
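The snippet itself did not survive in this copy of the answer. From the description that follows, it was most likely the `SO_REUSEADDR` socket option; the sketch below is written under that assumption, and `make_listener` is a hypothetical helper name.

```python
import socket


def make_listener(host: str, port: int, backlog: int = 5) -> socket.socket:
    """Create a listening TCP socket with SO_REUSEADDR enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Assumed option: it has to be set before bind() to affect the bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

Typical use would be `listener = make_listener("0.0.0.0", 8000)`, where the port number is only an example.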
It allows your program to bind that port even if it was previously picked at random by another outgoing TCP connection (this cannot happen for ports < 1024). It should also help directly with your problem!
Unrelated:
There is another bad thing that can happen: when your children are forked, they inherit every open file descriptor. If they simply fork and launch another long-running program, that program will also hold an open handle to your listening socket, so the socket stays in use (you can check with the lsof and netstat commands).
So one should call this:
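This snippet is also missing here. The remark about exec that follows suggests it set the close-on-exec flag on the socket, so the sketch below assumes that; `set_close_on_exec` is a hypothetical helper name.

```python
import fcntl
import socket


def set_close_on_exec(sock: socket.socket) -> None:
    """Mark the socket so it is closed automatically in exec'd programs."""
    fd = sock.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
```

On Python 3.4 and later, sockets created by Python are non-inheritable by default, so this flag is normally already set; `sock.set_inheritable(False)` is the portable way to request the same thing.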
But I never tried it in the main program when it forks off children, and it clearly will not help you here, because the children are forked, not run with exec.
Still, keep it in mind and call it on your listening socket in the main program anyway, just in case you run an external program.
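Since close-on-exec does not cover plain forked children, one option that follows from the inheritance point above is for each child to close its own copy of the listening socket right after fork(). A minimal sketch, with `fork_worker` and `work` as hypothetical names:

```python
import os
import socket


def fork_worker(listener: socket.socket, work) -> int:
    """Fork a worker that drops its inherited copy of the listening socket.

    Returns the child's pid in the parent; never returns in the child.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            listener.close()  # closes only the child's copy of the descriptor
            work()
            status = 0
        finally:
            os._exit(status)
    return pid
```

With this in place an orphaned worker no longer holds the listening port open, so a restarted server should be able to bind it again once the old server process is gone.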