How can I hand over a TCP listening socket with minimal downtime?
While this question is tagged EventMachine, generic BSD-socket solutions in any language are much appreciated too.
Some background:
I have an application listening on a TCP socket. It is started and shut down with a regular System V style init script.
My problem is that it needs some time to start up before it is ready to service the TCP socket. It's not too long, perhaps only 5 seconds, but that's 5 seconds too long when a restart needs to be performed during a workday. It's also crucial that existing connections remain open and are finished normally.
Reasons for a restart of the application are patches, upgrades, and the like. I unfortunately find myself in the position that, every once in a while, I need to do this kind of thing in production.
The question:
I'm looking for a way to do a neat hand-over of the TCP listening socket from one process to another, so that there is only a split second of downtime. I'd like existing connections / sockets to remain open and finish processing in the old process, while the new process starts servicing new connections.
Is there some proven method of doing this using BSD-sockets? (Bonus points for an EventMachine solution.)
Are there perhaps open-source libraries out there implementing this, that I can use as is, or use as a reference? (Again, non-Ruby and non-EventMachine solutions are appreciated too!)
There are a couple of ways to do this with no downtime, with appropriate modifications to the server program.
One is to implement a restart capability in the server itself, for example upon receipt of a certain signal or other message. The program would then exec its new version, passing it the file descriptor number of the listening socket, e.g. as an argument. This socket would have the FD_CLOEXEC flag clear (the default) so that it would be inherited. Since the other sockets will continue to be serviced by the original process and should not be passed on to the new process, the flag should be set on those, e.g. using fcntl(). After forking and execing the new process, the original process can go ahead and close the listening socket without any interruption to the service, since the new process is now listening on that socket.
An alternative method, if you do not want the old server to have to fork and exec the new server itself, would be to use a Unix-domain socket to communicate between the old and new server process. A new server process could check for such a socket in a well-known location in the file system when it is starting. If present, the new server would connect to this socket and request that the old server transfer its listening socket as ancillary data using SCM_RIGHTS. An example of this is given at the end of cmsg(3).
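To make the two approaches concrete, here is a minimal Python sketch (Unix only, assuming Python 3.9+ for socket.send_fds and socket.recv_fds, which wrap sendmsg/recvmsg with SCM_RIGHTS). The function names prepare_for_exec, send_listener, recv_listener and demo_handoff are made up for illustration, and the demo runs both ends in a single process over a socketpair instead of a Unix-domain socket at a well-known path. One difference from C worth noting: Python creates descriptors with close-on-exec already set, so the listener has to be marked inheritable explicitly.

```python
import os
import socket


def prepare_for_exec(listener, connections):
    """Approach 1: choose which descriptors survive an exec of the new version."""
    # Python sets close-on-exec on new descriptors by default, so the
    # listener must be made inheritable explicitly.
    os.set_inheritable(listener.fileno(), True)
    for conn in connections:
        # Established connections stay with the old process.
        os.set_inheritable(conn.fileno(), False)
    # This number is what would be handed to the new program, e.g. in argv.
    return listener.fileno()


def send_listener(channel, listener):
    """Approach 2, old server: pass the listening descriptor via SCM_RIGHTS."""
    # At least one byte of ordinary data must accompany the ancillary message.
    socket.send_fds(channel, [b"L"], [listener.fileno()])


def recv_listener(channel):
    """Approach 2, new server: receive the descriptor and wrap it in a socket."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 16, 1)
    if not fds:
        raise RuntimeError("peer sent no descriptor")
    # Family and type are detected from the descriptor itself.
    return socket.socket(fileno=fds[0])


def demo_handoff():
    """Both ends in one process, purely to show the sequence of calls."""
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]

    send_listener(old_end, listener)
    adopted = recv_listener(new_end)
    # Closing the old descriptor does not stop the listening socket,
    # because the received descriptor still refers to it.
    listener.close()

    client = socket.create_connection(("127.0.0.1", port))
    conn, _peer = adopted.accept()
    client.sendall(b"ping")
    reply = conn.recv(4)
    for s in (client, conn, adopted, old_end, new_end):
        s.close()
    return reply
```

In a real deployment the old server would keep serving its established connections after closing the listener, and would exit once they have drained.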
Jean-Paul Calderone wrote a detailed presentation in 2004 on a holistic solution to your problem using Twisted, including socket migration and other issues.