How can Nginx be upgraded without losing any requests?
According to the Nginx documentation:
If you need to replace nginx binary with a new one (when upgrading to a new version or adding/removing server modules), you can do it without any service downtime - no incoming requests will be lost.
My coworker and I were trying to figure out: how does that work? We know (we think) that:
- Only one process can be listening on port 80 at a time
- Nginx creates a socket and connects it to port 80
- A parent process and any of its children can all bind to the same socket, which is how Nginx can have multiple worker children responding to requests
We also did some experiments with Nginx, like this:
- Send a kill -USR2 to the current master process
- Repeatedly run ps -ef | grep unicorn to see any unicorn processes, with their own pids and their parent pids
- Observe that the new master process is, at first, a child of the old master process, but when the old master process is gone, the new master process has a ppid of 1.
So apparently the new master process can listen to the same socket as the old one while they're both running, because at that time, the new master is a child of the old master. But somehow the new master process can then become... um... nobody's child?
I assume this is standard Unix stuff, but my understanding of processes and ports and sockets is pretty darn fuzzy. Can anybody explain this in better detail? Are any of our assumptions wrong? And is there a book I can read to really grok this stuff?
Comments (2)
For specifics: http://www.csc.villanova.edu/~mdamian/Sockets/TcpSockets.htm describes the C library for TCP sockets.
I think the key is that after a process forks while holding a socket file descriptor, the parent and child are both able to call accept() on it.
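To make that point concrete, here is a minimal sketch (plain Python on a Unix-like system, not anything taken from nginx) in which a child and its parent each accept one connection on the same listening socket. The function and helper names are made up for the illustration.

```python
import os
import socket


def serve_from_parent_and_child():
    """Return (child_pid, pid_that_served_first, pid_that_served_second)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(8)
    address = listener.getsockname()

    def handle_one():
        # Accept a single connection and reply with this process's pid.
        conn, _ = listener.accept()
        with conn:
            conn.sendall(str(os.getpid()).encode())

    child = os.fork()
    if child == 0:
        # Child: the listening descriptor was inherited across fork().
        try:
            handle_one()
        finally:
            os._exit(0)

    # Only the child is accepting right now, so it serves this connection.
    with socket.create_connection(address) as client:
        first = int(client.recv(64))
    os.waitpid(child, 0)

    # Now the parent accepts on the very same socket.
    client = socket.create_connection(address)
    handle_one()
    second = int(client.recv(64))
    client.close()
    listener.close()
    return child, first, second


if __name__ == "__main__":
    print(serve_from_parent_and_child())
```

The first reply should carry the child's pid and the second the parent's, even though only one socket was ever bound to the port.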
So here's the flow. Nginx, started normally:
Then Nginx forks. The parent keeps running as usual, but the child immediately execs the new binary. exec() wipes out the old program, memory, and running threads, but the new program inherits the open file descriptors (those not marked close-on-exec): see http://linux.die.net/man/2/execve. I suspect the exec() call passes the number of the open file descriptor as a command line parameter.
The child, started as part of an upgrade:
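A rough sketch of the fork-then-exec handoff, assuming (as suspected above) that the descriptor number travels on the command line. This is an illustration only, not nginx's actual code: NEW_BINARY is a hypothetical stand-in for the new executable, and upgrade_and_query is a made-up name.

```python
import socket
import subprocess
import sys

# Hypothetical stand-in for the "new binary": it rebuilds a socket object from
# the inherited descriptor number in argv[1] and serves a single connection.
NEW_BINARY = """
import socket, sys
listener = socket.socket(fileno=int(sys.argv[1]))
conn, _ = listener.accept()
conn.sendall(b"served by new binary")
conn.close()
"""


def upgrade_and_query():
    """Hand the listening socket to a freshly exec'd program, then query it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(8)
    address = listener.getsockname()
    fd = listener.fileno()

    # Popen forks and execs; pass_fds keeps this descriptor open, under the
    # same number, in the new program.
    child = subprocess.Popen(
        [sys.executable, "-c", NEW_BINARY, str(fd)],
        pass_fds=[fd],
    )

    # The old process lets go of the socket. It stays open, and keeps
    # listening, because the new program still holds a descriptor for it.
    listener.close()

    with socket.create_connection(address) as client:
        reply = client.recv(64)
    child.wait()
    return reply


if __name__ == "__main__":
    print(upgrade_and_query())
```

The port is never unbound at any point: the kernel only tears the socket down once the last descriptor referring to it is closed.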
I have no idea how nginx does it, but basically, it could just exec the new binary, carrying the listening socket with it into the new program (actually, it remains the same process, it just replaces the program executing in it). The listening socket has a backlog of incoming connections, and as long as the new program is fast enough to boot up, it should be able to start processing them before the backlog overflows.
If not, it could probably fork first, exec, and wait for the new program to boot up to the point where it's ready to process incoming requests, then hand over command of the listening socket (file descriptors are inherited when forking, so both have access to it) via some internal mechanism, before exiting. Noting your observations, this looks like what it's doing (if your parent process dies, your ppid is reassigned to init, i.e. pid 1).
If it has multiple processes competing to accept on the same listening socket (again, I have no idea how nginx does it, perhaps it has a dispatching process?), then you could replace them one by one, by ordering them to exec the new program, as above, but one at a time, so as to never drop the ball. Note that during such a process there would never be any new pids or parent/child relationship changes.
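The backlog point can be checked with a small sketch: clients connect and send their requests while nobody is calling accept(), and a late accept() loop still picks every one of them up. The backlog size of 16 and the function name are arbitrary choices for the illustration.

```python
import socket


def queued_then_served():
    """Connect three clients before any accept(), then serve them all."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)              # backlog: pending connections the kernel will hold
    address = listener.getsockname()

    # Nobody is accepting yet. The kernel completes each handshake on its own
    # and parks the connection (and the data sent on it) in the queue.
    clients = []
    for i in range(3):
        client = socket.create_connection(address)
        client.sendall(f"request {i}".encode())
        clients.append(client)

    # The "new program" finishes booting and only now starts accepting.
    received = []
    for _ in clients:
        conn, _addr = listener.accept()
        with conn:
            received.append(conn.recv(64).decode())

    for client in clients:
        client.close()
    listener.close()
    return received


if __name__ == "__main__":
    print(queued_then_served())
```

If more connections arrive than the backlog can hold before accept() resumes, the extra ones may be refused or left to retry, which is the overflow case mentioned above.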
At least, I think that's probably how I would do it, off the top of my head.
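The ppid change seen in the question's experiment can be reproduced with a short sketch: a throwaway "old master" forks a child and exits, and the child reports who its parent is afterwards. The names are made up. The adopting process is often pid 1, but on some systems it is a different "subreaper" process, so the code only reports what it sees.

```python
import os
import time


def orphan_ppids():
    """Return (old_master_pid, ppid_seen_by_the_child_after_old_master_exits)."""
    read_end, write_end = os.pipe()
    middle = os.fork()
    if middle == 0:
        # This process plays the "old master".
        os.close(read_end)
        old_master = os.getpid()
        if os.fork() == 0:
            # This one plays the "new master": wait until the parent is gone.
            while os.getppid() == old_master:
                time.sleep(0.01)
            os.write(write_end, f"{old_master} {os.getppid()}".encode())
        # The old master exits right away; the new master exits after reporting.
        os._exit(0)

    os.close(write_end)
    os.waitpid(middle, 0)
    with os.fdopen(read_end) as pipe:
        # read() returns once every copy of the write end has been closed.
        old_master, adopted_by = map(int, pipe.read().split())
    return old_master, adopted_by


if __name__ == "__main__":
    print(orphan_ppids())
```

The orphaned process keeps its own pid and its open descriptors. Only its parent pid changes, which matches the observation that the new master carries on serving after the old one is gone.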