Is "epoll" the essential reason that Tornadoweb (or Nginx) is so fast?

Posted on 2024-08-27 10:11:59

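Tornadoweb and Nginx are popular web servers at the moment, and many benchmarks show that they perform better than Apache under certain circumstances. So my question is:

Is "epoll" the most essential reason they are so fast? And what can I learn from that if I want to write a good socket server?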

秋意浓 2024-09-03 10:11:59

If you're looking to write a socket server, a good starting point is Dan Kegel's C10k article from a few years back:

http://www.kegel.com/c10k.html

I also found Beej's Guide to Network Programming to be pretty handy:

http://beej.us/guide/bgnet/

Finally, if you need a great reference, there's UNIX Network Programming by W. Richard Stevens et al.:

http://www.amazon.com/Unix-Network-Programming-Sockets-Networking/dp/0131411551/ref=dp_ob_title_bk

Anyway, to answer your question, the main difference between Apache and Nginx is that Apache uses one thread per client with blocking I/O, whereas Nginx is single-threaded with non-blocking I/O. Apache's worker pool does reduce the overhead of starting and destroying processes, but it still makes the CPU switch between several threads when serving multiple clients. Nginx, on the other hand, handles all requests in one thread. When one request needs to make a network request (say, to a backend), Nginx attaches a callback to the backend request and then works on another active client request. In practice, this means it returns to the event loop (epoll, kqueue, or select) and asks for file descriptors that have something to report. Note that the system call in the main event loop is actually a blocking operation, because there's nothing to do until one of the file descriptors is ready for reading or writing.
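To make the callback-plus-event-loop idea concrete, here is a minimal sketch in Python (not Nginx's actual code) of a single-threaded echo server built on the standard `selectors` module. The function name `run_echo_loop` and the `max_events` parameter are made up for illustration; `max_events` exists only so the sketch can stop, whereas a real server would loop forever.

```python
import selectors
import socket


def run_echo_loop(listener, max_events):
    """Echo data back to clients from one thread.

    max_events is a hypothetical knob that stops the loop after that many
    callbacks so the sketch can terminate; a real server loops forever.
    """
    sel = selectors.DefaultSelector()  # picks epoll, kqueue, poll, or select
    handled = 0

    def on_accept(sock):
        conn, _addr = sock.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, on_readable)

    def on_readable(conn):
        data = conn.recv(4096)
        if data:
            conn.send(data)  # sketch only: ignores partial writes and full send buffers
        else:  # an empty read means the peer closed the connection
            sel.unregister(conn)
            conn.close()

    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, on_accept)
    try:
        while handled < max_events:
            # The only blocking call: sleep until some descriptor is ready.
            for key, _mask in sel.select():
                key.data(key.fileobj)  # run the callback attached at register time
                handled += 1
    finally:
        # Tidy up whatever is still registered when the sketch stops.
        for key in list(sel.get_map().values()):
            sel.unregister(key.fileobj)
            if key.fileobj is not listener:
                key.fileobj.close()
        sel.close()
```

You would hand it a bound, listening socket, for example one created with `socket.socket()`, `bind(("127.0.0.1", 0))` and `listen()`. Every client shares the one thread, and the thread only ever waits inside `sel.select()`.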

So that's the main reason Nginx and Tornado are efficient at serving many simultaneous clients: each worker is a single process (thus saving RAM) running a single thread (thus saving CPU from context switches). As for epoll, it's just a more efficient version of select. If there are N open file descriptors (sockets), select has to scan all N on every call, whereas epoll hands you only the ones that are ready, so the cost grows with the number of ready descriptors rather than with N. In fact, Nginx can use select instead of epoll if you compile it with the --with-select_module option, and I bet it will still be more efficient than Apache. I'm not as familiar with Apache internals, but a quick grep shows it does use select and epoll -- probably when the server is listening to multiple ports/interfaces, or if it does simultaneous backend requests for a single client.
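Here is a rough Python sketch of that difference, using the standard `select` module. The helper names (`ready_with_select`, `build_epoll`, `ready_with_epoll`) are made up for illustration, and `select.epoll` only exists on Linux, so check for it before calling the epoll helpers.

```python
import select
import socket


def ready_with_select(readers):
    # select() is handed the full list on every call, and the kernel
    # checks each descriptor in it, ready or not.
    readable, _, _ = select.select(readers, [], [], 0)
    return readable


def build_epoll(readers):
    # With epoll, the interest set is registered once up front.
    ep = select.epoll()
    by_fd = {s.fileno(): s for s in readers}
    for fd in by_fd:
        ep.register(fd, select.EPOLLIN)
    return ep, by_fd


def ready_with_epoll(ep, by_fd):
    # poll() reports only descriptors with pending events, so the work
    # per call tracks the number of ready sockets.
    return [by_fd[fd] for fd, _events in ep.poll(0)]
```

With 50 idle socket pairs and one byte written to a single peer, both helpers return the same one-element list. The difference is how much work it took to find it: select rechecks all 50 descriptors on each call, while epoll just reports the one that became ready.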

Incidentally, I got started with this stuff trying to write a basic socket server and wanted to figure out how Nginx was so freaking efficient. After poring through the Nginx source code and reading those guides/books I linked to above, I discovered it'd be easier to write Nginx modules instead of my own server. Thus was born the now-semi-legendary Emiller's Guide to Nginx Module Development:

http://www.evanmiller.org/nginx-modules-guide.html

(Warning: the Guide was written against Nginx 0.5-0.6 and APIs may have changed.) If you're doing anything with HTTP, I'd say give Nginx a shot because it's worked out all the hairy details of dealing with stupid clients. For example, the small socket server that I wrote for fun worked great with all clients -- except Safari, and I never figured out why. Even for other protocols, Nginx might be the right way to go; the eventing is pretty well abstracted from the protocols, which is why it can proxy HTTP as well as IMAP. The Nginx code base is extremely well-organized and very well-written, with one exception that bears mentioning. I wouldn't follow its lead when it comes to hand-rolling a protocol parser; instead, use a parser generator. I've written some stuff about using a parser generator (Ragel) with Nginx here:

http://www.evanmiller.org/nginx-modules-guide-advanced.html#parsing

All of this was probably more information than you wanted, but hopefully you'll find some of it useful.
