How to scale a TCP listener on modern multicore/multisocket machines
I have a daemon, to be written in C, that will need to handle 20-150K TCP connections simultaneously. They are long-running connections and rarely ever tear down. They have a very small amount of data in transit at any given time (rarely even exceeding the MTU; it's a stimulus/response protocol), but response times on them are critical. I'm wondering what the current UNIX community is using to handle large numbers of sockets while minimizing response latency. I've seen designs revolving around multiplexing connections onto forked worker pools, threads (per connection), and static-sized thread pools. Any suggestions?
6 Answers
If performance is critical then you'll really want to go for a multithreaded event loop solution - i.e. a pool of worker threads to handle your connections. Unfortunately, there is no abstraction library to do this that works on most Unix platforms (note that libevent is only single-threaded as are most of these event-loop libraries), so you'll have to do the dirty work yourself.
On Linux that means using edge-triggered epoll with a pool of worker threads (Windows would have I/O completion ports, which also work fine in a multithreaded environment; I am not sure about other Unixes).
BTW, I have done some work trying to abstract edge-triggered epoll on Linux and Windows I/O completion ports at http://nginetd.cmeerw.org (it is a work in progress, but might provide some ideas).
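To make that concrete, here is a minimal sketch of the Linux side of this idea (not the nginetd code; the port, buffer size, worker count, and echo-style handling are arbitrary choices of mine): one shared epoll instance, a pool of worker threads all blocking in epoll_wait(), and EPOLLET | EPOLLONESHOT so that each ready socket is handled by exactly one thread, which re-arms it when done.

    /* Minimal sketch: shared epoll instance + worker thread pool (Linux only).
     * Port 9000, 4 workers, and echo-style handling are illustrative only;
     * error handling and partial writes are ignored for brevity.
     * Build: cc -pthread epoll_pool.c */
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <pthread.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define NWORKERS 4

    static int epfd, listenfd;

    static void set_nonblock(int fd) {
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    }

    /* re-enable a descriptor that EPOLLONESHOT disabled after delivery */
    static void rearm(int fd) {
        struct epoll_event ev = { .events = EPOLLIN | EPOLLET | EPOLLONESHOT,
                                  .data.fd = fd };
        epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
    }

    static void *worker(void *arg) {
        struct epoll_event evs[64];
        char buf[2048];
        for (;;) {
            int n = epoll_wait(epfd, evs, 64, -1);
            for (int i = 0; i < n; i++) {
                int fd = evs[i].data.fd;
                if (fd == listenfd) {                 /* accept everything pending */
                    int c;
                    while ((c = accept(listenfd, NULL, NULL)) >= 0) {
                        set_nonblock(c);
                        struct epoll_event ev = { .events = EPOLLIN | EPOLLET | EPOLLONESHOT,
                                                  .data.fd = c };
                        epoll_ctl(epfd, EPOLL_CTL_ADD, c, &ev);
                    }
                    rearm(listenfd);
                } else {                              /* edge-triggered: drain the socket */
                    ssize_t r;
                    while ((r = read(fd, buf, sizeof buf)) > 0)
                        write(fd, buf, (size_t)r);    /* stimulus/response: echo back */
                    if (r == 0) close(fd);            /* peer closed */
                    else rearm(fd);                   /* would-block: wait for the next edge */
                }
            }
        }
        return NULL;
    }

    int main(void) {
        listenfd = socket(AF_INET, SOCK_STREAM, 0);
        int on = 1;
        setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
        struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(9000),
                                    .sin_addr.s_addr = INADDR_ANY };
        bind(listenfd, (struct sockaddr *)&addr, sizeof addr);
        listen(listenfd, SOMAXCONN);
        set_nonblock(listenfd);

        epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN | EPOLLET | EPOLLONESHOT,
                                  .data.fd = listenfd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev);

        pthread_t tid[NWORKERS];
        for (int i = 0; i < NWORKERS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(tid[i], NULL);
    }

The oneshot flag is what keeps two workers from reading the same connection at once; the cost is one extra epoll_ctl() call per wake-up to re-arm the descriptor.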
If you have system-configuration access, don't over-do it: just set up some iptables/pf/etc. rules to load-balance connections across n daemon instances (processes), as that will work out of the box. Depending on how blocking the daemon is by nature, n should range from the number of cores on the system to several times that. This approach looks crude, but it can cope with broken daemons and even restart them if necessary. Migration would also be smooth, since you could start diverting new connections to another set of processes (for example, a new release, or a move to a new box) instead of interrupting service. On top of that you get features like source affinity, which can significantly help with caching and with contention from problematic sessions.
If you don't have system access (or ops can't be bothered), you can use a load-balancer daemon (there are plenty of open-source ones) instead of iptables/pf/etc., again with n service daemons behind it, as above.
This approach also helps with separating port privileges: if the external service needs to listen on a low port (<1024), only the load balancer needs to run privileged (as admin/root, or in the kernel).
I've written several IP load balancers in the past, and they can be very error-prone in production. You don't want to support and debug that. Also, operations and management will tend to second-guess your code more than external code.
The easiest suggestion is to use libevent; it makes it easy to write a simple non-blocking, single-threaded server that would comply with your requirements.
If the processing of each response takes some time, or if it uses some blocking API (like almost anything involving a database), then you'll need some threading.
One answer is worker threads, where you spawn a set of threads, each listening on some queue for work. They can be separate processes instead of threads, if you like; the main difference is the communications mechanism used to tell the workers what to do.
A different way is to use several threads and give each of them a portion of those 150K connections. Each has its own event loop and works mostly like the single-threaded server, except for the listening port, which is handled by a single thread. This helps spread the load between cores, but if you use a blocking resource it will block all the connections handled by that particular thread.
libevent lets you use the second approach if you're careful, but there's also an alternative: libev. It's not as well known as libevent, but it specifically supports the multi-loop scheme.
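As a rough sketch of that multi-loop layout with libev (the worker count, port, queue size, and echo handling here are my own illustrative assumptions, not anything prescribed by libev): one acceptor thread hands accepted sockets round-robin to per-thread loops through a small locked queue plus an ev_async wake-up, and each worker loop then owns its share of the connections.

    /* Sketch: one acceptor thread + NWORKERS worker threads, each running its
     * own ev_loop that owns a share of the connections.
     * Build: cc multi_loop.c -lev -lpthread  (error handling elided) */
    #include <ev.h>
    #include <pthread.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <unistd.h>
    #include <stdlib.h>

    #define NWORKERS 4
    #define QSIZE    1024

    typedef struct {
        struct ev_loop *loop;
        ev_async notify;                 /* thread-safe wake-up for this loop */
        int queue[QSIZE], head, tail;    /* fds handed over by the acceptor (no overflow check) */
        pthread_mutex_t mtx;
    } worker_t;

    static worker_t workers[NWORKERS];

    static void conn_cb(struct ev_loop *loop, ev_io *w, int revents) {
        char buf[2048];
        ssize_t r = read(w->fd, buf, sizeof buf);
        if (r > 0) {
            write(w->fd, buf, (size_t)r);     /* stimulus/response: echo back */
        } else if (r == 0) {                  /* peer closed: drop the watcher */
            ev_io_stop(loop, w);
            close(w->fd);
            free(w);
        }                                     /* r < 0: would-block etc. ignored here */
    }

    static void notify_cb(struct ev_loop *loop, ev_async *a, int revents) {
        worker_t *wk = (worker_t *)a->data;
        pthread_mutex_lock(&wk->mtx);
        while (wk->head != wk->tail) {        /* adopt every fd queued for this loop */
            int fd = wk->queue[wk->head++ % QSIZE];
            ev_io *w = malloc(sizeof *w);
            ev_io_init(w, conn_cb, fd, EV_READ);
            ev_io_start(loop, w);
        }
        pthread_mutex_unlock(&wk->mtx);
    }

    static void *worker_main(void *arg) {
        worker_t *wk = arg;
        ev_run(wk->loop, 0);                  /* this thread's private event loop */
        return NULL;
    }

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0), on = 1;
        setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
        struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = htons(9000),
                                 .sin_addr.s_addr = INADDR_ANY };
        bind(lfd, (struct sockaddr *)&a, sizeof a);
        listen(lfd, SOMAXCONN);

        for (int i = 0; i < NWORKERS; i++) {
            workers[i].loop = ev_loop_new(EVFLAG_AUTO);
            pthread_mutex_init(&workers[i].mtx, NULL);
            ev_async_init(&workers[i].notify, notify_cb);
            workers[i].notify.data = &workers[i];
            ev_async_start(workers[i].loop, &workers[i].notify);
            pthread_t t;
            pthread_create(&t, NULL, worker_main, &workers[i]);
        }

        for (int next = 0;; next = (next + 1) % NWORKERS) {   /* acceptor thread */
            int c = accept(lfd, NULL, NULL);
            if (c < 0) continue;
            worker_t *wk = &workers[next];
            pthread_mutex_lock(&wk->mtx);
            wk->queue[wk->tail++ % QSIZE] = c;                /* hand the fd over */
            pthread_mutex_unlock(&wk->mtx);
            ev_async_send(wk->loop, &wk->notify);             /* wake that loop */
        }
    }

The handover queue exists because a libev loop and its watchers should only be touched from the thread running it; ev_async_send() is the call that is documented as safe to invoke from another thread.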
I think Javier's answer makes the most sense. If you want to test the theory out, check out the Node.js project.
Node is based on Google's V8 engine, which compiles JavaScript to machine code and is as fast as C for certain tasks. It is also based on libev and is designed to be completely non-blocking, meaning you don't have to worry about context switching between threads (everything runs on a single event loop). It is very similar to Erlang in that respect.
Writing high-performance servers in JavaScript is now really, really easy with Node. You could also, with a little bit of effort, write your custom code in C and create bindings for Node to call into it for the actual processing (look at the Node source to see how to do this; the documentation is a little sketchy at the moment). As an uglier alternative, you could build your custom C code as a standalone application and use stdin/stdout to communicate with it.
I've tested Node myself with upwards of 150K connections with absolutely no issues (of course you will need some serious hardware if all of these connections are going to be communicating at once). A TCP connection in Node.js uses only 2-3 KB of memory on average, so you could theoretically handle 350-500K connections per 1 GB of RAM.
Note - Node.js is not currently supported on Windows, but it is only at an early stage of development and I'd imagine it will be ported at some stage.
Note 2 - you will have to ensure that the code you call into from Node does not block.
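For the "uglier alternative" mentioned above, the C side can be as small as a line-oriented filter that Node spawns with child_process and talks to over its stdin/stdout pipes; the one-request-per-line protocol below is just a made-up placeholder.

    /* Sketch of a stdin/stdout worker that a Node process could spawn as a child.
     * Protocol here is a placeholder: one request per line, one reply per line. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char line[4096];
        setvbuf(stdout, NULL, _IOLBF, 0);          /* line-buffered so replies flush promptly */
        while (fgets(line, sizeof line, stdin)) {
            line[strcspn(line, "\n")] = '\0';
            /* ... do the real (possibly blocking) processing here ... */
            printf("ok %s\n", line);               /* reply on stdout, one line per request */
        }
        return 0;
    }

Because the potentially blocking work happens in the child process, Node's own event loop never stalls.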
Several systems have been developed to improve on select(2) performance: kqueue, epoll, and /dev/poll. In all of these systems you can have a pool of worker threads waiting for tasks; you will not be forced to set up all the file handles over and over again when done with one of them.
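For comparison with the epoll sketch further up, registering sockets once and waiting with kqueue looks roughly like this on the BSDs (single-threaded here for brevity; the port, buffer size, and echo handling are arbitrary assumptions):

    /* Minimal kqueue registration/wait loop (BSD/macOS); error handling elided. */
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/event.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0), on = 1;
        setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
        struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = htons(9000),
                                 .sin_addr.s_addr = INADDR_ANY };
        bind(lfd, (struct sockaddr *)&a, sizeof a);
        listen(lfd, SOMAXCONN);

        int kq = kqueue();
        struct kevent change, events[64];
        EV_SET(&change, lfd, EVFILT_READ, EV_ADD, 0, 0, NULL);   /* register once */
        kevent(kq, &change, 1, NULL, 0, NULL);

        char buf[2048];
        for (;;) {
            int n = kevent(kq, NULL, 0, events, 64, NULL);       /* wait for readiness */
            for (int i = 0; i < n; i++) {
                int fd = (int)events[i].ident;
                if (fd == lfd) {                                  /* new connection */
                    int c = accept(lfd, NULL, NULL);
                    EV_SET(&change, c, EVFILT_READ, EV_ADD, 0, 0, NULL);
                    kevent(kq, &change, 1, NULL, 0, NULL);
                } else if (events[i].flags & EV_EOF) {            /* peer closed */
                    close(fd);                                    /* kqueue drops it automatically */
                } else {
                    ssize_t r = read(fd, buf, sizeof buf);
                    if (r > 0) write(fd, buf, (size_t)r);         /* echo back */
                }
            }
        }
    }

Each socket is registered exactly once; closing a descriptor removes it from the kqueue automatically.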
Do you have to start from scratch? You could use something like Gearman.