What is the difference between epoll, poll and threadpool?

Could someone explain what the difference is between epoll, poll and threadpool?

  • What are the pros / cons?
  • Any suggestions for frameworks?
  • Any suggestions for simple/basic tutorials?
  • It seems that epoll and poll are Linux-specific... Is there an equivalent alternative for Windows?

醉城メ夜风 2024-10-08 05:34:35

Threadpool does not really fit into the same category as poll and epoll, so I will assume you are referring to threadpool as in "threadpool to handle many connections with one thread per connection".

Pros and cons

  • threadpool
    • Reasonably efficient for small and medium concurrency, can even outperform other techniques.
    • Makes use of multiple cores.
    • Does not scale well beyond "several hundreds" even though some systems (e.g. Linux) can in principle schedule 100,000s of threads just fine.
    • Naive implementation exhibits "thundering herd" problem.
    • Apart from context switching and thundering herd, one must consider memory. Each thread has a stack (typically at least a megabyte). A thousand threads therefore take a gigabyte of RAM just for stack. Even if that memory is not committed, it still takes away considerable address space under a 32 bit OS (not really an issue under 64 bits).
    • Threads can actually use epoll, though the obvious way (all threads block on epoll_wait) is of no use, because epoll will wake up every thread waiting on it, so it will still have the same issues.
      • Optimal solution: single thread listens on epoll, does the input multiplexing, and hands complete requests to a threadpool (see the dispatcher/worker-pool sketch after this list).
      • futex is your friend here, in combination with e.g. a fast forward queue per thread. Although badly documented and unwieldy, futex offers exactly what's needed. epoll may return several events at a time, and futex lets you efficiently and in a precisely controlled manner wake N blocked threads at a time (N being min(num_cpu, num_events) ideally), and in the best case it does not involve an extra syscall/context switch at all.
      • Not trivial to implement, takes some care.
  • fork (a.k.a. old-fashioned threadpool)
    • Reasonably efficient for small and medium concurrency.
    • Does not scale well beyond "few hundreds".
    • Context switches are much more expensive (different address spaces!).
    • Scales significantly worse on older systems where fork is much more expensive (deep copy of all pages). Even on modern systems fork is not "free", although the overhead is mostly coalesced by the copy-on-write mechanism. On large datasets which are also modified, a considerable number of page faults following fork may negatively impact performance.
    • However, proven to work reliably for over 30 years.
    • Ridiculously easy to implement and rock solid: If any of the processes crash, the world does not end. There is (almost) nothing you can do wrong.
    • Very prone to "thundering herd".
  • poll / select
    • Two flavours (BSD vs. System V) of more or less the same thing.
    • Somewhat old and slow, somewhat awkward usage, but there is virtually no platform that does not support them.
    • Waits until "something happens" on a set of descriptors
      • Allows one thread/process to handle many requests at a time.
      • No multi-core usage.
    • Needs to copy list of descriptors from user to kernel space every time you wait. Needs to perform a linear search over descriptors. This limits its effectiveness.
    • Does not scale well to "thousands" (in fact, hard limit around 1024 on most systems, or as low as 64 on some).
    • Use it because it's portable if you only deal with a dozen descriptors anyway (no performance issues there), or if you must support platforms that don't have anything better. Don't use otherwise.
    • Conceptually, a server becomes a little more complicated than a forked one, since you now need to maintain many connections and a state machine for each connection, and you must multiplex between requests as they come in, assemble partial requests, etc. A simple forked server just knows about a single socket (well, two, counting the listening socket), reads until it has what it wants or until the connection is half-closed, and then writes whatever it wants. It doesn't worry about blocking or readiness or starvation, nor about some unrelated data coming in, that's some other process's problem.
  • epoll
    • Linux only.
    • Concept of expensive modifications vs. efficient waits:
      • Copies information about descriptors to kernel space when descriptors are added (epoll_ctl)
        • This is usually something that happens rarely.
      • Does not need to copy data to kernel space when waiting for events (epoll_wait)
        • This is usually something that happens very often.
      • Adds the waiter (or rather its epoll structure) to descriptors' wait queues
        • Descriptor therefore knows who is listening and directly signals waiters when appropriate rather than waiters searching a list of descriptors
        • Opposite way of how poll works
        • O(1) with small k (very fast) in respect of the number of descriptors, instead of O(n)
    • Works very well with timerfd and eventfd (stunning timer resolution and accuracy, too); a timerfd sketch follows the epoll mini-tutorial below.
    • Works nicely with signalfd, eliminating the awkward handling of signals, making them part of the normal control flow in a very elegant manner.
    • An epoll instance can host other epoll instances recursively
    • Assumptions made by this programming model:
      • Most descriptors are idle most of the time, few things (e.g. "data received", "connection closed") actually happen on few descriptors.
      • Most of the time, you don't want to add/remove descriptors from the set.
      • Most of the time, you're waiting on something to happen.
    • Some minor pitfalls:
      • A level-triggered epoll wakes all threads waiting on it (this is "works as intended"), therefore the naive way of using epoll with a threadpool is useless. At least for a TCP server, it is no big issue since partial requests would have to be assembled first anyway, so a naive multithreaded implementation won't do either way.
      • Does not work as one would expect with file read/writes ("always ready").
      • Could not be used with AIO until recently, now possible via eventfd, but requires a (to date) undocumented function.
      • If the above assumptions are not true, epoll can be inefficient, and poll may perform equally or better.
      • epoll cannot do "magic", i.e. it is still necessarily O(N) in respect to the number of events that occur.
      • However, epoll plays well with the new recvmmsg syscall, since it returns several readiness notifications at a time (as many as are available, up to whatever you specify as maxevents). This makes it possible to receive e.g. 15 EPOLLIN notifications with one syscall on a busy server, and read the corresponding 15 messages with a second syscall (a 93% reduction in syscalls!). Unluckily, all operations on one recvmmsg invocation refer to the same socket, so it is mostly useful for UDP based services (for TCP, there would have to be a kind of recvmmsmsg syscall which also takes a socket descriptor per item!). See the recvmmsg sketch after this list.
      • Descriptors should always be set to nonblocking and one should check for EAGAIN even when using epoll because there are exceptional situations where epoll reports readiness and a subsequent read (or write) will still block. This is also the case for poll/select on some kernels (though it has presumably been fixed).
      • With a naive implementation, starvation of slow senders is possible. When blindly reading until EAGAIN is returned upon receiving a notification, it is possible to indefinitely read new incoming data from a fast sender while completely starving a slow sender (as long as data keeps coming in fast enough, you might not see EAGAIN for quite a while!). Applies to poll/select in the same manner. (The read-loop sketch at the end of this answer shows a simple per-wakeup fairness cap.)
      • Edge-triggered mode has some quirks and unexpected behaviour in some situations, since the documentation (both man pages and TLPI) is vague ("probably", "should", "might") and sometimes misleading about its operation.
        The documentation states that several threads waiting on one epoll are all signalled. It further states that a notification tells you whether IO activity has happened since the last call to epoll_wait (or since the descriptor was opened, if there was no previous call).
        The true, observable behaviour in edge-triggered mode is much closer to "wakes the first thread that has called epoll_wait, signalling that IO activity has happened since anyone last called either epoll_wait or a read/write function on the descriptor, and thereafter only reports readiness again to the next thread calling or already blocked in epoll_wait, for any operations happening after anyone called a read (or write) function on the descriptor". It kind of makes sense, too... it just isn't exactly what the documentation suggests.
  • kqueue
    • BSD analogue to epoll, different usage, similar effect.
    • Also works on Mac OS X
    • Rumoured to be faster (I've never used it, so cannot tell if that is true).
    • Registers events and returns a result set in a single syscall.
  • IO Completion ports
    • Epoll for Windows, or rather epoll on steroids.
    • Works seamlessly with everything that is waitable or alertable in some way (sockets, waitable timers, file operations, threads, processes)
    • If Microsoft got one thing right in Windows, it is completion ports:
      • Works worry-free out of the box with any number of threads
      • No thundering herd
      • Wakes threads one by one in a LIFO order
      • Keeps caches warm and minimizes context switches
      • Respects number of processors on machine or delivers the desired number of workers
    • Allows the application to post events, which lends itself to a very easy, failsafe, and efficient parallel work queue implementation (schedules upwards of 500,000 tasks per second on my system); see the PostQueuedCompletionStatus sketch after the IOCP mini-tutorial.
    • Minor disadvantage: Does not easily remove file descriptors once added (must close and re-open).
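
To make the "single epoll thread feeding a worker pool" idea above concrete, here is a minimal sketch. Assumptions: a plain pthread mutex/condvar queue stands in for the futex fast path, EPOLLONESHOT prevents the same descriptor from being dispatched twice, the port number and pool size are arbitrary, "handling a request" is just echoing one chunk, and all error checking is omitted. Workers block on the queue's condition variable rather than on epoll_wait, so one readiness event wakes exactly one of them (no thundering herd).

#include <pthread.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define QSIZE 1024
static int queue[QSIZE];                 /* ring buffer of ready descriptors     */
static unsigned qhead, qtail;            /* no overflow check, sketch only       */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;
static int ep;                           /* epoll instance, shared with workers  */

static void enqueue(int fd)              /* dispatcher -> workers */
{
    pthread_mutex_lock(&qlock);
    queue[qtail++ % QSIZE] = fd;
    pthread_cond_signal(&qcond);         /* wake exactly one worker */
    pthread_mutex_unlock(&qlock);
}

static int dequeue(void)                 /* workers block here, not in epoll_wait */
{
    pthread_mutex_lock(&qlock);
    while (qhead == qtail)
        pthread_cond_wait(&qcond, &qlock);
    int fd = queue[qhead++ % QSIZE];
    pthread_mutex_unlock(&qlock);
    return fd;
}

static void *worker(void *arg)
{
    (void)arg;
    char buf[4096];
    for (;;) {
        int fd = dequeue();
        ssize_t n = read(fd, buf, sizeof buf);      /* toy "request": echo one chunk */
        if (n > 0) {
            write(fd, buf, (size_t)n);
            struct epoll_event e = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
            epoll_ctl(ep, EPOLL_CTL_MOD, fd, &e);   /* re-arm for the next request */
        } else {
            close(fd);                              /* EOF or error */
        }
    }
    return NULL;
}

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(12345),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(listener, (struct sockaddr *)&addr, sizeof addr);
    listen(listener, SOMAXCONN);

    for (int i = 0; i < 4; i++) {                   /* the worker pool */
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
    }

    ep = epoll_create1(0);
    struct epoll_event le = { .events = EPOLLIN, .data.fd = listener };
    epoll_ctl(ep, EPOLL_CTL_ADD, listener, &le);

    for (;;) {                                      /* the single dispatcher thread */
        struct epoll_event evt[64];
        int n = epoll_wait(ep, evt, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = evt[i].data.fd;
            if (fd == listener) {                   /* new connection: register it one-shot */
                int c = accept(listener, NULL, NULL);
                struct epoll_event ce = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = c };
                epoll_ctl(ep, EPOLL_CTL_ADD, c, &ce);
            } else {
                enqueue(fd);                        /* data ready: hand off to the pool */
            }
        }
    }
}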

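And to illustrate the recvmmsg point from the epoll pitfalls: after an EPOLLIN notification on a UDP socket, several datagrams can be drained with a single syscall. A rough sketch, assuming udp_fd is a bound, nonblocking UDP socket and handle_datagram is a hypothetical per-datagram handler:

#define _GNU_SOURCE                      /* recvmmsg() is a GNU/Linux extension */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH 16

void handle_datagram(const char *buf, unsigned int len);    /* hypothetical */

void drain_udp(int udp_fd)
{
    struct mmsghdr msgs[BATCH];
    struct iovec   iov[BATCH];
    static char    bufs[BATCH][1500];            /* one MTU-sized buffer per slot */

    memset(msgs, 0, sizeof msgs);
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base            = bufs[i];
        iov[i].iov_len             = sizeof bufs[i];
        msgs[i].msg_hdr.msg_iov    = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* one syscall instead of up to BATCH recv() calls */
    int n = recvmmsg(udp_fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < n; i++)
        handle_datagram(bufs[i], msgs[i].msg_len);   /* msg_len bytes landed in bufs[i] */
}
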
Frameworks

libevent -- The 2.0 version also supports completion ports under Windows.

ASIO -- If you use Boost in your project, look no further: You already have this available as boost-asio.

Any suggestions for simple/basic tutorials?

The frameworks listed above come with extensive documentation. The Linux docs and MSDN explain epoll and completion ports extensively.

Mini-tutorial for using epoll:

#include <sys/epoll.h>

int my_epoll = epoll_create1(0);   // epoll_create1 replaces epoll_create, whose size argument is ignored nowadays anyway

struct epoll_event e;
e.events  = EPOLLIN;               // what to be notified about
e.data.fd = some_socket_fd;        // the data member can in fact be anything you like

epoll_ctl(my_epoll, EPOLL_CTL_ADD, some_socket_fd, &e);

...
struct epoll_event evt[10];        // or whatever number
int num;
for(...)
    if((num = epoll_wait(my_epoll, evt, 10, -1)) > 0)
        do_something();
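
Continuing that snippet: as mentioned in the epoll section, the same my_epoll loop can also wait on things that are not sockets, e.g. a timerfd. A rough sketch of a periodic one-second timer (do_periodic_work is a hypothetical stand-in):

#include <stdint.h>
#include <unistd.h>
#include <sys/timerfd.h>

int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
struct itimerspec its = { .it_value    = { 1, 0 },    // first expiry after one second
                          .it_interval = { 1, 0 } };  // then every second
timerfd_settime(tfd, 0, &its, NULL);

struct epoll_event te;
te.events  = EPOLLIN;
te.data.fd = tfd;
epoll_ctl(my_epoll, EPOLL_CTL_ADD, tfd, &te);

// inside the epoll_wait loop, when evt[i].data.fd == tfd:
uint64_t expirations;
read(tfd, &expirations, sizeof expirations);  // must be read, or the timer stays "ready"
do_periodic_work();                           // hypothetical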

Mini-tutorial for IO completion ports (note calling CreateIoCompletionPort twice with different parameters):

#include <windows.h>

HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0); // equals epoll_create
CreateIoCompletionPort(mySocketHandle, iocp, 0, 0);                     // equals epoll_ctl(EPOLL_CTL_ADD)

DWORD number_bytes;
ULONG_PTR key;
OVERLAPPED *o;                     // receives a pointer to the OVERLAPPED of the completed operation
for(...)
    if(GetQueuedCompletionStatus(iocp, &number_bytes, &key, &o, INFINITE)) // equals epoll_wait()
        do_something();

(These mini-tuts omit all kinds of error checking, and hopefully I didn't make any typos, but they should for the most part be OK to give you some idea.)
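
Since it was mentioned in the completion-port list above: the port can also carry events you post yourself, which is what makes it usable as a general work queue. A rough sketch building on the snippet above (KEY_WORK_ITEM, struct task, make_task and process_task are hypothetical):

#define KEY_WORK_ITEM 1   // arbitrary completion key to tell posted work from IO completions

// Producer side: push a work item; one blocked worker wakes up (LIFO).
struct task *my_task = make_task();                        // hypothetical
PostQueuedCompletionStatus(iocp, 0, KEY_WORK_ITEM, (OVERLAPPED *)my_task);

// Consumer side, inside the GetQueuedCompletionStatus() loop shown above:
if (key == KEY_WORK_ITEM)
    process_task((struct task *)o);                        // o points at the posted task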

EDIT:
Note that completion ports (Windows) conceptually work the other way around from epoll (or kqueue). They signal, as their name suggests, completion, not readiness. That is, you fire off an asynchronous request and forget about it until some time later you're told that it has completed (either successfully or not so successfully, and there is the exceptional case of "completed immediately" too).
With epoll, you block until you are notified that either "some data" (possibly as little as one byte) has arrived and is available, or that there is sufficient buffer space so you can do a write operation without blocking. Only then do you start the actual operation, which will then hopefully not block (contrary to what you might expect, there is no strict guarantee of that -- it is therefore a good idea to set descriptors to nonblocking and check for EAGAIN [EAGAIN and EWOULDBLOCK for sockets, because oh joy, the standard allows for two different error values]).
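
To close, a rough sketch of that read discipline: descriptor set to nonblocking, EAGAIN/EWOULDBLOCK checked even though epoll said "ready", and a per-wakeup cap so a single fast sender cannot starve the slow ones. on_data and the 64 KiB budget are illustrative.

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Once, right after accept()/socket(): put the descriptor into nonblocking mode.
fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

// On an EPOLLIN notification for fd:
char buf[4096];
long budget = 64 * 1024;                   // per-wakeup fairness cap (illustrative)
while (budget > 0) {
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0) {
        on_data(fd, buf, (size_t)n);       // hypothetical handler
        budget -= n;
    } else if (n == 0) {
        close(fd);                         // peer closed the connection
        break;
    } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
        break;                             // drained for now; epoll will report readiness again
    } else {
        close(fd);                         // real error
        break;
    }
}
// If the budget ran out before EAGAIN: with level-triggered epoll the descriptor is simply
// reported ready again on the next epoll_wait; with edge-triggered mode you must remember
// it yourself (e.g. on a ready-list) before moving on to other descriptors.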
