What happens after a packet is captured?
I've been reading about what happens after packets are captured by NICs, and the more I read, the more I'm confused.
Firstly, I've read that traditionally, after a packet is captured by the NIC, it gets copied to a block of memory in kernel space, then to user space for whatever application then works on the packet data. Then I read about DMA, where the NIC copies the packet directly into memory, bypassing the CPU. So is the NIC -> kernel memory -> user-space memory flow still valid? Also, do most NICs (e.g. Myricom's) use DMA to improve packet capture rates?
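To pin down the flow I'm asking about, here is a minimal sketch of the kind of capture loop I have in mind (my own sketch, Linux-specific, needs root): however the NIC lands the frame in kernel memory (usually DMA), the recvfrom() below is where the kernel-to-user copy happens.

```c
/* Minimal sketch (not from any particular driver): a classic Linux capture
 * loop over an AF_PACKET raw socket. The NIC (typically via DMA) lands
 * frames in kernel buffers; each recvfrom() below is the kernel-to-user
 * copy. Needs root (CAP_NET_RAW); error handling kept minimal. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>        /* htons */
#include <linux/if_ether.h>   /* ETH_P_ALL */

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    unsigned char frame[2048];                     /* user-space buffer */
    for (int i = 0; i < 10; i++) {
        ssize_t n = recvfrom(fd, frame, sizeof frame, 0, NULL, NULL);
        if (n < 0) { perror("recvfrom"); break; }
        printf("captured %zd bytes\n", n);         /* data now in user space */
    }
    close(fd);
    return 0;
}
```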
Secondly, does RSS (Receive Side Scaling) work similarly in both Windows and Linux systems? I can only find detailed explanations of how RSS works in MSDN articles, where they talk about how RSS (and MSI-X) works on Windows Server 2008. But the same concepts of RSS and MSI-X should still apply to Linux systems, right?
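For what it's worth, my working mental model of RSS is just "hash the flow's 4-tuple, pick an RX queue, steer that queue's MSI-X interrupt to a CPU". The sketch below is a simplified stand-in for illustration only; real NICs use a Toeplitz hash with a per-device secret key plus an indirection table, and the queue count is hypothetical.

```c
/* Illustrative only: a stand-in for the RSS idea. Real hardware uses a
 * keyed Toeplitz hash and an indirection table, but the effect is the
 * same on both Windows and Linux -- every packet of one flow hashes
 * identically, so the whole flow stays on one RX queue (and one CPU). */
#include <stdint.h>
#include <stdio.h>

#define NUM_RX_QUEUES 4   /* hypothetical queue count */

/* Stand-in for the Toeplitz hash over (saddr, daddr, sport, dport). */
static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport)
{
    uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);
    h ^= h >> 16;                 /* mix the bits a little */
    h *= 0x45d9f3b;               /* arbitrary odd multiplier */
    h ^= h >> 16;
    return h;
}

int main(void)
{
    /* 10.0.0.1:12345 -> 10.0.0.2:80 always maps to the same queue. */
    uint32_t h = flow_hash(0x0a000001, 0x0a000002, 12345, 80);
    printf("flow -> RX queue %u\n", h % NUM_RX_QUEUES);
    return 0;
}
```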
Thank you.
Regards,
Rayne
Comments (2)
How this process plays out is mostly up to the driver author and the hardware, but for the drivers I've looked at or written and the hardware I've worked with, this is usually the way it works:
Zero-copy networking within the kernel isn't so bad. Zero-copy all the way down to userland is much harder. Userland gets data, but network packets are made up of both header and data. At the least, true zero-copy all the way to userland requires support from your NIC so that it can DMA packets into separate header/data buffers. The headers are recycled once the kernel routes the packet to its destination and verifies the checksum (for TCP, either in hardware if the NIC supports it or in software if not; note that if the kernel has to compute the checksum itself, it may as well copy the data too: looking at the data incurs cache misses anyway, and copying it elsewhere can be nearly free with tuned code).
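To make the header/data split concrete, here is a hypothetical RX descriptor layout. It is not any real device's format, just the shape of the idea: the driver posts two buffers per descriptor, and the hardware writes protocol headers into one and payload into the other, so the payload pages can later be handed toward user memory untouched.

```c
/* Hypothetical RX descriptor for a NIC with header/data split DMA
 * (illustrative layout, not a real device's). */
#include <stdint.h>

struct rx_desc {
    uint64_t hdr_buf_addr;   /* DMA address of a small header buffer    */
    uint64_t data_buf_addr;  /* DMA address of a page-sized data buffer */
    uint16_t hdr_len;        /* filled in by hardware on completion     */
    uint16_t data_len;       /* filled in by hardware on completion     */
    uint16_t status;         /* done bit, checksum-verified bit, etc.   */
    uint16_t reserved;
};
```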
Even assuming all the stars align, the data isn't actually in your user buffer when it is received by the system. Until an application asks for the data, the kernel doesn't know where it will end up. Consider the case of a multi-process daemon like Apache. There are many child processes, all listening on the same socket. You can also establish a connection, fork(), and have both processes able to recv() incoming data, as the sketch below shows.

TCP packets on the Internet usually carry 1460 bytes of payload (an MTU of 1500 = 20-byte IP header + 20-byte TCP header + 1460 bytes of data). 1460 is not a power of 2 and won't match the page size on any system you'll find. This presents problems for reassembly of the data stream. Remember that TCP is stream-oriented: there is no boundary between sender writes, and two 1000-byte writes waiting at the receiver will be consumed entirely by a single 2000-byte read.
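A small sketch of that ambiguity, using socketpair() as a stand-in for an accepted TCP connection so it stays self-contained: after fork(), either process may consume the bytes, and the two 1000-byte writes can be drained by a single 2000-byte read.

```c
/* Why the kernel can't know the destination buffer in advance: after
 * fork(), parent and child share the connected socket and race to recv()
 * the same stream. socketpair() stands in for an accepted TCP connection. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return 1;

    char chunk[1000];
    memset(chunk, 'x', sizeof chunk);
    write(sv[0], chunk, sizeof chunk);   /* two 1000-byte "sender writes" */
    write(sv[0], chunk, sizeof chunk);
    close(sv[0]);                        /* EOF so the loser doesn't block */

    if (fork() == 0) {                   /* child */
        char buf[2000];
        printf("child:  recv'd %zd bytes\n", recv(sv[1], buf, sizeof buf, 0));
        _exit(0);
    }
    char buf[2000];                      /* parent races for the same bytes */
    printf("parent: recv'd %zd bytes\n", recv(sv[1], buf, sizeof buf, 0));
    wait(NULL);
    return 0;
}
```

Run it a few times: which process gets the 2000 bytes and which gets 0 (EOF) is a scheduler race, which is exactly why the kernel can't pick a destination buffer at DMA time.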
Taking this further, consider the user buffers. These are allocated by the application. To be usable for zero-copy all the way down, a buffer needs to be page-aligned and must not share its memory page with anything else. At recv() time, the kernel could theoretically replace the old page with the one containing the data and "flip" it into place, but this is complicated by the reassembly issue above, since successive packets will land on separate pages. The kernel could limit the data it hands back to each packet's payload, but that would mean many more system calls, more page remapping, and likely lower overall throughput.

I'm really only scratching the surface on this topic. I worked at a couple of companies in the early 2000s trying to extend the zero-copy concepts down into userland. We even implemented a TCP stack in userland and circumvented the kernel entirely for applications using the stack, but that brought its own set of problems and was never production quality. It's a very hard problem to solve.
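For illustration, this is the kind of allocation such page flipping would require. Note that a plain recv() into it still copies on stock kernels; the alignment alone doesn't buy you zero-copy.

```c
/* A page-aligned, whole-page-sized user buffer -- the shape a
 * page-flipping recv() would need. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);   /* typically 4096 */
    void *buf = NULL;

    /* Aligned to a page boundary, sharing its pages with nothing else. */
    if (posix_memalign(&buf, page, 4 * page) != 0) return 1;

    printf("page size %ld, buffer %p, page-aligned: %s\n",
           page, buf, ((uintptr_t)buf % page == 0) ? "yes" : "no");
    free(buf);
    return 0;
}
```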
Take a look at this paper: http://www.ece.virginia.edu/cheetah/documents/papers/TCPlinux.pdf. It might help clear up some of the memory management questions.