How many memory copies are there between NIC packets and a user application on *nix systems?

Posted 2024-08-29 08:22:04

This is just a general question relating to some high-performance computing I've been wondering about. A certain low-latency messaging vendor speaks in its supporting documentation about using raw sockets to transfer the data directly from the network device to the user application and in so doing it speaks about reducing the messaging latency even further than it does anyway (in other admittedly carefully thought-out design decisions).

My question is therefore to those that grok the networking stacks on Unix or Unix-like systems. How much difference are they likely to be able to realise using this method? Feel free to answer in terms of memory copies, numbers of whales rescued or areas the size of Wales ;)

Their messaging is UDP-based, as I understand it, so there's no problem with establishing TCP connections etc. Any other points of interest on this topic would be gratefully thought about!

Best wishes,

Mike

Comments (2)

北恋 2024-09-05 08:22:04

There are some diagrams at http://vger.kernel.org/~davem/tcp_output.html (found by googling tcp_transmit_skb(), which is a key part of the TCP datapath). There is more interesting material on his site: http://vger.kernel.org/~davem/

In the user-to-TCP transmit part of the datapath there is 1 copy from user space into the skb, done by skb_copy_to_page (when sending via tcp_sendmsg()), and 0 copies with do_tcp_sendpages (called by tcp_sendpage()). The copy is needed to keep a backup of the data in case a segment goes undelivered. skb buffers in the kernel can be cloned, but their data stays in the first (original) skb. sendpage can take a page from another part of the kernel and keep it as the backup (I think there is something COW-like there).
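
As a rough user-space illustration of that difference (my own sketch, not from the question's vendor; descriptors and buffer sizes are placeholders): send() feeds user memory into tcp_sendmsg(), which has to copy it into skbs, while sendfile() ends up in tcp_sendpage() and can hand page-cache pages to the socket without that extra copy.

    /* Hypothetical sketch: two ways to push a file's bytes into a connected
     * TCP socket.  send() copies the user buffer into kernel skbs
     * (sock_sendmsg -> tcp_sendmsg path); sendfile() lets the kernel use
     * page-cache pages directly (do_sendfile -> tcp_sendpage path). */
    #include <fcntl.h>
    #include <sys/sendfile.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* one extra copy: page cache -> user buffer -> kernel skb */
    static ssize_t send_with_copy(int sock, int file_fd)
    {
        char buf[4096];
        ssize_t n = read(file_fd, buf, sizeof(buf));   /* kernel -> user */
        if (n <= 0)
            return n;
        return send(sock, buf, (size_t)n, 0);          /* user -> kernel skb */
    }

    /* no user-space copy of the payload at all */
    static ssize_t send_without_copy(int sock, int file_fd, size_t count)
    {
        off_t off = 0;
        return sendfile(sock, file_fd, &off, count);   /* page cache -> socket */
    }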

Call paths (traced manually via LXR). Sending goes through tcp_push_one/__tcp_push_pending_frames:

tcp_sendmsg() <-  sock_sendmsg <- sock_readv_writev <- sock_writev <- do_readv_writev

tcp_sendpage() <- file_send_actor <- do_sendfile 

Receiving goes through tcp_recv_skb():

tcp_recvmsg() <-  sock_recvmsg <- sock_readv_writev <- sock_readv <- do_readv_writev

tcp_read_sock() <- ... the splice read path on newer kernels, a sendfile-like path on older ones

On receive there can be 1 copy from kernel to user, done by skb_copy_datagram_iovec (called from tcp_recvmsg). For tcp_read_sock() there can also be a copy: it invokes the sk_read_actor callback, and if the destination is a file or memory it may (or may not) copy the data out of the DMA zone; if the destination is another network socket, it already has the skb of the received packet and can reuse its data in place.
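
The newer-kernel splice path mentioned above can be exercised from user space roughly like this (my own sketch, assuming the socket and output descriptors are already set up): the socket's data is moved into a pipe and on to another descriptor without ever landing in a user buffer.

    /* Sketch: drain a TCP socket into out_fd via a pipe, using splice() so the
     * payload is not copied through a user-space buffer
     * (tcp_read_sock / splice receive path). */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    static int splice_socket_to_fd(int sock, int out_fd, size_t chunk)
    {
        int pipefd[2];
        if (pipe(pipefd) < 0)
            return -1;

        for (;;) {
            ssize_t n = splice(sock, NULL, pipefd[1], NULL, chunk,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (n <= 0)
                break;                                  /* EOF or error */
            /* pipe -> out_fd, still without touching user memory */
            splice(pipefd[0], NULL, out_fd, NULL, (size_t)n,
                   SPLICE_F_MOVE | SPLICE_F_MORE);
        }
        close(pipefd[0]);
        close(pipefd[1]);
        return 0;
    }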

For UDP: receive = 1 copy -- skb_copy_datagram_iovec called from udp_recvmsg; transmit = 1 copy -- udp_sendmsg -> ip_append_data -> getfrag (which seems to be ip_generic_getfrag, with 1 copy from user space, though there may be a sendpage/splice-like variant without page copying).
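
To make the receive-side count concrete, a minimal sketch (mine; socket creation and binding omitted and assumed done): recvmsg() hands the kernel an iovec, and udp_recvmsg() fills it through skb_copy_datagram_iovec, one copy from the skb into user memory.

    /* Sketch: one recvmsg() on an already-bound UDP socket.  The kernel copies
     * the datagram payload from the skb into this iovec exactly once
     * (udp_recvmsg -> skb_copy_datagram_iovec). */
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    static ssize_t recv_one_datagram(int udp_sock, char *buf, size_t buflen)
    {
        struct sockaddr_in src;
        struct iovec iov = { .iov_base = buf, .iov_len = buflen };
        struct msghdr msg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_name = &src;
        msg.msg_namelen = sizeof(src);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        return recvmsg(udp_sock, &msg, 0);              /* skb -> buf: the 1 copy */
    }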

Generally speaking, there must be at least 1 copy when sending from or receiving into user space, and 0 copies when using zero-copy (surprise!) with kernel-space source/target buffers for the data. All headers are added without moving the packet, and a DMA-capable (i.e. any modern) network card will fetch the data from anywhere in DMA-able address space. For ancient cards PIO is needed, so there is one more copy, from kernel space into the PCI/ISA/whatever I/O registers or memory.

UPD: on the path from the NIC to the TCP stack (this is NIC-dependent; I checked 8139too) there is one more copy, from the rx_ring into an skb, and the same on the transmit side, from the skb into the tx buffer: +1 copy. You also have to fill in the IP and TCP headers, but does the skb already contain them, or space reserved for them?

白日梦 2024-09-05 08:22:04

To reduce latency in high-performance setups you should avoid going through a kernel driver. The smallest latency is achieved with user-space drivers (MX does this; InfiniBand probably does too).

There is a rather good (but slightly outdated) overview of Linux networking internals, "A Map of the Networking Code in Linux Kernel 2.4.20", which includes diagrams of the TCP/UDP datapaths.

Using raw sockets will make the path of TCP packets a bit shorter (thanks for the idea): the TCP code in the kernel no longer adds its latency, but the user has to handle the whole TCP protocol themselves, which gives some chance of optimizing it for specific situations. Cluster code does not need to handle long-distance or slow links the way the default TCP/UDP stack does.
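
To make that trade-off concrete, here is a rough sketch (mine, not the vendor's code; the interface name is a placeholder and CAP_NET_RAW is assumed) of receiving frames on a Linux AF_PACKET raw socket: the frames arrive with Ethernet/IP/UDP headers still attached, and validating and stripping them is now the application's job.

    /* Sketch: receive one raw Ethernet frame from an interface and peel off
     * the headers by hand -- the work the kernel's UDP/TCP code would
     * otherwise do.  Requires CAP_NET_RAW; ifname is a placeholder. */
    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <linux/if_packet.h>
    #include <net/if.h>
    #include <netinet/ip.h>
    #include <netinet/udp.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int recv_one_raw_udp(const char *ifname)
    {
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
        if (fd < 0)
            return -1;

        struct sockaddr_ll addr;
        memset(&addr, 0, sizeof(addr));
        addr.sll_family = AF_PACKET;
        addr.sll_protocol = htons(ETH_P_IP);
        addr.sll_ifindex = (int)if_nametoindex(ifname);
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }

        unsigned char frame[2048];
        ssize_t n = recv(fd, frame, sizeof(frame), 0);  /* whole frame, headers and all */
        close(fd);
        if (n < (ssize_t)(sizeof(struct ethhdr) + sizeof(struct iphdr)))
            return -1;

        /* the application now owns header handling: */
        struct iphdr *ip = (struct iphdr *)(frame + sizeof(struct ethhdr));
        if (ip->protocol != IPPROTO_UDP)
            return 0;                                   /* filter in user space */
        struct udphdr *udp = (struct udphdr *)((unsigned char *)ip + ip->ihl * 4);
        /* UDP payload begins at (unsigned char *)udp + sizeof(*udp) */
        (void)udp;
        return 0;
    }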

I'm very interested in this topic too.
