刷新内核的 TCP 缓冲区以获取“MSG_MORE”标记的数据包

发布于 2024-08-27 11:55:11 字数 1692 浏览 7 评论 0原文

send() 的手册页显示了 MSG_MORE 标志，该标志被断言其行为类似于 TCP_CORK。我有一个围绕 send() 的包装函数：

int SocketConnection_Write(SocketConnection *this, void *buf, int len) {
    errno = 0;

    int sent = send(this->fd, buf, len, MSG_NOSIGNAL);

    if (errno == EPIPE || errno == ENOTCONN) {
        throw(exc, &SocketConnection_NotConnectedException);
    } else if (errno == ECONNRESET) {
        throw(exc, &SocketConnection_ConnectionResetException);
    } else if (sent != len) {
        throw(exc, &SocketConnection_LengthMismatchException);
    }

    return sent;
}

假设我想使用内核缓冲区，我可以使用 TCP_CORK，在需要时启用，然后禁用它来刷新缓冲区。但另一方面，这就需要额外的系统调用。因此，使用 MSG_MORE 对我来说似乎更合适。我只需将上面的 send() 行更改为：

int sent = send(this->fd, buf, len, MSG_NOSIGNAL | MSG_MORE);

根据 lwm.net，如果数据包足够大，将自动刷新：

如果应用程序将该选项设置为一个socket，内核不会发送出去短数据包。相反，它会等待直到出现足够的数据来填充最大大小的数据包，然后发送它。当 TCP_CORK 关闭时，任何剩余数据将在电线。

但本节仅涉及TCP_CORK。现在，刷新 MSG_MORE 数据包的正确方法是什么？

我只能想到两种可能性：

带有空的缓冲区的呼叫send（），没有msg_more被
重新设置为此页

不幸的是，整个主题的记录非常少，我无法'在互联网上找不到太多。

我还想知道如何检查一切是否按预期进行？显然，通过 strace 运行服务器不是一个选择。那么最简单的方法是使用 netcat 然后查看它的 strace 输出？或者内核会以不同的方式处理通过环回接口传输的流量吗？

原文

send()'s man page reveals the MSG_MORE flag which is asserted to act like TCP_CORK. I have a wrapper function around send():

int SocketConnection_Write(SocketConnection *this, void *buf, int len) {
    errno = 0;

    int sent = send(this->fd, buf, len, MSG_NOSIGNAL);

    if (errno == EPIPE || errno == ENOTCONN) {
        throw(exc, &SocketConnection_NotConnectedException);
    } else if (errno == ECONNRESET) {
        throw(exc, &SocketConnection_ConnectionResetException);
    } else if (sent != len) {
        throw(exc, &SocketConnection_LengthMismatchException);
    }

    return sent;
}

Assuming I want to use the kernel buffer, I could go with TCP_CORK, enable whenever it is necessary and then disable it to flush the buffer. But on the other hand, thereby the need for an additional system call arises. Thus, the usage of MSG_MORE seems more appropriate to me. I'd simply change the above send() line to:

int sent = send(this->fd, buf, len, MSG_NOSIGNAL | MSG_MORE);

According to lwm.net, packets will be flushed automatically if they are large enough:

If an application sets that option on
a socket, the kernel will not send out
short packets. Instead, it will wait
until enough data has shown up to fill
a maximum-size packet, then send it.
When TCP_CORK is turned off, any
remaining data will go out on the
wire.

But this section only refers to TCP_CORK. Now, what is the proper way to flush MSG_MORE packets?

I can only think of two possibilities:

Call send() with an empty buffer and without MSG_MORE being set
Re-apply the TCP_CORK option as described on this page

Unfortunately the whole topic is very poorly documented and I couldn't find much on the Internet.

I am also wondering how to check that everything works as expected? Obviously running the server through strace is not an option. So the simplest way would be to use netcat and then look at its strace output? Or will the kernel handle traffic transmitted over a loopback interface differently?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

嗼ふ静 2024-09-03 11:55:11

我查看了内核源代码，这两个假设似乎都是正确的。以下代码摘自 net/ipv4/tcp.c (2.6.33.1)。

static inline void tcp_push(struct sock *sk, int flags, int mss_now,
                int nonagle)
{
    struct tcp_sock *tp = tcp_sk(sk);

    if (tcp_send_head(sk)) {
        struct sk_buff *skb = tcp_write_queue_tail(sk);
        if (!(flags & MSG_MORE) || forced_push(tp))
            tcp_mark_push(tp, skb);
        tcp_mark_urg(tp, flags, skb);
        __tcp_push_pending_frames(sk, mss_now,
                      (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
    }
}

因此，如果未设置该标志，挂起的帧肯定会被刷新。但这是仅当缓冲区不为空时的情况：

static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
             size_t psize, int flags)
{
(...)
    ssize_t copied;
(...)
    copied = 0;

    while (psize > 0) {
(...)
        if (forced_push(tp)) {
            tcp_mark_push(tp, skb);
            __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
        } else if (skb == tcp_send_head(sk))
            tcp_push_one(sk, mss_now);
        continue;

wait_for_sndbuf:
        set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
wait_for_memory:
        if (copied)
            tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);

        if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
            goto do_error;

        mss_now = tcp_send_mss(sk, &size_goal, flags);
    }

out:
    if (copied)
        tcp_push(sk, flags, mss_now, tp->nonagle);
    return copied;

do_error:
    if (copied)
        goto out;
out_err:
    return sk_stream_error(sk, flags, err);
}

while循环体永远不会被执行，因为psize不大于0。然后，在 out 部分，还有一次机会，tcp_push() 被调用，但因为 copied 仍然有其默认值，所以它会失败以及。

因此发送长度为 0 的数据包永远不会导致刷新。

下一个理论是重新应用 TCP_CORK。我们先看一下代码：

static int do_tcp_setsockopt(struct sock *sk, int level,
        int optname, char __user *optval, unsigned int optlen)
{

(...)

    switch (optname) {
(...)

    case TCP_NODELAY:
        if (val) {
            /* TCP_NODELAY is weaker than TCP_CORK, so that
             * this option on corked socket is remembered, but
             * it is not activated until cork is cleared.
             *
             * However, when TCP_NODELAY is set we make
             * an explicit push, which overrides even TCP_CORK
             * for currently queued segments.
             */
            tp->nonagle |= TCP_NAGLE_OFF|TCP_NAGLE_PUSH;
            tcp_push_pending_frames(sk);
        } else {
            tp->nonagle &= ~TCP_NAGLE_OFF;
        }
        break;

    case TCP_CORK:
        /* When set indicates to always queue non-full frames.
         * Later the user clears this option and we transmit
         * any pending partial frames in the queue.  This is
         * meant to be used alongside sendfile() to get properly
         * filled frames when the user (for example) must write
         * out headers with a write() call first and then use
         * sendfile to send out the data parts.
         *
         * TCP_CORK can be set together with TCP_NODELAY and it is
         * stronger than TCP_NODELAY.
         */
        if (val) {
            tp->nonagle |= TCP_NAGLE_CORK;
        } else {
            tp->nonagle &= ~TCP_NAGLE_CORK;
            if (tp->nonagle&TCP_NAGLE_OFF)
                tp->nonagle |= TCP_NAGLE_PUSH;
            tcp_push_pending_frames(sk);
        }
        break;
(...)

可以看到，flush有两种方式。您可以将 TCP_NODELAY 设置为 1 或将 TCP_CORK 设置为 0。幸运的是，两者都不会检查该标志是否已设置。因此，我最初的重新应用 TCP_CORK 标志的计划可以优化为仅禁用它，即使当前尚未设置它。

我希望这可以帮助有类似问题的人。

I have taken a look at the kernel source and both assumptions seem to be true. The following code are extracts from net/ipv4/tcp.c (2.6.33.1).

static inline void tcp_push(struct sock *sk, int flags, int mss_now,
                int nonagle)
{
    struct tcp_sock *tp = tcp_sk(sk);

    if (tcp_send_head(sk)) {
        struct sk_buff *skb = tcp_write_queue_tail(sk);
        if (!(flags & MSG_MORE) || forced_push(tp))
            tcp_mark_push(tp, skb);
        tcp_mark_urg(tp, flags, skb);
        __tcp_push_pending_frames(sk, mss_now,
                      (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
    }
}

Hence, if the flag is not set, the pending frames will definitely be flushed. But this is be only the case when the buffer is not empty:

static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
             size_t psize, int flags)
{
(...)
    ssize_t copied;
(...)
    copied = 0;

    while (psize > 0) {
(...)
        if (forced_push(tp)) {
            tcp_mark_push(tp, skb);
            __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
        } else if (skb == tcp_send_head(sk))
            tcp_push_one(sk, mss_now);
        continue;

wait_for_sndbuf:
        set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
wait_for_memory:
        if (copied)
            tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);

        if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
            goto do_error;

        mss_now = tcp_send_mss(sk, &size_goal, flags);
    }

out:
    if (copied)
        tcp_push(sk, flags, mss_now, tp->nonagle);
    return copied;

do_error:
    if (copied)
        goto out;
out_err:
    return sk_stream_error(sk, flags, err);
}

The while loop's body will never be executed because psize is not greater 0. Then, in the out section, there is another chance, tcp_push() gets called but because copied still has its default value, it will fail as well.

So sending a packet with the length 0 will never result in a flush.

The next theory was to re-apply TCP_CORK. Let's take a look at the code first:

static int do_tcp_setsockopt(struct sock *sk, int level,
        int optname, char __user *optval, unsigned int optlen)
{

(...)

    switch (optname) {
(...)

    case TCP_NODELAY:
        if (val) {
            /* TCP_NODELAY is weaker than TCP_CORK, so that
             * this option on corked socket is remembered, but
             * it is not activated until cork is cleared.
             *
             * However, when TCP_NODELAY is set we make
             * an explicit push, which overrides even TCP_CORK
             * for currently queued segments.
             */
            tp->nonagle |= TCP_NAGLE_OFF|TCP_NAGLE_PUSH;
            tcp_push_pending_frames(sk);
        } else {
            tp->nonagle &= ~TCP_NAGLE_OFF;
        }
        break;

    case TCP_CORK:
        /* When set indicates to always queue non-full frames.
         * Later the user clears this option and we transmit
         * any pending partial frames in the queue.  This is
         * meant to be used alongside sendfile() to get properly
         * filled frames when the user (for example) must write
         * out headers with a write() call first and then use
         * sendfile to send out the data parts.
         *
         * TCP_CORK can be set together with TCP_NODELAY and it is
         * stronger than TCP_NODELAY.
         */
        if (val) {
            tp->nonagle |= TCP_NAGLE_CORK;
        } else {
            tp->nonagle &= ~TCP_NAGLE_CORK;
            if (tp->nonagle&TCP_NAGLE_OFF)
                tp->nonagle |= TCP_NAGLE_PUSH;
            tcp_push_pending_frames(sk);
        }
        break;
(...)

As you can see, there are two ways to flush. You can either set TCP_NODELAY to 1 or TCP_CORK to 0. Luckily, both won't check whether the flag is already set. Thus, my initial plan to re-apply the TCP_CORK flag can be optimized to just disable it, even if it's currently not set.

I hope this helps someone with similar issues.

回复收藏 0 原文

臻嫒无言 2024-09-03 11:55:11

这是大量的研究......我所能提供的就是这个经验性的帖子注释：

发送一堆设置了 MSG_MORE 的数据包，然后发送一个没有 MSG_MORE 的数据包，整个数据包都会消失。它对这样的事情是一种治疗：

  for (i=0; i<mg_live.length; i++) {
        // [...]
        if ((n = pth_send(sock, query, len, MSG_MORE | MSG_NOSIGNAL)) < len) {
           printf("error writing to socket (sent %i bytes of %i)\n", n, len);
           exit(1);
        }
     }
  }

  pth_send(sock, "END\n", 4, MSG_NOSIGNAL);

也就是说，当您一次发送所有数据包，并且有一个明确定义的结束时......并且您只使用一个套接字。

如果您尝试在上述循环中间写入另一个套接字，您可能会发现 Linux 释放了先前保存的数据包。至少这似乎是我现在遇到的麻烦。但这对您来说可能是一个简单的解决方案。

That's a lot of research... all I can offer is this empirical post note:

Sending a bunch of packet with MSG_MORE set, followed by a packet without MSG_MORE, the whole lot goes out. It works a treat for something like this:

  for (i=0; i<mg_live.length; i++) {
        // [...]
        if ((n = pth_send(sock, query, len, MSG_MORE | MSG_NOSIGNAL)) < len) {
           printf("error writing to socket (sent %i bytes of %i)\n", n, len);
           exit(1);
        }
     }
  }

  pth_send(sock, "END\n", 4, MSG_NOSIGNAL);

That is, when you're sending out all the packets at once, and have a clearly defined end... AND you are only using one socket.

If you tried writing to another socket in the middle of the above loop, you may find that Linux releases the previously held packets. At least that appears to be the trouble I'm having right now. But it might be an easy solution for you.

回复收藏 0 原文

~没有更多了~