
What is the "linger time" that can be set with SO_LINGER on sockets really for?

Posted on 2025-01-23 10:45:56

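The man page explains next to nothing about this option. There is plenty of information on the web and in answers on Stack Overflow, but I found that much of it even contradicts itself. So what is this setting really good for, and why would I need to set or change it?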


猫烠⑼条掵仅有一顆心 2025-01-30 10:45:56


When a TCP socket is disconnected, there are three things the system has to consider:

1. There might still be unsent data in the send buffer of that socket, which would get lost if the socket were closed immediately.
2. There might still be data in flight, that is, data has already been sent out to the other side but the other side has not yet acknowledged receiving it correctly, so it may have to be resent or is otherwise lost.
3. Closing a TCP socket is a three-way handshake with no confirmation of the third packet. As the sender doesn't know whether the third packet has ever arrived, it has to wait some time and see if the second one gets resent. If it does, the third one has been lost and must be resent.

When you close a socket using the close() call, the system will usually not destroy the socket immediately but will first try to resolve all three issues above to prevent data loss and ensure a clean disconnect. All of that happens in the background (usually within the operating system kernel), so although the close() call returns immediately, the socket may still be alive for a while and even send out remaining data. There is a system-specific upper bound on how long the system will try to get a clean disconnect before it eventually gives up and destroys the socket anyway, even if that means that data is lost. Note that this time limit can be in the range of minutes!

There is a socket option named SO_LINGER that controls how the system will close a socket. You can turn lingering on or off using that option and, if it is turned on, set a timeout (you can also set a timeout if it is turned off, but that timeout has no effect).
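As a rough illustration of the on/off flag and the timeout, here is a minimal Python sketch. The helper names set_linger and get_linger are made up for this example, and it assumes a POSIX-like system where struct linger consists of two C ints (on Windows the fields are 16-bit, so the pack format would differ).

```python
import socket
import struct

# Assumption: struct linger is two C ints (l_onoff, l_linger), as on Linux/BSD/macOS.
LINGER_FORMAT = "ii"


def set_linger(sock, enabled, timeout):
    """Turn lingering on or off and set the linger timeout (hypothetical helper)."""
    value = struct.pack(LINGER_FORMAT, 1 if enabled else 0, timeout)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, value)


def get_linger(sock):
    """Return (enabled, timeout) as currently stored for the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    onoff, timeout = struct.unpack(LINGER_FORMAT, raw)
    return bool(onoff), timeout


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(get_linger(sock))       # state of a fresh socket (lingering is off by default)
set_linger(sock, True, 5)
print(get_linger(sock))       # state after turning lingering on
sock.close()
```

The on/off field is converted with bool() because some systems report a non-zero flag value other than 1 when lingering is enabled.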

The default is that lingering is turned off, which means close() returns immediately and the details of the socket closing process are left up to the system, which will usually deal with it as described above.

If you turn lingering on and set a timeout other than zero, close() will not return immediately. It will only return once issues (1) and (2) have been resolved (all data has been sent, no data is in flight anymore) or once the timeout has been hit. Which of the two was the case can be seen from the result of the close() call. If it succeeds, all remaining data got sent and acknowledged; if it fails and errno is set to EWOULDBLOCK, the timeout has been hit and some data might have been lost.
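The check described above might look like this in Python. This is a sketch only: close_with_linger is a hypothetical helper, the socket is assumed to be blocking, struct linger is assumed to be two C ints, and it relies on Python 3.6+ where socket.close() raises OSError when the underlying close() fails.

```python
import errno
import socket
import struct


def close_with_linger(sock, timeout):
    """Close a blocking socket with lingering on (hypothetical helper).

    Returns True if close() reported success and False if it failed with
    EWOULDBLOCK, meaning the linger timeout was hit and data may be lost.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, timeout))
    try:
        sock.close()
    except OSError as exc:
        if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            return False
        raise  # any other error is unexpected here
    return True
```

As the later paragraphs explain, not every system reports the timeout this way, so a True result is only as trustworthy as the platform's close() implementation.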

In the case of a non-blocking socket, close() will not block, not even with a linger time other than zero. In that case there is no way to get the result of the close operation, as you can never call close() twice on the same socket. Even if the socket is lingering, once close() has returned, the socket file descriptor should have been invalidated, and calling close() again with that descriptor should fail with errno set to EBADF ("bad file descriptor").
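The EBADF behavior is easy to observe on the raw descriptor. The following sketch uses a hypothetical function second_close_errno and closes the descriptor with os.close() so that Python's own bookkeeping does not hide the second call.

```python
import errno
import os
import socket


def second_close_errno():
    """Close the same descriptor twice and return the errno of the second call."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fd = sock.detach()   # take the raw descriptor; the Python object no longer owns it
    os.close(fd)         # first close succeeds and invalidates the descriptor
    try:
        os.close(fd)     # second close on the now invalid descriptor
    except OSError as exc:
        return exc.errno
    return 0


print(second_close_errno() == errno.EBADF)
```

In a multi-threaded program the descriptor number may already have been handed out again by the time of the second call, which is one more reason never to close a descriptor twice.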

However, even if you set the linger time to something really short, like one second, and the socket won't linger for longer than that, it will still stay around for a while after lingering to deal with issue (3) above. To ensure a clean disconnect, the implementation must ensure that the other side has also disconnected that connection, otherwise remaining data may still arrive for that already dead connection. So the socket will go into a state most systems call TIME_WAIT and stay in that state for a system-specific amount of time, regardless of whether lingering is on and regardless of what linger time has been set.

Except for one special case: If you enable lingering but set the linger time to zero, this changes pretty much everything. In that case a call to close() really closes the socket immediately. No matter whether the socket is blocking or non-blocking, close() returns at once. Any data still in the send buffer is simply discarded. Any data in flight is ignored and may or may not have arrived correctly at the other side. The socket is also not closed using the normal TCP close handshake (FIN-ACK); it is killed instantly using a reset (RST). As a result, if the other side tries to send something over the socket after the reset, this operation will fail with ECONNRESET ("A connection was forcibly closed by the peer."), whereas a normal close would result in EPIPE ("The socket is no longer connected."). While most programs will treat EPIPE as a harmless event, they tend to treat ECONNRESET as a hard error if they didn't expect it to happen.
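A small loopback experiment can make the reset visible. This is a sketch under assumptions: abort_connection and observe_reset are hypothetical names, struct linger is assumed to be two C ints, and the peer calls recv() instead of sending, which on the systems I would expect also reports the pending reset.

```python
import errno
import socket
import struct


def abort_connection(sock):
    """Enable lingering with a timeout of zero, then close (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def observe_reset():
    """Abort a loopback connection and return the errno the peer then sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: let the system pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    peer.settimeout(5)                       # never hang if no reset shows up
    abort_connection(client)
    try:
        peer.recv(1)
        result = 0                           # orderly end of stream, no reset seen
    except OSError as exc:
        result = exc.errno
    finally:
        peer.close()
        listener.close()
    return result


print(observe_reset() == errno.ECONNRESET)
```

Because the behavior is system specific, the printed result is something to check on each target platform rather than to take for granted.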

Please note that this describes the socket behavior as found in the original BSD socket implementation (original means that this may not even match the behavior of modern BSD implementations such as FreeBSD, OpenBSD, NetBSD, etc.). While the BSD socket API has been copied by pretty much all other major operating systems today (Linux, Android, Windows, macOS, iOS, etc.), the behavior on these systems sometimes varies, as is also true with many other aspects of that API.

For example, if a non-blocking socket with data in the send buffer is closed on BSD while linger is on and the linger time is not zero, the close() call will return at once, but it will indicate a failure and the error will be EWOULDBLOCK (just as for a blocking socket after the linger timeout has been hit). The same holds true for Windows. On macOS this is not the case: close() will always return at once and indicate success, regardless of whether there is data in the send buffer. And on Linux, the close() call will actually block in that case, up to the linger timeout, despite the socket being non-blocking.
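Since the outcome depends on the platform, it can help to measure it directly. The sketch below uses a hypothetical function probe_nonblocking_linger_close, which fills the send buffer of a non-blocking loopback socket whose peer never reads, turns lingering on, and reports how long close() took and which errno it raised, if any. It assumes struct linger is two C ints.

```python
import socket
import struct
import time


def probe_nonblocking_linger_close(linger_timeout=1):
    """Return (elapsed_seconds, errno) for close() on a non-blocking socket
    that still has unsent data and lingering enabled."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()              # the peer never reads, so data piles up
    client.setblocking(False)
    chunk = b"x" * 65536
    try:
        for _ in range(10000):               # safety cap on the amount written
            client.send(chunk)
    except BlockingIOError:
        pass                                 # send buffer is full: data is now pending
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, linger_timeout))
    start = time.monotonic()
    err = 0
    try:
        client.close()
    except OSError as exc:
        err = exc.errno
    elapsed = time.monotonic() - start
    peer.close()
    listener.close()
    return elapsed, err


print(probe_nonblocking_linger_close(1))
```

Going by the description above, an immediate return with EWOULDBLOCK, an immediate success, and a delay of roughly the linger timeout would correspond to the BSD/Windows, macOS and Linux behaviors respectively.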

To learn more about how different systems actually deal with different linger settings, have a look at the following link:

https://www.nybek.com/blog/2015/04/29/so_linger-on-non-blocking-sockets/

There was also a page with results for blocking sockets, but unfortunately the Internet Archive did not capture it and the original blog is gone for good. The test code is still available, but I don't have access to all platforms to re-create the test results:

https://github.com/nybek/linger-tools

As you can see, the behavior might also change depending on whether shutdown() has been called prior to close() and on other system-specific aspects, including oddities such as a linger timeout having an effect even though lingering is turned off completely.

Another system-specific behavior is what happens if your process dies without closing a socket first. In that case the system will close the socket on your behalf, and some systems tend to ignore any linger setting when they have to do so and just fall back to the system's default behavior. They cannot "block" on socket close in that case anyway, but some systems will even ignore a timeout of zero and do a FIN-ACK in that case.

So it's not true that setting a linger timeout of zero will prevent sockets from ever entering the TIME_WAIT state. It depends on how the socket has been closed (shutdown(), close()), by whom it has been closed (your own code or the system), whether it was blocking or non-blocking, and ultimately, on the system your code is running on. The only true statement that can be made is:

If you manually close a socket that is blocking (at least at the moment you close it; it might have been non-blocking before) and this socket has lingering enabled with a timeout of zero, this is your best chance of keeping the socket out of the TIME_WAIT state. There is no guarantee that it works, but if it doesn't, there is nothing else you can do to prevent it, unless you have a way to ensure that the peer on the other side initiates the close for you, as only the side initiating the close operation may end up in the TIME_WAIT state.

So my personal pro tip is: If you design a server-client protocol, design it in such a way that normally the client closes the connection first. It is very undesirable for server sockets to routinely end up in the TIME_WAIT state, but it is even more undesirable for connections to be closed by RST, as that can lead to the loss of data previously sent to the client.
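One way such a protocol could look is sketched below as a toy example, not a real protocol. The names serve_one, request_once and demo are hypothetical, a newline marks the end of the reply, and the request is assumed to be small enough to arrive in a single recv(). The server replies and then waits for the client's close before closing its own side, so the client is the side that initiates the close.

```python
import socket
import threading


def serve_one(listener):
    """Answer one request, then wait for the client to close before closing."""
    conn, _ = listener.accept()
    with conn:
        conn.settimeout(5)                        # never wait forever on a dead client
        request = conn.recv(1024)                 # toy assumption: one small request
        conn.sendall(b"echo:" + request + b"\n")  # newline marks the end of the reply
        while conn.recv(1024):                    # b"" means the client closed first
            pass
    # leaving the with block closes the server side only after the client's close


def request_once(address, payload):
    """Send one request, read the reply up to the newline, then close first."""
    with socket.create_connection(address, timeout=5) as client:
        client.sendall(payload)
        reply = b""
        while not reply.endswith(b"\n"):
            data = client.recv(1024)
            if not data:
                break
            reply += data
    return reply                                  # the client side is closed here


def demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        server = threading.Thread(target=serve_one, args=(listener,))
        server.start()
        reply = request_once(listener.getsockname(), b"ping")
        server.join()
    return reply


print(demo())
```

The important design point is that the client can tell from the protocol itself when the reply is complete, so it never has to wait for the server to close in order to detect the end of the data.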