Does asynchronous receive guarantee detection of a connection failure?

Posted on 2024-10-06 11:08:22

From what I know, a blocking receive on a TCP socket does not always detect a connection error (due either to a network failure or to a remote-endpoint failure) by returning a -1 value or raising an IO exception: sometimes it could just hang indefinitely.

One way to manage this problem is to set a timeout for the blocking receive. In case an upper bound for the reception time is known, this bound could be set as timeout and the connection could be considered lost simply when the timeout expires; when such an upper bound is not known a priori, for example in a pub-sub system where a connection stays open to receive publications, the timeout to be set would be somewhat arbitrary but its expiration could trigger a ping/pong request to verify that the connection (and the endpoint too) is still up.
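The timeout approach described above can be sketched in Java with `Socket.setSoTimeout`. This is a minimal illustration, not a full client: the class name `ReadTimeoutDemo`, the `ReadResult` enum, and the 200 ms value are assumptions made for the example, and a loopback peer that accepts but never sends stands in for a silent remote endpoint.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    /** Outcome of one bounded read attempt (hypothetical helper type). */
    enum ReadResult { DATA, END_OF_STREAM, TIMED_OUT }

    /** Performs one blocking read that gives up after timeoutMillis. */
    static ReadResult readWithTimeout(Socket socket, byte[] buffer, int timeoutMillis)
            throws IOException {
        socket.setSoTimeout(timeoutMillis); // 0 would mean "wait forever"
        try {
            int n = socket.getInputStream().read(buffer);
            return n < 0 ? ReadResult.END_OF_STREAM : ReadResult.DATA;
        } catch (SocketTimeoutException e) {
            // Nothing arrived in time. The socket is still usable, so the
            // caller can send a ping or decide the connection is lost.
            return ReadResult.TIMED_OUT;
        }
    }

    /** Connects to a loopback peer that accepts the connection but never sends. */
    static ReadResult runDemo() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            return readWithTimeout(client, new byte[64], 200);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(runDemo());
    }
}
```

A `TIMED_OUT` result only says that nothing arrived within the window; whether that means the connection is lost or just idle is the application-specific judgment discussed above.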

I wonder whether using asynchronous receive also solves the problem of detecting a connection failure. In boost::asio I would call socket::async_read_some(), registering a handler to be called asynchronously, while in java.nio I would configure the channel as non-blocking and register it with a selector with an OP_READ interest flag. I imagine that correct connection-failure detection would mean that, in the first case, the handler would be called with a non-zero error_code, while in the second case the selector would select the faulty channel but a subsequent read() on the channel would either return -1 or throw an IOException.
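For reference, the java.nio setup described here might look like the sketch below. The class name `SelectorIdleDemo` and the helper `waitForReadable` are hypothetical, and the peer is a loopback socket that accepts the connection and then stays silent, which is only a stand-in for an endpoint that has stopped talking. The selector is given a timeout so the example terminates.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorIdleDemo {
    /**
     * Registers the channel for OP_READ and waits up to timeoutMillis.
     * Returns the number of selected keys (0 means no read event arrived).
     */
    static int waitForReadable(SocketChannel channel, long timeoutMillis) throws IOException {
        try (Selector selector = Selector.open()) {
            channel.configureBlocking(false);
            channel.register(selector, SelectionKey.OP_READ);
            return selector.select(timeoutMillis);
        }
    }

    /** Connects to a silent loopback peer and waits briefly for a read event. */
    static int runDemo() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                return waitForReadable(client, 200);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int ready = runDemo();
        System.out.println(ready == 0 ? "no read event within the timeout" : "channel readable");
    }
}
```

With a peer that sends neither data nor a FIN, the selector has nothing to report, so any bound on the wait has to come from the `select` timeout rather than from the channel itself.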

Is this behaviour guaranteed with asynchronous receive, or could there be scenarios where after a connection failure, for example, in boost::asio the handler will never be called or in java.nio the selector will never select the channel?

Thank you very much.

Comments (2)

茶花眉 2024-10-13 11:08:22

I believe you're referring to the TCP half-open connection problem (the RFC 793 meaning of the term). Under this scenario, the receiving OS will never receive any indication of the lost connection, so it will never notify the app. Whether the app is reading synchronously or asynchronously doesn't enter into it.

The problem occurs when the transmitting side of the connection somehow is no longer aware of the network connection. This can happen, for example, when

  • the transmitting OS abruptly terminates/restarts (power outage, OS failure/BSOD, etc.).

  • the transmitting side closes and cleans up its end of the connection while there is a network disruption between the two sides: e.g. the transmitting OS reboots cleanly during the disruption, or a transmitting Windows OS is unplugged from the network.

When this happens, the receiving side may be waiting for data or a FIN that will never come. Unless the receiving side sends a message, there's no way for it to realize the transmitting side is no longer aware of the receiving side.

Your solution (a timeout) is one way to address the issue, but it should include sending a message to the transmitting side. Again, it doesn't matter whether the read is synchronous or asynchronous, just that it doesn't read and wait indefinitely for data or a FIN. Another solution is to use the TCP keep-alive feature that some TCP stacks support. But the hard part of any generalized solution is usually determining a proper timeout, since the timeout is highly dependent on the characteristics of the specific application.
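In Java, the TCP keep-alive option mentioned above is exposed as `Socket.setKeepAlive`. The sketch below only turns the option on for a loopback connection and reads it back; the class name `KeepAliveDemo` and its helpers are assumptions for the example. How long the connection must be idle before probes are sent, and how many are sent, is decided by the operating system's TCP settings rather than by this call, and the default idle period is commonly on the order of hours.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveDemo {
    /** Enables SO_KEEPALIVE on the socket and reports whether it is now set. */
    static boolean enableKeepAlive(Socket socket) throws IOException {
        // Asks the OS to probe an idle connection; probe timing is an
        // OS-level setting and is not configured here.
        socket.setKeepAlive(true);
        return socket.getKeepAlive();
    }

    /** Opens a loopback connection and enables keep-alive on the client side. */
    static boolean runDemo() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            return enableKeepAlive(client);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SO_KEEPALIVE enabled: " + runDemo());
    }
}
```

Because the probe schedule is outside the application's control here, an application that needs failure detection within seconds would likely still pair this with its own timeout or ping.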

半衬遮猫 2024-10-13 11:08:22

Because of how TCP works, you will typically have to send data in order to notice a hard connection failure, to find out that no ACK packet will ever be returned. Some protocols attempt to identify conditions like this by periodically using a keep-alive or ping packet: if one side does not receive such a packet in X time (and perhaps after trying and failing one itself), it can consider the connection dead.

To answer your question, blocking and non-blocking receive should perform identically except for the act of blocking itself, so both will suffer from this same issue. In order to make sure that you can detect a silent failure from the remote host, you'll have to use a form of keep-alive like I described.
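One way to picture the keep-alive logic described above is as a small state tracker that is independent of whether the socket is read synchronously or asynchronously. The sketch below is an assumption-laden illustration: `HeartbeatMonitor`, its `Action` enum, and the two thresholds are invented for the example, and actually sending the ping and closing the connection are left to the caller.

```java
public class HeartbeatMonitor {
    /** What the caller should do after a poll (hypothetical helper type). */
    enum Action { NONE, SEND_PING, DECLARE_DEAD }

    private final long idleBeforePingMillis; // silence tolerated before probing
    private final long pongDeadlineMillis;   // time allowed for any reply to the probe
    private long lastReceivedAt;
    private long pingSentAt = -1;            // -1 means no probe is outstanding

    HeartbeatMonitor(long idleBeforePingMillis, long pongDeadlineMillis, long now) {
        this.idleBeforePingMillis = idleBeforePingMillis;
        this.pongDeadlineMillis = pongDeadlineMillis;
        this.lastReceivedAt = now;
    }

    /** Call whenever any bytes arrive from the peer (data or pong). */
    void onReceived(long now) {
        lastReceivedAt = now;
        pingSentAt = -1;
    }

    /** Call periodically, e.g. after a read or select timeout expires. */
    Action poll(long now) {
        if (pingSentAt >= 0) {
            // A probe is outstanding: give up once its deadline has passed.
            return now - pingSentAt >= pongDeadlineMillis ? Action.DECLARE_DEAD : Action.NONE;
        }
        if (now - lastReceivedAt >= idleBeforePingMillis) {
            pingSentAt = now;
            return Action.SEND_PING;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        HeartbeatMonitor m = new HeartbeatMonitor(1000, 500, 0);
        System.out.println(m.poll(999));  // NONE: peer has not been silent long enough
        System.out.println(m.poll(1000)); // SEND_PING
        System.out.println(m.poll(1400)); // NONE: still waiting for a reply
        System.out.println(m.poll(1500)); // DECLARE_DEAD
    }
}
```

Timestamps are passed in explicitly so the logic can be exercised without a real clock; in practice they would come from something like `System.nanoTime()` converted to milliseconds.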
