Detecting TCP dropout over an unreliable network
I am experimenting over an unreliable (home-brewed) radio network, using very rudimentary Java socket programming to transfer messages back and forth between the end nodes.
The setup is as follows:
Node A --- Relay Node --- Node B
One problem I constantly run into is that the connection somehow drops out, neither Node A nor Node B knows that the link is dead, and both continue to transmit data. The TCP connection does not time out either. I have added a heartbeat message that causes a timeout after a while, but I would still like to know the underlying reason why TCP does not time out.
Here are the options I am enabling when setting up a socket:
channel.socket().setKeepAlive(false);
channel.socket().setTrafficClass(0x08); // for max throughput
This behavior is strange because it is totally different from what I see on a wired network. On a wired network, I can simulate a dropped connection by pulling out the Ethernet cable; once I plug the cable back in, the connection is reestablished and messages start flowing again.
On the radio network, the connection is never reestablished and once it silently dies, the messages never resume.
Is there some other Java implementation or socket setting that I can use? And why am I seeing this behavior in the first place?
And yes, before anyone says anything, I know TCP is not the preferred choice over an unreliable network, but in this case I wanted to ensure no packet loss.
Comments (3)
The TCP protocol was designed to be quiet. RFC 1122 requires the keepalive interval to default to no less than two hours. Unless you control the systems on both ends and can change the default two-hour keepalive (sometimes this requires a kernel rebuild), you have to add a heartbeat in your own app.
Even if you send a heartbeat, the stack still has to wait for the retransmission timeout, which varies with the RTT. On a high-latency network the timeout can be very long, but it should be within minutes.
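As a rough illustration of the application-level heartbeat suggested above, the sketch below tracks when the peer was last heard from and reports the link as dead after a period of silence. The class name `HeartbeatMonitor`, its methods, and the 5-second threshold are hypothetical choices for this example, not something from the original poster's code; timestamps are passed in explicitly so the logic can be checked without a real socket.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: decides that a peer is dead after a period of silence. */
final class HeartbeatMonitor {
    private final long timeoutMillis;          // silence threshold (an assumption, tune per link)
    private final AtomicLong lastHeardMillis;  // time of the most recent message from the peer

    HeartbeatMonitor(long timeoutMillis, long nowMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.lastHeardMillis = new AtomicLong(nowMillis);
    }

    /** Call whenever any message (heartbeat or data) arrives from the peer. */
    void recordMessage(long nowMillis) {
        lastHeardMillis.set(nowMillis);
    }

    /** True once the peer has been silent for longer than the timeout. */
    boolean isPeerDead(long nowMillis) {
        return nowMillis - lastHeardMillis.get() > timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(5_000, 0);
        monitor.recordMessage(3_000);
        System.out.println(monitor.isPeerDead(7_000)); // 4 s of silence: false
        System.out.println(monitor.isPeerDead(8_001)); // just over 5 s of silence: true
    }
}
```

In a real program the timestamps would come from something like `System.currentTimeMillis()`, and the application would close the socket and reconnect once `isPeerDead` returns true, rather than waiting for the TCP stack to give up.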
You get notified on the local wired network because the system can detect the link-down state and drop all connections on that network.
BTW, you want to set keepalive to true, not false. With keepalive on, you at least get the slow heartbeat.
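A minimal sketch of turning keepalive on, using the same `channel.socket()` style as the question. The class and method names (`KeepAliveExample`, `openWithKeepAlive`, `demoOnLoopback`) are made up for illustration, and the loopback server exists only so the example can run on its own. Note that this only enables the option; how often probes are sent is still governed by the operating system's settings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

final class KeepAliveExample {
    /** Opens a blocking connection to {@code remote} with SO_KEEPALIVE enabled. */
    static SocketChannel openWithKeepAlive(InetSocketAddress remote) throws IOException {
        SocketChannel channel = SocketChannel.open();
        try {
            channel.socket().setKeepAlive(true); // true, not false as in the question
            channel.connect(remote);
            return channel;
        } catch (IOException e) {
            channel.close();
            throw e;
        }
    }

    /** Connects to a throwaway loopback listener and reports whether keepalive is on. */
    static boolean demoOnLoopback() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the OS pick a free port for this demo listener.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();
            try (SocketChannel client = openWithKeepAlive(address)) {
                return client.socket().getKeepAlive();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keepalive enabled: " + demoOnLoopback());
    }
}
```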
In the OSI 7-layer model, the first two layers are the physical and data link layers. The physical hardware running the data link protocol on wired Ethernet can detect when the cable is pulled. Your wireless hardware, and its corresponding protocol, probably cannot. The TCP stack can't time out promptly if layers 1 and 2 aren't signaling that the link is disconnected.
Define 'never'?
I expect you will eventually be notified by a send failing. You're probably just expecting to be notified sooner than you will be. The TCP stack retransmits segments that it doesn't get ACKs for, and the timeout before each retransmission doubles every time it retransmits. Depending on how the stack works out when to retransmit, it will probably take longer than you expect before the stack decides that the connection is broken, and only then will it let you know.
See here: http://www.ietf.org/rfc/rfc2988.txt, here: http://msdn.microsoft.com/en-us/library/ms819737.aspx, etc.
You're used to having a wired network where the drivers can notify higher level layers that the connection has been physically broken. If you were to configure a wired network to route via a router which you then deliberately set up to not route correctly then you'd probably see similar behaviour....
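To put rough numbers on the doubling retransmission timeout described in this answer, here is a small sketch that adds up the waits before the stack would give up. The class `RtoBackoff` and the figures used (1 s initial timeout, 60 s cap, 8 retransmissions) are illustrative assumptions only; real stacks derive the initial timeout from the measured RTT and use their own caps and retry limits.

```java
final class RtoBackoff {
    /**
     * Total time spent waiting across {@code retransmissions} attempts when the
     * timeout starts at {@code initialRtoMillis}, doubles after each attempt,
     * and never exceeds {@code maxRtoMillis}.
     */
    static long totalWaitMillis(long initialRtoMillis, long maxRtoMillis, int retransmissions) {
        if (initialRtoMillis <= 0 || maxRtoMillis < initialRtoMillis || retransmissions < 0) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long rto = initialRtoMillis;
        long total = 0;
        for (int i = 0; i < retransmissions; i++) {
            total += rto;                          // wait out the current timeout
            rto = Math.min(rto * 2, maxRtoMillis); // double it, up to the cap
        }
        return total;
    }

    public static void main(String[] args) {
        // 1 + 2 + 4 + 8 + 16 + 32 + 60 + 60 seconds
        System.out.println(totalWaitMillis(1_000, 60_000, 8)); // prints 183000
    }
}
```

Even with these modest assumed values the sender waits about three minutes before reporting an error, which is why an application-level heartbeat with its own, shorter timeout tends to detect a dead link sooner.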