How to minimize UDP packet loss
I am receiving ~3000 UDP packets per second, each of them ~200 bytes in size. I wrote a Java application which listens for those UDP packets and just writes the data to a file. The server then sends 15000 messages at the previously specified rate. After the run, the file contains only ~3500 messages. Using Wireshark, I confirmed that all 15000 messages were received by my network interface. After that I tried changing the buffer size of the socket (which was initially 8496 bytes):
socket.setReceiveBufferSize(32 * 1024);   // socket is a java.net.MulticastSocket
That change increased the number of messages saved to ~8000. I kept increasing the buffer size, up to 1 MB. After that, the number of messages saved reached ~14400. Increasing the buffer size to larger values wouldn't increase the number of messages saved any further. I think I have reached the maximum allowed buffer size. Still, I need to capture all 15000 messages that my network interface received.
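For reference, a stripped-down sketch of the kind of receiver described above might look like this (the multicast group, port, and file name are placeholders, not the actual values used):

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

public class UdpFileWriter {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("239.1.2.3"); // placeholder group
        MulticastSocket socket = new MulticastSocket(5000);     // placeholder port
        socket.setReceiveBufferSize(1024 * 1024);               // request a 1 MB socket buffer
        socket.joinGroup(group);

        byte[] buf = new byte[2048];                            // larger than the ~200-byte messages
        try (BufferedOutputStream out =
                 new BufferedOutputStream(new FileOutputStream("messages.bin"))) {
            for (int i = 0; i < 15000; i++) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);                         // blocks until a datagram arrives
                out.write(packet.getData(), packet.getOffset(), packet.getLength());
            }
        }
        socket.leaveGroup(group);
        socket.close();
    }
}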
Any help would be appreciated. Thanks in advance.
Comments (5)
Smells like a bug, most likely in your code. If the UDP packets are delivered over the network, they will be queued for delivery locally, as you've seen in Wireshark. Perhaps your program just isn't making timely progress on reading from its socket - is there a dedicated thread for this task?
You might be able to make some headway by detecting which packets are being lost by your program. If all the packets lost are early ones, perhaps the data is being sent before the program is ready to receive it. If they're all late ones, perhaps it exits too soon. If they are lost at regular intervals, there may be some trouble in the code that loops receiving packets, and so on.
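One way to do that, assuming the sender stamps each message with a sequence number (an assumption; the question does not say whether it does), is to record which sequence numbers were actually read and print the gaps afterwards:

import java.nio.ByteBuffer;
import java.util.BitSet;

// Hypothetical loss tracker: assumes each datagram payload starts with a
// 4-byte big-endian sequence number in the range [0, expected).
public class LossTracker {
    private final BitSet seen;
    private final int expected;

    public LossTracker(int expected) {
        this.expected = expected;
        this.seen = new BitSet(expected);
    }

    // Call for every datagram payload the application actually reads.
    public void record(byte[] payload, int offset) {
        int seq = ByteBuffer.wrap(payload, offset, 4).getInt();
        if (seq >= 0 && seq < expected) {
            seen.set(seq);
        }
    }

    // After the run, print the gaps to see whether the losses are early,
    // late, or periodic.
    public void report() {
        for (int i = 0; i < expected; i++) {
            if (!seen.get(i)) {
                System.out.println("missing seq " + i);
            }
        }
        System.out.println("received " + seen.cardinality() + " of " + expected);
    }
}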
In any case you seem exceptionally anxious about lost packets. By design UDP is not a reliable transport. If the loss of these multicast packets is a problem for your system (rather than just a mystery that you'd like to solve for performance reasons) then the system design is wrong.
The problem you appear to be having is that writing to the file introduces delay. I would read all the data into memory before writing it to the file (or write to the file in another thread).
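A minimal sketch of that idea: keep the whole run in memory (15000 messages of ~200 bytes is only about 3 MB) and write the file once the receive loop is done. The port and file name are illustrative:

import java.io.FileOutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BufferThenWrite {
    public static void main(String[] args) throws Exception {
        List<byte[]> messages = new ArrayList<>(15000);
        byte[] buf = new byte[2048];

        try (DatagramSocket socket = new DatagramSocket(5000)) {   // placeholder port
            socket.setReceiveBufferSize(1024 * 1024);
            for (int i = 0; i < 15000; i++) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                // Copy the payload out; buf is reused for the next packet
                messages.add(Arrays.copyOfRange(packet.getData(),
                        packet.getOffset(), packet.getOffset() + packet.getLength()));
            }
        }

        // File I/O happens only after all packets have been received
        try (FileOutputStream out = new FileOutputStream("messages.bin")) {
            for (byte[] m : messages) {
                out.write(m);
            }
        }
    }
}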
However, there is no way to ensure 100% of packets are received over UDP without the ability to ask for packets to be sent again (something TCP does for you).
I see that you are using UDP to send the file contents. In UDP the order of packets is not guaranteed. If you are not worried about ordering, you can put all the packets in a queue and have another thread process the queue and write the contents to the file. That way the socket reader thread is not blocked by file operations.
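A minimal sketch of that handoff, using a BlockingQueue between the reader and a writer thread (the port, file name, and poison-pill shutdown are illustrative choices):

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueuedReceiver {
    private static final byte[] POISON = new byte[0];   // signals "no more data"

    public static void main(String[] args) throws Exception {
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

        // Writer thread: drains the queue and does all the file I/O
        Thread writer = new Thread(() -> {
            try (BufferedOutputStream out =
                     new BufferedOutputStream(new FileOutputStream("messages.bin"))) {
                while (true) {
                    byte[] msg = queue.take();           // blocks until a message is queued
                    if (msg == POISON) {
                        break;
                    }
                    out.write(msg);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        writer.start();

        // Reader (main thread): only receives and enqueues, never touches the file
        try (DatagramSocket socket = new DatagramSocket(5000)) {   // placeholder port
            socket.setReceiveBufferSize(1024 * 1024);
            byte[] buf = new byte[2048];
            for (int i = 0; i < 15000; i++) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                queue.put(Arrays.copyOfRange(packet.getData(),
                        packet.getOffset(), packet.getOffset() + packet.getLength()));
            }
        }

        queue.put(POISON);
        writer.join();
    }
}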
The maximum receive buffer size is capped at the OS level.
For example, on a Linux system:
sysctl -w net.core.rmem_max=26214400
as in this article: https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
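After raising the OS limit, it is worth checking what the kernel actually granted: setReceiveBufferSize() is only a request, and getReceiveBufferSize() reports the effective size. A small check, with a placeholder port:

import java.net.MulticastSocket;

public class CheckBufferSize {
    public static void main(String[] args) throws Exception {
        try (MulticastSocket socket = new MulticastSocket(5000)) {  // placeholder port
            socket.setReceiveBufferSize(26214400);   // request the same ~25 MB as rmem_max above
            // If the request exceeds the OS limit (net.core.rmem_max on Linux),
            // the kernel silently caps it; this prints what was actually granted.
            System.out.println("effective receive buffer: "
                    + socket.getReceiveBufferSize() + " bytes");
        }
    }
}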
This is a Windows-only answer, but the following changes in the network controller card properties made a DRAMATIC difference in packet loss for our use case.
We consume around 200 Mbps of UDP data and were experiencing substantial packet loss under moderate server load.
The network card in use is an Asus ROG Aerion 10G card, but I would expect most high-end network controller cards to expose similar properties. You can access them via Device Manager->Network card->Right-Click->Properties->Advanced Options.
1. Increase the number of Receive Buffers:
The default value was 512; we could increase it to 1024. Higher settings were accepted in our case, but the network card became disabled once we exceeded 1024. Having a larger number of available buffers at the network-card level gives the system more tolerance to latency in transferring data from the network-card buffers to the socket buffers, where our application can finally read it.
2. Set Interrupt Moderation Rate to 'Off':
If I understood correctly, interrupt moderation coalesces multiple "buffer fill" notifications (via interrupts) into a single notification. So, the CPU will be interrupted less-often and fetch multiple buffers during each interrupt. This reduces CPU usage, but increases the chance a ready buffer is overwritten before being fetched, in case the interrupt is serviced late.
Additionally, we increased the socket buffer size (as the OP already did) and enabled Circular Buffering at the socket level, as suggested by Len Holgate in a comment; this should also increase tolerance to latency in processing the socket buffers.