How to monitor the available space in the Linux UDP buffer?
I have a Java app on Linux which opens a UDP socket and waits for messages.
After a couple of hours under heavy load, there is packet loss, i.e. packets are received by the kernel but not by my app (we see the lost packets in a sniffer, we see UDP packets lost in netstat, and we don't see those packets in our app logs).
We tried enlarging the socket buffers, but this didn't help - we started losing packets later than before, but that's it.
For debugging, I want to know how full the OS UDP buffer is at any given moment. I googled, but didn't find anything. Can you help me?
P.S. Guys, I'm aware that UDP is unreliable. However, my computer receives all UDP messages, while my app is unable to consume some of them. I want to optimize my app to the max; that's the reason for the question. Thanks.
UDP is a perfectly viable protocol. It is the same old case of the right tool for the right job!
If you have a program that waits for UDP datagrams, and then goes off to process them before returning to wait for another, then your elapsed processing time needs to always be faster than the worst case arrival rate of datagrams. If it is not, then the UDP socket receive queue will begin to fill.
This can be tolerated for short bursts. The queue does exactly what it is supposed to do – queue datagrams until you are ready. But if the average arrival rate regularly causes a backlog in the queue, it is time to redesign your program. There are two main choices here: reduce the elapsed processing time via crafty programming techniques, and/or multi-thread your program. Load balancing across multiple instances of your program may also be employed.
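The multi-thread option might be sketched in Java (the asker's language) roughly as follows. This is a minimal illustration, not anyone's production code: the socket read is simulated by a producer thread so the queue hand-off can be shown deterministically; in a real app the reader loop would call DatagramSocket.receive() instead, and the POISON end-of-stream marker is just a device for this sketch.

```java
import java.util.concurrent.*;

// Two-thread redesign sketch: one thread does nothing but pull datagrams
// into an in-process queue; a second thread does the slow processing.
public class TwoThreadSketch {
    private static final byte[] POISON = new byte[0]; // end-of-stream marker

    public static int run(int datagrams) throws InterruptedException {
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

        // "Reader" thread: in the real app this would loop on socket.receive().
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < datagrams; i++) queue.put(new byte[128]);
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        // Worker: drains the in-process queue and does the heavy lifting.
        int processed = 0;
        while (true) {
            byte[] d = queue.take();
            if (d == POISON) break;
            processed++; // real processing goes here
        }
        reader.join();
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("processed " + run(100) + " datagrams");
    }
}
```

Because the reader never blocks on processing, the kernel-side socket queue stays short even when the worker briefly falls behind.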
As mentioned, on Linux you can examine the proc filesystem to get status about what UDP is up to. For example, if I cat the /proc/net/udp node, I get something like the following. From this, I can see that a socket owned by user id 1006 is listening on port 0x231D (8989), and that the receive queue is at about 128KB. As 128KB is the max size on my system, this tells me my program is woefully weak at keeping up with the arriving datagrams. There have been 2237 drops so far, meaning the UDP layer cannot put any more datagrams into the socket queue and must drop them.
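The sample output did not survive in this copy of the answer; a line in the standard /proc/net/udp layout, reconstructed purely for illustration from the values the answer describes (uid 1006, local port 0x231D, rx_queue of about 128KB in hex, 2237 drops; the other fields are invented), would look something like:

```
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
 6926: 00000000:231D 00000000:0000 07 00000000:0001FE00 00:00000000 00000000  1006        0 111111111 2 ffff8801f87cc000 2237
```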
You could watch your program's behaviour over time by re-reading that node. Note also that the netstat command does about the same thing:
netstat -c --udp -an
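To watch rx_queue over time programmatically rather than eyeballing netstat, a small parser for /proc/net/udp can be sketched. The field layout used here (field 4 holds tx_queue:rx_queue as hexadecimal) follows the standard proc format; the class name is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

public class RxQueueWatch {
    // Parse the rx_queue (bytes) out of one /proc/net/udp socket line.
    // After whitespace-splitting, field 4 is "tx_queue:rx_queue", both hex.
    static long parseRxQueue(String line) {
        String[] fields = line.trim().split("\\s+");
        String[] queues = fields[4].split(":");
        return Long.parseLong(queues[1], 16);
    }

    public static void main(String[] args) throws IOException {
        Path proc = Paths.get("/proc/net/udp");
        if (!Files.exists(proc)) return; // not on Linux
        List<String> lines = Files.readAllLines(proc);
        for (String line : lines) {
            if (line.contains("rx_queue")) continue; // skip the header
            String[] fields = line.trim().split("\\s+");
            System.out.println("local=" + fields[1]
                    + " rx_queue=" + parseRxQueue(line) + " bytes");
        }
    }
}
```

Running this once a second and recording the maximum value seen gives a crude high-water mark for the socket's backlog.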
My solution for my weenie program will be to multi-thread.
Cheers!
Linux provides the files /proc/net/udp and /proc/net/udp6, which list all open UDP sockets (for IPv4 and IPv6, respectively). In both of them, the columns tx_queue and rx_queue show the outgoing and incoming queues in bytes.
If everything is working as expected, you usually will not see any value different from zero in those two columns: as soon as your application generates packets they are sent through the network, and as soon as those packets arrive from the network your application will wake up and receive them (the recv call returns immediately). You may see rx_queue go up if your application has the socket open but is not invoking recv to receive the data, or if it is not processing such data fast enough.
rx_queue will tell you the queue length at any given instant, but it will not tell you how full the queue has been, i.e. the high-water mark. There is no way to constantly monitor this value, and no way to get it programmatically (see How do I get amount of queued data for UDP socket?).
The only way I can imagine monitoring the queue length is to move the queue into your own program. In other words, start two threads -- one reading the socket as fast as it can and dumping the datagrams into your queue, and the other being your program pulling from this queue and processing the packets. This of course assumes that you can assure each thread is on a separate CPU. Now you can monitor the length of your own queue and keep track of the high-water mark.
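A minimal sketch of that in-process bookkeeping, assuming a Java app: wrap the hand-off queue and record the largest size seen after each put. The class name is invented for this sketch, and sampling size() right after put is slightly racy under concurrency, which is acceptable for a high-water estimate.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hand-off queue that remembers the largest backlog it ever held,
// giving the high-water mark that the kernel does not expose.
public class WatermarkQueue<E> extends LinkedBlockingQueue<E> {
    private final AtomicInteger highWater = new AtomicInteger();

    @Override
    public void put(E e) throws InterruptedException {
        super.put(e);
        // size() here may be off by a little under heavy concurrency,
        // but it is good enough for a high-water estimate.
        highWater.accumulateAndGet(size(), Math::max);
    }

    public int highWaterMark() { return highWater.get(); }

    public static void main(String[] args) throws InterruptedException {
        WatermarkQueue<String> q = new WatermarkQueue<>();
        for (int i = 0; i < 5; i++) q.put("datagram-" + i); // backlog grows to 5
        q.take(); q.take();                                 // worker catches up a bit
        for (int i = 0; i < 3; i++) q.put("burst-" + i);    // backlog peaks at 6
        System.out.println("high-water mark: " + q.highWaterMark()); // prints 6
    }
}
```

The reader thread put()s datagrams in, the worker take()s them out, and highWaterMark() can be logged periodically or exposed via JMX.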
The process is simple:
1. If desired, pause the application process.
2. Open the UDP socket. You can snag it from the running process using /proc/<PID>/fd if necessary. Or you can add this code to the application itself and send it a signal; it will already have the socket open, of course.
3. Call recvmsg in a tight loop as quickly as possible.
4. Count how many packets/bytes you got.
This will discard any datagrams currently buffered, but if that breaks your application, your application was already broken.
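Steps 3 and 4 map onto Java's DatagramSocket.receive. A self-contained sketch, with two assumptions worth flagging: the burst sender exists only to make the example runnable, and a 200 ms receive timeout stands in for "the kernel queue is now empty".

```java
import java.net.*;

public class DrainCount {
    // Steps 3-4: read in a tight loop until the kernel queue is empty,
    // counting packets and bytes. A receive timeout stands in for "empty".
    static long[] drain(DatagramSocket socket) throws Exception {
        socket.setSoTimeout(200);
        byte[] buf = new byte[65535];
        long packets = 0, bytes = 0;
        try {
            while (true) {
                DatagramPacket p = new DatagramPacket(buf, buf.length);
                socket.receive(p);
                packets++;
                bytes += p.getLength();
            }
        } catch (SocketTimeoutException queueEmpty) {
            // nothing left buffered for this socket
        }
        return new long[] { packets, bytes };
    }

    // Demo: send a small burst to ourselves over loopback, then drain it.
    static long[] demo() throws Exception {
        try (DatagramSocket s = new DatagramSocket(0);
             DatagramSocket sender = new DatagramSocket()) {
            for (int i = 0; i < 10; i++) {
                byte[] m = ("m" + i).getBytes();
                sender.send(new DatagramPacket(m, m.length,
                        InetAddress.getLoopbackAddress(), s.getLocalPort()));
            }
            Thread.sleep(100); // let loopback delivery land in the buffer
            return drain(s);
        }
    }

    public static void main(String[] args) throws Exception {
        long[] r = demo();
        System.out.println(r[0] + " packets, " + r[1] + " bytes");
    }
}
```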