Linux multicast sendto() performance degrades with local listeners
We have a "publisher" application that sends out data using multicast. The application is extremely performance sensitive (we are optimizing at the microsecond level). Applications that listen to this published data can be (and often are) on the same machine as the publishing application.
We recently noticed an interesting phenomenon: the time to do a sendto() increases proportionally to the number of listeners on the machine.
For example, let's say with no listeners the base time for our sendto() call is 5 microseconds. Each additional listener increases the time of the sendto() call by about 2 microseconds. So if we have 10 listeners, now the sendto() call takes 2*10+5 = 25 microseconds.
This to me suggests that the sendto() call blocks until the data has been copied to every single listener.
Analysis of the listening side supports this as well. If there are 10 listeners, each listener receives the data two microseconds later than the previous. (I.e., the first listener gets the data in about five microseconds, and the last listener gets the data in about 23--25 microseconds.)
Is there any way, either at the programmatic level or the system level, to change this behavior? Something like a non-blocking/asynchronous sendto() call? Or at least one that blocks only until the message is copied into the kernel's memory, so it can return without waiting on all the listeners?
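For reference, a minimal sketch of the kind of measurement described above, assuming a plain UDP multicast socket; the group address, port, and payload size are placeholders, not values from the original setup:

    /* Time a single sendto() to a multicast group with clock_gettime().
     * Sketch only: error handling omitted. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <time.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        struct sockaddr_in group;
        memset(&group, 0, sizeof(group));
        group.sin_family = AF_INET;
        group.sin_addr.s_addr = inet_addr("239.255.0.1"); /* hypothetical group */
        group.sin_port = htons(12345);                    /* hypothetical port  */

        char payload[256] = {0};                          /* hypothetical size  */
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        sendto(fd, payload, sizeof(payload), 0,
               (struct sockaddr *)&group, sizeof(group));
        clock_gettime(CLOCK_MONOTONIC, &t1);

        long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
                + (t1.tv_nsec - t0.tv_nsec);
        printf("sendto() took %ld ns\n", ns);
        return 0;
    }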
2 Answers
Multicast loopback is incredibly inefficient and shouldn't be used for high-performance messaging. As you noted, for every send the kernel copies the message to every local listener.
The recommended approach is to use a separate IPC method to distribute to other threads and processes on the same host, either shared memory or Unix domain sockets.
For example, this can easily be implemented using ZeroMQ by adding an IPC connection alongside a PGM multicast connection on the same ZeroMQ socket, as sketched below.
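A minimal sketch of that setup using the libzmq C API. The endpoint strings are made-up examples, and the epgm transport requires libzmq built with OpenPGM support:

    /* One PUB socket, two transports: local subscribers use ipc://,
     * remote subscribers join the PGM multicast group. Sketch only;
     * error handling omitted. */
    #include <zmq.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();
        void *pub = zmq_socket(ctx, ZMQ_PUB);

        /* Local listeners connect here over a Unix domain socket,
           bypassing the multicast loopback path entirely. */
        zmq_bind(pub, "ipc:///tmp/feed.ipc");

        /* Remote listeners receive via multicast (interface;group:port). */
        zmq_bind(pub, "epgm://eth0;239.192.1.1:5555");

        /* One send fans out over both endpoints. */
        const char msg[] = "tick";
        zmq_send(pub, msg, sizeof(msg) - 1, 0);

        zmq_close(pub);
        zmq_ctx_term(ctx);
        return 0;
    }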
Sorry for asking the obvious, but is the socket nonblocking? (Add O_NONBLOCK to the socket's flags -- see fcntl.)
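A sketch of the standard fcntl() idiom being suggested here (error handling mostly omitted):

    #include <fcntl.h>

    /* Put an already-open socket descriptor into non-blocking mode. */
    int set_nonblocking(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);  /* read current file status flags */
        if (flags == -1)
            return -1;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }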