Fastest way to share connections and data among multiple processes?
I have multiple app processes that each connect to servers and receive data from them. Often the servers being connected to and the data being retrieved overlap between processes. So there is a lot of unnecessary duplication of the data across the network, more connections than should be necessary (which taxes the servers), and the data ends up getting stored redundantly in memory in the apps.
One solution would be to combine the multiple app processes into a single one -- but for the most part they really are logically distinct, and that could be years of work.
Unfortunately, latency is critically important, and the volume of data is huge (any one datum may not be too big, but once a client makes a request the server will send a rapid stream of updates as the data changes, which can be upwards of 20MB/s, and these all need to be given to the requesting apps with the shortest possible delay).
The solution that comes to mind is to code a local daemon process, that the app processes would request data from. The daemon would check if a connection to the appropriate server already exists, and if not make one. Then it would retrieve the data and using shared memory (due to latency concern, otherwise I'd use sockets) give the data to the requesting app.
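A minimal sketch of the shared-memory side of such a daemon, using POSIX shm_open/mmap; the segment name, size and notification scheme are assumptions for illustration only, not anything specified above.

```c
/* Sketch: daemon side of a POSIX shared-memory hand-off.
 * Segment name and size are illustrative. Link with -lrt on older Linux. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/feed_cache"        /* hypothetical segment name */
#define SHM_SIZE (64 * 1024 * 1024)   /* 64 MiB, arbitrary */

int main(void)
{
    /* Create (or open) the segment that the app processes will mmap. */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, SHM_SIZE) < 0)
        return 1;

    char *base = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return 1;

    /* In the real daemon, updates pulled from the single server
     * connection would be written here and readers signalled via
     * e.g. a pipe, eventfd or semaphore. */
    memcpy(base, "update goes here", 17);

    munmap(base, SHM_SIZE);
    close(fd);
    return 0;
}
```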
A simpler idea in the short term, one that would only solve the redundant connections, would be to use unix domain sockets (this will run on a unix OS, though I prefer to stick to cross-platform libs when I can) to share a socket descriptor between all the processes, so they share a single connection. The issue with this is consuming the buffer -- I want all the processes to see everything coming over the socket, and if I understand this approach correctly, a read in one process on the socket will prevent the other processes from seeing the same data on their next read (the offset within the shared descriptor will be bumped).
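For reference, the descriptor-sharing idea would be built on an SCM_RIGHTS ancillary message over a unix domain socket. A rough sketch of the sending side follows (error handling omitted; the receiver mirrors this with recvmsg). It only shares the descriptor -- it does not change the fact that data read by one process is consumed for all of them.

```c
/* Sketch: pass an open socket descriptor to another process over a
 * connected unix domain socket using SCM_RIGHTS (error handling omitted).
 * Note: data read through the shared descriptor by one process is still
 * removed from the common receive buffer. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int send_fd(int unix_sock, int fd_to_share)
{
    char dummy = 'x';                            /* must send >= 1 byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    union {                                      /* keeps cmsg aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf,
    };

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type  = SCM_RIGHTS;
    cm->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd_to_share, sizeof(int));

    return sendmsg(unix_sock, &msg, 0) < 0 ? -1 : 0;
}
```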
2 Answers
I recommend that you take a look at ZeroMQ. This might help solve your problem. I don't think that 20MB/s is very high ... you should be able to achieve that level of throughput by just using the TCP transport in ZeroMQ. There is also support for other transport mechanisms, including reliable multicast using OpenPGM. There are plans to add UNIX pipes as a transport mechanism.
Messaging will probably be safer and easier than shared memory. Notably if you use messaging instead of shared memory then you can split up your application components across a cluster of servers ... which might give you significantly better performance than shared memory, depending on where your bottlenecks are.
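As one possible shape for that messaging approach, a minimal ZeroMQ PUB/SUB skeleton in C might look like the following; the ipc:// endpoint and the payload are illustrative, and each app process would run the subscriber side shown in the comment.

```c
/* Sketch: fan-out with ZeroMQ PUB/SUB over the ipc:// transport.
 * Endpoint and payload are illustrative. Compile with -lzmq. */
#include <string.h>
#include <zmq.h>

int main(void)
{
    void *ctx = zmq_ctx_new();

    /* Daemon side: publish each update received from the server once,
     * regardless of how many local apps are subscribed. */
    void *pub = zmq_socket(ctx, ZMQ_PUB);
    zmq_bind(pub, "ipc:///tmp/feed");   /* tcp://127.0.0.1:5556 also works */

    const char *update = "SYM|42.0";    /* stand-in for one server update */
    zmq_send(pub, update, strlen(update), 0);

    /* Each app process would instead do roughly:
     *   void *sub = zmq_socket(ctx, ZMQ_SUB);
     *   zmq_connect(sub, "ipc:///tmp/feed");
     *   zmq_setsockopt(sub, ZMQ_SUBSCRIBE, "", 0);  // subscribe to everything
     *   zmq_recv(sub, buf, sizeof buf, 0);
     */

    zmq_close(pub);
    zmq_ctx_term(ctx);
    return 0;
}
```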
I believe a dedicated service that exposes the data via shared memory is your best bet. Secondary from that would be a service that multicasts the data via named pipes, except that you're targeting a Unix variant and not Windows.
Another option would be UDP multicast, so that the data replication occurs at the hardware or driver level. The only problem is that UDP data delivery is not guaranteed to be in order, nor is it guaranteed to deliver at all.
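A minimal sketch of the receiving side of that UDP multicast option with plain BSD sockets; the group address and port are made-up values.

```c
/* Sketch: receiver side of UDP multicast -- every process that joins the
 * group sees every datagram. Group address and port are illustrative. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    int one = 1;    /* let several local processes bind the same port */
    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(30001);                 /* illustrative port */
    bind(s, (struct sockaddr *)&addr, sizeof addr);

    struct ip_mreq mreq;
    mreq.imr_multiaddr.s_addr = inet_addr("239.0.0.1");  /* illustrative group */
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof mreq);

    char buf[2048];
    recvfrom(s, buf, sizeof buf, 0, NULL, NULL);          /* one datagram arrives */

    close(s);
    return 0;
}
```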
I think sharing the physical socket is a hack and should be avoided, you would be better off implementing a driver that did what you wanted the daemon to do transparently (e.g. processes saw the socket as a normal socket except internally the socket was mapped to a single socket, where logic existed to re-broadcast the data among the virtual sockets.) Unfortunately the level of effort to get it right would be significant, and if time to complete is a concern sharing the socket isn't really a good route to take (whether done at the driver level, or via some other hacky means such as sharing the socket descriptor cross-process.)
Sharing the socket also assumes that it is a push-only connection, e.g. no traffic negotiation is occurring at the app level (requests for data, for example, or acknowledgements of data receipt.)
A quick-path to completion may be to look at projects such as BNC and convert the code, or hijack the general idea, to do what you need. Replicating traffic to local sockets shouldn't incur a huge latency, though you would be exercising the NIC (and associated buffers) for all of the data replication and if you are nearing the limit of the hardware (or have a poor driver and/or TCP stack implementation) then you may wind up with a dead server. Where I work we've seen data replication tank a gigabit ether card at the driver level, so it's not unheard of.
Shared Memory is the best bet if you want to remain platform independent, and performant, while not introducing anything that may become unsupportable in 5 years time due to kernel or hardware/driver changes.