Fastest socket method for lots of data between lots of files
I'm building a socket application that needs to shuffle a lot of small/medium-sized files (something like 5-100 KB each) to a lot of different clients (sort of like a web server, but still not quite).
Should I just go with the standard poll/epoll (Linux) or async sockets in Winsock (Win32), or are there methods with even better performance (overlapped I/O on Win32, for example)?
Both Linux and Windows are possible platforms!
On Linux, demultiplexing multiple sockets with epoll is generally the fastest way to do parallel I/O over TCP.
But I'll also mention that, in the interest of portability (and since you seem to be interested in either Linux or Windows), you should look into Boost.Asio. It has a portable API, but uses epoll on Linux and overlapped I/O on Windows, so you can build highly efficient and portable networking apps.
Also, since you're working with files, you should implement double buffering when performing I/O for maximum performance. In other words, you send/recv each file using two buffers. For example, on the sending side, you read from disk into one buffer and then send that buffer over the network, while another thread reads the next block of data from disk into the second buffer. This way you overlap disk I/O with network I/O.
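A minimal sketch of that double-buffering scheme in C, assuming POSIX threads and blocking descriptors. The names send_double_buffered and BLOCK_SIZE (64 KB here) are hypothetical. A reader thread fills the two buffers alternately from disk while the calling thread writes the other buffer to the output descriptor, so each buffer is owned by exactly one side at a time.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE (64 * 1024) /* hypothetical block size; tune for your disk */

typedef struct {
    char data[BLOCK_SIZE];
    ssize_t len; /* bytes in the block: 0 = end of file, -1 = read error */
    int full;    /* 1 = filled by the reader, not yet sent */
} block_t;

typedef struct {
    int in_fd;
    int stop;          /* set by the sender so the reader exits early */
    block_t blocks[2]; /* the two buffers */
    pthread_mutex_t mu;
    pthread_cond_t cv;
} pipeline_t;

/* Disk side: fills the buffers alternately, waiting while the sender owns one. */
static void *reader_main(void *arg) {
    pipeline_t *p = arg;
    for (int i = 0;; i ^= 1) {
        block_t *b = &p->blocks[i];
        pthread_mutex_lock(&p->mu);
        while (b->full && !p->stop)
            pthread_cond_wait(&p->cv, &p->mu);
        int stop = p->stop;
        pthread_mutex_unlock(&p->mu);
        if (stop)
            return NULL;

        ssize_t n;
        do {
            n = read(p->in_fd, b->data, BLOCK_SIZE); /* disk I/O outside the lock */
        } while (n < 0 && errno == EINTR);

        pthread_mutex_lock(&p->mu);
        b->len = n;
        b->full = 1;
        pthread_cond_broadcast(&p->cv);
        pthread_mutex_unlock(&p->mu);
        if (n <= 0)
            return NULL;
    }
}

static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Network side: writes block i while the reader fills block i ^ 1.
   Returns the number of bytes copied from in_fd to out_fd, or -1. */
long long send_double_buffered(int in_fd, int out_fd) {
    pipeline_t *p = calloc(1, sizeof *p);
    if (p == NULL) return -1;
    p->in_fd = in_fd;
    pthread_mutex_init(&p->mu, NULL);
    pthread_cond_init(&p->cv, NULL);
    pthread_t reader;
    if (pthread_create(&reader, NULL, reader_main, p) != 0) {
        pthread_mutex_destroy(&p->mu);
        pthread_cond_destroy(&p->cv);
        free(p);
        return -1;
    }

    long long total = 0;
    for (int i = 0;; i ^= 1) {
        block_t *b = &p->blocks[i];
        pthread_mutex_lock(&p->mu);
        while (!b->full)
            pthread_cond_wait(&p->cv, &p->mu);
        pthread_mutex_unlock(&p->mu);
        if (b->len <= 0) { /* end of file, or a read error */
            if (b->len < 0) total = -1;
            break;
        }
        if (write_all(out_fd, b->data, (size_t)b->len) < 0) {
            total = -1;
            break;
        }
        total += b->len;
        pthread_mutex_lock(&p->mu);
        b->full = 0; /* hand the buffer back to the reader */
        pthread_cond_broadcast(&p->cv);
        pthread_mutex_unlock(&p->mu);
    }

    pthread_mutex_lock(&p->mu);
    p->stop = 1; /* wake the reader if it is still waiting for a buffer */
    pthread_cond_broadcast(&p->cv);
    pthread_mutex_unlock(&p->mu);
    pthread_join(reader, NULL);
    pthread_mutex_destroy(&p->mu);
    pthread_cond_destroy(&p->cv);
    free(p);
    return total;
}
```

In a server, out_fd would be the client socket. With 64 KB blocks, a file in the 5-100 KB range spans only one or two blocks, so the block size bounds how much disk/network overlap a single transfer can get.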
On Linux, sendfile() is a high-performance API specifically for sending data from files to sockets (you will still need to use poll to multiplex; it just replaces the read/write part).
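Roughly, the sendfile() part might look like the sketch below (Linux-only; send_whole_file is a hypothetical helper name). It sends a whole file with blocking semantics; with non-blocking sockets driven by poll/epoll you would instead stop on EAGAIN, keep the offset, and call sendfile() again when the socket becomes writable.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends the whole file at `path` to out_fd (normally a connected socket).
   The kernel moves the data, so there is no user-space read()/write() buffer.
   Returns the number of bytes sent, or -1 on error. */
long long send_whole_file(int out_fd, const char *path) {
    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0) return -1;
    struct stat st;
    if (fstat(file_fd, &st) < 0) {
        close(file_fd);
        return -1;
    }
    off_t offset = 0; /* sendfile() advances this past the bytes it sent */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, file_fd, &offset, (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR) continue;
            close(file_fd);
            return -1;
        }
        if (n == 0) break; /* file was truncated while we were sending it */
    }
    close(file_fd);
    return (long long)offset;
}
```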
In addition to epoll, Linux sendfile(2) looks like a good fit for your needs on the server side.
On Windows, you may try TransmitFile, which can boost performance by avoiding copies of the data between kernel space and user space.
Unfortunately, if you want maximum possible performance, you will still have to hand-craft your I/O code separately for Windows and Linux, as the currently available abstraction libraries don't scale that well to multiple threads (if at all).
Boost.Asio is probably the best option if you want portability (and ease of use), but it does have its limitations when it comes to multithreaded scalability (see "C++ Socket Server - Unable to saturate CPU"). I guess the main problem is integrating timeout handling into a multithreaded event loop without excessive locking.
Essentially, what you would want to use for maximum performance is I/O completion ports with a pool of worker threads on Windows and edge-triggered epoll with a pool of worker threads on Linux.
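A single-threaded sketch of the edge-triggered epoll side, assuming Linux; the worker pool is omitted, and set_nonblocking, watch_edge_triggered, and poll_once are hypothetical helper names. The key rule of EPOLLET is that a ready descriptor must be drained until read() fails with EAGAIN, because no further notification arrives for data that was already pending.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 64

/* Edge-triggered mode needs non-blocking fds: each ready fd is drained
   until EAGAIN, and a blocking read() would stall the whole loop. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Registers fd for edge-triggered read readiness. */
int watch_edge_triggered(int epfd, int fd) {
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One pass of the event loop: wait up to timeout_ms, then drain every ready
   fd until EAGAIN. Counting bytes stands in for real request handling.
   Returns the bytes read, or -1 on an epoll error. */
long long poll_once(int epfd, int timeout_ms) {
    struct epoll_event events[MAX_EVENTS];
    int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    if (n < 0) return errno == EINTR ? 0 : -1;
    long long total = 0;
    char buf[4096];
    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        for (;;) {
            ssize_t r = read(fd, buf, sizeof buf);
            if (r > 0) { total += r; continue; }
            if (r < 0 && errno == EINTR) continue;
            if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) break; /* drained */
            /* r == 0 (peer closed) or a hard error: stop watching this fd */
            epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
            close(fd);
            break;
        }
    }
    return total;
}
```

With a pool, one common arrangement is several threads calling epoll_wait() on the same epoll instance, usually with EPOLLONESHOT added so that only one thread handles a given socket at a time (the socket is then re-armed with EPOLL_CTL_MOD).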
Don't optimise your program prematurely.
Assuming it isn't a premature optimisation, the easiest thing to do is just keep all the data in memory. You can mmap() them if you like, or just load them in at startup time. Sending stuff that's already in memory is a no-brainer.
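A sketch of the keep-it-in-memory approach using mmap(), assuming Linux/POSIX; cached_file, cache_file, uncache_file, and send_cached are hypothetical names. Each file is mapped once at startup, and serving a client is then only a copy from memory to the socket.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* A file kept in memory for the server's lifetime (hypothetical type). */
typedef struct {
    const char *data;
    size_t len;
} cached_file;

/* Maps `path` read-only, e.g. once at startup. Returns 0 on success. */
int cache_file(const char *path, cached_file *out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    out->data = NULL;
    out->len = (size_t)st.st_size;
    if (out->len > 0) { /* mmap() rejects a zero length */
        void *p = mmap(NULL, out->len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { close(fd); return -1; }
        out->data = p;
    }
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return 0;
}

void uncache_file(cached_file *f) {
    if (f->data != NULL) munmap((void *)f->data, f->len);
    f->data = NULL;
    f->len = 0;
}

/* Serving a request: copy straight from memory to the socket.
   MSG_NOSIGNAL turns a vanished peer into an EPIPE error instead of SIGPIPE. */
int send_cached(int sock_fd, const cached_file *f) {
    size_t sent = 0;
    while (sent < f->len) {
        ssize_t n = send(sock_fd, f->data + sent, f->len - sent, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}
```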
Having said that, trying to multiplex lots of things with (e.g.) epoll can be a bit of a headache; can you not use something someone has already written?