Fastest socket method for lots of data between lots of files

Posted 2024-08-13 07:00:38

I'm building a socket application that needs to shuffle a lot of small/medium-sized files, something like 5-100 KB each, to a lot of different clients (sort of like a web server, but not quite).

Should I just go with the standard poll/epoll (Linux) or async sockets in Winsock (Win32), or are there any methods with even more performance (overlapped I/O on Win32, for example)?

Both Linux and Windows are possible platforms!

Comments (6)

故事还在继续 2024-08-20 07:00:38

On Linux, demultiplexing multiple sockets using epoll is the fastest possible way to do parallel I/O over TCP.
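A minimal sketch of the epoll pattern, written in Python because `select.epoll` is a thin wrapper over the same `epoll_create`/`epoll_ctl`/`epoll_wait` calls a C server would make. It assumes the payloads are already in memory and uses a single thread with level-triggered `EPOLLOUT`; the name `serve_payloads` is hypothetical.

```python
import select
import socket


def serve_payloads(jobs: dict[socket.socket, bytes]) -> None:
    """Write each payload to its socket, multiplexing all sockets on one epoll.

    Sketch only: one thread, level-triggered epoll, payloads already in memory.
    """
    ep = select.epoll()
    pending = {}  # fd -> (socket, memoryview of the bytes still to send)
    try:
        for sock, data in jobs.items():
            sock.setblocking(False)
            fd = sock.fileno()
            pending[fd] = (sock, memoryview(data))
            ep.register(fd, select.EPOLLOUT)
        while pending:
            for fd, events in ep.poll():  # blocks until some socket is writable
                sock, view = pending[fd]
                if events & (select.EPOLLERR | select.EPOLLHUP):
                    # Peer went away: drop the connection.
                    ep.unregister(fd)
                    del pending[fd]
                    continue
                try:
                    sent = sock.send(view)  # may be a partial write
                except BlockingIOError:
                    continue  # buffer full after all; wait for the next EPOLLOUT
                view = view[sent:]
                if view:
                    pending[fd] = (sock, view)
                else:
                    ep.unregister(fd)
                    del pending[fd]
    finally:
        ep.close()
```

Level-triggered mode keeps the loop simple: a socket that still has room in its send buffer is reported again on every `poll()`, so a partial write needs no extra bookkeeping beyond the remaining slice.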

But I'll also mention that, in the interest of portability (and since you seem to be interested in either Linux or Windows), you should look into Boost.Asio. It has a portable API but uses epoll on Linux and overlapped I/O on Windows, so you can build highly efficient and portable networking apps.

Also, since you're working with files, you should implement double buffering when performing I/O for maximum performance. In other words, you send/recv each file using two buffers. For example, on the sending side, you read from disk into one buffer and send that buffer over the network while another thread reads the next block of data from disk into the second buffer. This way you overlap disk I/O with network I/O.
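A sketch of that two-buffer scheme on the sending side, assuming a blocking socket and an arbitrary 64 KB chunk size. Two preallocated buffers circulate between a reader thread (disk) and the calling thread (network) through two queues, so at most one buffer is being filled while the other is being sent. The names `send_double_buffered` and `CHUNK` are hypothetical.

```python
import queue
import socket
import threading
from typing import BinaryIO

CHUNK = 64 * 1024  # assumed chunk size; tune for your disks and network


def send_double_buffered(f: BinaryIO, sock: socket.socket, chunk: int = CHUNK) -> int:
    """Send a file over a blocking socket, overlapping disk reads with sends."""
    free: queue.Queue = queue.Queue()    # buffers ready to be filled from disk
    filled: queue.Queue = queue.Queue()  # (buffer, byte count) ready to send
    for _ in range(2):                   # exactly two buffers: double buffering
        free.put(bytearray(chunk))
    error: list[BaseException] = []

    def reader() -> None:
        try:
            while True:
                buf = free.get()
                n = f.readinto(buf)
                filled.put((buf, n))
                if not n:  # EOF: the zero count tells the sender to stop
                    return
        except BaseException as exc:  # hand disk errors to the sending thread
            error.append(exc)
            filled.put((None, 0))

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    total = 0
    while True:
        buf, n = filled.get()
        if not n:
            break
        sock.sendall(memoryview(buf)[:n])  # network I/O on this buffer...
        total += n
        free.put(buf)  # ...then return it so the reader can refill it
    t.join()
    if error:
        raise error[0]
    return total
```

If `sendall` itself fails, the daemon reader is simply left blocked on the empty `free` queue; a production server would add cancellation.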

温柔少女心 2024-08-20 07:00:38

On Linux, sendfile() is a high-performance API specifically for sending data from files to sockets. You will still need to use poll to multiplex; sendfile() only replaces the read/write part.
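A sketch of the basic sendfile(2) loop on a blocking socket, using Python's `os.sendfile(out_fd, in_fd, offset, count)` wrapper on Linux. sendfile() may transfer fewer bytes than requested, so the loop advances an explicit file offset until the whole file is sent. `send_whole_file` is a hypothetical helper; the standard library's `socket.sendfile()` method does much the same internally.

```python
import os
import socket


def send_whole_file(path: str, sock: socket.socket) -> int:
    """Copy a file to a blocking socket with sendfile(2).

    The kernel moves the bytes from the page cache to the socket, so they
    never pass through a user-space buffer.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:  # file shrank underneath us; stop rather than spin
                break
            offset += sent
    return offset
```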

梦一生花开无言 2024-08-20 07:00:38

In addition to epoll it looks like Linux sendfile(2) would be a good fit for your needs on the server side.
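Combining the two means each connection must remember its own file offset: on a non-blocking socket, sendfile() returns early or fails with EAGAIN (raised as `BlockingIOError` in Python) when the socket buffer fills, and the transfer resumes on the next `EPOLLOUT`. A single-threaded, level-triggered sketch; `serve_files` and its socket-to-path mapping are illustrative assumptions.

```python
import os
import select
import socket


def serve_files(jobs: dict[socket.socket, str]) -> None:
    """Stream each file to its socket with sendfile(2), multiplexed on epoll."""
    ep = select.epoll()
    state = {}  # fd -> [socket, open file, next offset, file size]
    try:
        for sock, path in jobs.items():
            f = open(path, "rb")
            sock.setblocking(False)
            state[sock.fileno()] = [sock, f, 0, os.fstat(f.fileno()).st_size]
            ep.register(sock.fileno(), select.EPOLLOUT)
        while state:
            for fd, _events in ep.poll():
                _sock, f, offset, size = state[fd]
                try:
                    sent = os.sendfile(fd, f.fileno(), offset, size - offset)
                except BlockingIOError:
                    continue  # socket buffer full; resume on the next EPOLLOUT
                offset += sent
                if sent == 0 or offset >= size:  # finished (or file shrank)
                    ep.unregister(fd)
                    f.close()
                    del state[fd]
                else:
                    state[fd][2] = offset  # remember where to resume
    finally:
        for _sock, f, _offset, _size in state.values():
            f.close()
        ep.close()
```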

帅冕 2024-08-20 07:00:38

On Windows you may try using TransmitFile, which can boost performance by avoiding copying data between kernel space and user space.

夏日落 2024-08-20 07:00:38

Unfortunately, if you want maximum possible performance, you will still have to hand-craft your I/O code on Windows and Linux as currently available abstraction libraries don't scale that well to multiple threads (if at all).

Boost.Asio is probably the best option if you want portability (and ease of use), but it does have its limitations when it comes to multithreaded scalability (see "C++ Socket Server - Unable to saturate CPU"). I guess the main problem is integrating timeout handling into a multithreaded event loop without excessive locking.

Essentially, what you would want to use for maximum performance is I/O completion ports with a pool of worker threads on Windows and edge-triggered epoll with a pool of worker threads on Linux.
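The Linux half of that design can be sketched with `EPOLLET | EPOLLONESHOT` and a thread pool: the event loop only dispatches ready sockets, and a worker drains each socket until EAGAIN (required in edge-triggered mode) and then re-arms it. This sketch only reads sockets to EOF; the name `drain_all`, the pool size and the 0.1 s poll timeout are arbitrary assumptions, IOCP on Windows is not shown, and because of Python's GIL it illustrates the event flow rather than the CPU scaling the answer is about.

```python
import select
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def drain_all(socks: list[socket.socket], workers: int = 4) -> dict[int, bytes]:
    """Read every socket to EOF with edge-triggered, one-shot epoll + workers."""
    ep = select.epoll()
    # EPOLLET: report only new readiness, so workers must read until EAGAIN.
    # EPOLLONESHOT: disarm after each report, so one worker owns a socket at a time.
    mask = select.EPOLLIN | select.EPOLLET | select.EPOLLONESHOT
    by_fd = {s.fileno(): s for s in socks}
    data = {fd: bytearray() for fd in by_fd}
    open_fds = set(by_fd)
    lock = threading.Lock()
    for fd, s in by_fd.items():
        s.setblocking(False)
        ep.register(fd, mask)

    def handle(fd: int) -> None:
        sock = by_fd[fd]
        while True:
            try:
                chunk = sock.recv(65536)
            except BlockingIOError:
                ep.modify(fd, mask)  # drained: re-arm for the next edge
                return
            except OSError:  # e.g. connection reset: treat like EOF
                chunk = b""
            if not chunk:  # peer closed: stop watching this socket
                ep.unregister(fd)
                with lock:
                    open_fds.discard(fd)
                return
            data[fd].extend(chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            with lock:
                if not open_fds:
                    break
            for fd, _events in ep.poll(timeout=0.1):
                pool.submit(handle, fd)  # the event loop only dispatches
    ep.close()
    return {fd: bytes(buf) for fd, buf in data.items()}
```

Re-arming with `epoll_ctl(EPOLL_CTL_MOD)` makes the kernel re-check readiness, so data that arrives between the final EAGAIN and the re-arm is still reported.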

高跟鞋的旋律 2024-08-20 07:00:38

Don't optimise your program prematurely.

Assuming it isn't a premature optimisation, the easiest thing to do is just keep all the data in memory. You can mmap() them if you like, or just load them in at startup time. Sending stuff that's already in memory is a no-brainer.
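A sketch of the "load it all at startup" approach, assuming every file to be served lives in one directory and fits in memory; `load_cache` is a hypothetical helper. With `use_mmap=True` each file is mapped read-only with `mmap.mmap` instead of being copied into a `bytes` object.

```python
import mmap
import os


def load_cache(directory: str, use_mmap: bool = False) -> dict:
    """Map file name -> contents for every regular file in `directory`.

    use_mmap=False reads each file into RAM once; use_mmap=True maps it
    read-only so the OS page cache backs it. Either way a request is served
    from memory with no per-request disk read.
    """
    cache = {}
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        with open(entry.path, "rb") as f:
            if use_mmap and entry.stat().st_size > 0:  # mmap rejects empty files
                # The mapping stays valid after the file object is closed.
                cache[entry.name] = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            else:
                cache[entry.name] = f.read()
    return cache
```

A request is then served with `sock.sendall(cache[name])`, since both `bytes` and `mmap` objects support the buffer protocol.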

Having said that, trying to multiplex lots of things with (e.g.) epoll can be a bit of a headache, can you not use something someone's already written?
