C++ socket programming: maximizing throughput/bandwidth on localhost (I only get 3 Gbit/s instead of 23 GBit/s)

Posted 2024-11-27 02:03:03

I want to create a C++ server/client that maximizes the throughput over TCP socket communication on my localhost. As a preparation, I used iperf to find out what the maximum bandwidth is on my i7 MacBookPro.

------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  256 KByte (default)
------------------------------------------------------------
[  4] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 51583
[  4]  0.0-120.0 sec   329 GBytes  23.6 Gbits/sec

Without any tweaking, iperf showed me that I can reach at least 23.2 GBit/s. Then I did my own C++ server/client implementation; you can find the full code here: https://gist.github.com/1116635

In that code I basically transfer a 1024-byte int array with each read/write operation. So my send loop on the server looks like this:

    int n;

    int x[256];

    // fill the int array with test data
    for (int i = 0; i < 256; i++)
    {
        x[i] = i;
    }

    // send the 1 KiB array 4*1024*1024 times (4 GiB in total)
    for (int i = 0; i < (4*1024*1024); i++)
    {
        n = write(sock, x, sizeof(x));
        if (n < 0) error("ERROR writing to socket");
    }

My receive loop on the client looks like this:

    int n;
    int x[256];

    // read back 4*1024*1024 records of 1 KiB each (4 GiB in total)
    for (int i = 0; i < (4*1024*1024); i++)
    {
        n = read(sockfd, x, sizeof(int)*256);
        if (n < 0) error("ERROR reading from socket");
    }

As mentioned in the headline, running this (compiled with -O3) results in the following execution time, which works out to about 3 GBit/s:

./client 127.0.0.1 1234
Elapsed time for Reading 4GigaBytes of data over socket on localhost: 9578ms
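
(For reference: transferring 4 GiB ≈ 34.4 Gbit in 9.578 s works out to roughly 3.6 Gbit/s, far below the 23.6 Gbit/s that iperf reported above.)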

Where do I lose the bandwidth, and what am I doing wrong? Again, the full code can be seen here: https://gist.github.com/1116635

Any help is appreciated!


爱殇璃 2024-12-04 02:03:03
  • Use larger buffers (i.e. make less library/system calls)
  • Use asynchronous APIs
  • Read the documentation (the return value of read/write is not simply an error condition, it also represents the number of bytes read/written)
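
A minimal sketch of the first and third points, assuming a blocking socket like the one in the question (the 256 KiB buffer and the drain_socket helper are illustrative, not part of the original gist):

    #include <cstdio>
    #include <cstdlib>
    #include <unistd.h>

    // Receive `total` bytes from an already connected socket, using a large
    // buffer and honouring the byte count that read() returns instead of
    // assuming each call delivers a full record.
    ssize_t drain_socket(int sockfd, size_t total)
    {
        static char buf[256 * 1024];                    // much larger than 1 KiB
        size_t received = 0;
        while (received < total) {
            ssize_t n = read(sockfd, buf, sizeof(buf));
            if (n < 0) { perror("read"); exit(1); }     // real error
            if (n == 0) break;                          // peer closed the connection
            received += static_cast<size_t>(n);         // may be less than sizeof(buf)
        }
        return static_cast<ssize_t>(received);
    }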
岁月苍老的讽刺 2024-12-04 02:03:03

My previous answer was mistaken. I have tested your programs and here are the results.

  • If I run the original client, I get 0m7.763s
  • If I use a buffer 4 times as large, I get 0m5.209s
  • With a buffer 8 times as large as the original, I get 0m3.780s

I only changed the client. I suspect more performance can be squeezed if you also change the server.

The fact that I got radically different results than you did (0m7.763s vs 9578ms) also suggests this is caused by the number of system calls performed (as we have different processors). To squeeze even more performance:

  • Use scatter-gather I/O (readv and writev)
  • Use zero-copy mechanisms: splice(2), sendfile(2)
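
For the scatter-gather point, here is a minimal writev sketch, assuming the connected socket `sock` and the 1 KiB array `x` from the question (the send_batch helper is illustrative); gathering several buffers into a single call cuts the number of system calls without an extra copy into a staging buffer:

    #include <cstdio>
    #include <cstdlib>
    #include <sys/uio.h>   // writev, struct iovec

    // Queue 8 copies of the 1 KiB array per writev() call, replacing
    // 8 separate write() calls with a single system call.
    void send_batch(int sock, int (&x)[256])
    {
        struct iovec iov[8];
        for (int i = 0; i < 8; i++) {
            iov[i].iov_base = x;
            iov[i].iov_len  = sizeof(x);
        }
        ssize_t n = writev(sock, iov, 8);
        if (n < 0) { perror("writev"); exit(1); }
        // n can still be a partial write; a real sender would resume
        // from the first unsent byte.
    }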
ゞ记忆︶ㄣ 2024-12-04 02:03:03

You can use strace -f iperf -s localhost to find out what iperf is doing differently. It seems that it's using significantly larger buffers (131072 Byte large with 2.0.5) than you.

Also, iperf uses multiple threads. If you have 4 CPU cores, using two threads on client and server will result in approximately doubled performance.
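
A minimal sketch of that idea on the client side, assuming the server accepts more than one connection (the 128 KiB chunk size mirrors the iperf buffer mentioned above, and port 1234 comes from the question; none of this is from the original gist):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // One worker: open its own TCP connection and send `total` bytes
    // in 128 KiB chunks.
    static void send_worker(const char* ip, int port, size_t total)
    {
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock < 0) { perror("socket"); return; }

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(port);
        inet_pton(AF_INET, ip, &addr.sin_addr);

        if (connect(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
            perror("connect");
            close(sock);
            return;
        }

        static thread_local char buf[128 * 1024] = {};   // dummy payload
        size_t sent = 0;
        while (sent < total) {
            ssize_t n = write(sock, buf, sizeof(buf));
            if (n <= 0) break;
            sent += static_cast<size_t>(n);
        }
        close(sock);
    }

    int main()
    {
        // Two parallel streams of 2 GiB each (4 GiB total, as in the question).
        std::vector<std::thread> workers;
        for (int i = 0; i < 2; i++)
            workers.emplace_back(send_worker, "127.0.0.1", 1234,
                                 size_t(2) * 1024 * 1024 * 1024);
        for (auto& t : workers) t.join();
        return 0;
    }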

飘逸的'云 2024-12-04 02:03:03

If you really want to get max performance use mmap + splice/sendfile, and for localhost communication use unix domain stream sockets (AF_LOCAL).
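
A minimal sketch of the Unix domain socket suggestion on the client side (the socket path and the connect_local helper are illustrative; the server would bind() and listen() on the same path):

    #include <cstdio>
    #include <cstring>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    // Connect to a Unix domain stream socket instead of TCP over loopback.
    int connect_local(const char* path)
    {
        int sock = socket(AF_LOCAL, SOCK_STREAM, 0);
        if (sock < 0) { perror("socket"); return -1; }

        sockaddr_un addr{};
        addr.sun_family = AF_LOCAL;
        std::strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

        if (connect(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
            perror("connect");
            close(sock);
            return -1;
        }
        return sock;   // use read()/write() on it exactly like a TCP socket
    }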
