C++ socket programming: maximizing throughput/bandwidth on localhost (I only get 3 Gbit/s instead of 23 Gbit/s)
I want to create a C++ server/client that maximizes the throughput over TCP socket communication on my localhost. As a preparation, I used iperf to find out what the maximum bandwidth is on my i7 MacBookPro.
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 256 KByte (default)
------------------------------------------------------------
[ 4] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 51583
[ 4] 0.0-120.0 sec 329 GBytes 23.6 Gbits/sec
Without any tweaking, iperf showed me that I can reach at least 23.2 GBit/s. Then I did my own C++ server/client implementation; you can find the full code here: https://gist.github.com/1116635
In that code I basically transfer a 1024-byte int array with each read/write operation, so my send loop on the server looks like this:
int n;
int x[256];

// fill the int array
for (int i = 0; i < 256; i++)
{
    x[i] = i;
}

// 4M writes of 1 KiB each, 4 GiB in total
for (int i = 0; i < (4*1024*1024); i++)
{
    n = write(sock, x, sizeof(x));
    if (n < 0) error("ERROR writing to socket");
}
My receive loop on the client looks like this:
int n;
int x[256];

// 4M reads of up to 1 KiB each
for (int i = 0; i < (4*1024*1024); i++)
{
    n = read(sockfd, x, sizeof(int)*256);
    if (n < 0) error("ERROR reading from socket");
}
As mentioned in the headline, running this (compiled with -O3) results in the following execution time, which is about 3 GBit/s:
./client 127.0.0.1 1234
Elapsed time for Reading 4GigaBytes of data over socket on localhost: 9578ms
Where do I lose the bandwidth, what am I doing wrong? Again, the full code can be seen here: https://gist.github.com/1116635
Any help is appreciated!
Comments (4)
My previous answer was mistaken. I have tested your programs and here are the results:

0m7.763s
0m5.209s
0m3.780s

I only changed the client. I suspect more performance can be squeezed if you also change the server.

The fact that I got radically different results than you did (0m7.763s vs 9578ms) also suggests this is caused by the number of system calls performed (as we have different processors..). To squeeze even more performance, look at readv and writev, splice(2), and sendfile(2).
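For illustration, here is a minimal sketch of how the server's send loop could batch writes with writev(2); the descriptor sock and the error() helper are assumed from the question's code, and the batch size of 16 is an arbitrary choice, not something the answer specifies:

#include <sys/uio.h>  // writev, struct iovec

int x[256];
for (int i = 0; i < 256; i++) x[i] = i;

// Point 16 iovec entries at the same 1 KiB buffer, so each
// writev() call submits 16 KiB with a single syscall.
struct iovec iov[16];
for (int j = 0; j < 16; j++)
{
    iov[j].iov_base = x;
    iov[j].iov_len = sizeof(x);
}

// 256K writev() calls instead of 4M write() calls for the same 4 GiB
for (int i = 0; i < (4*1024*1024)/16; i++)
{
    ssize_t n = writev(sock, iov, 16);
    if (n < 0) error("ERROR writing to socket");
}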
You can use strace -f iperf -s localhost to find out what iperf is doing differently. It seems that it is using significantly larger buffers than you (131072 bytes with 2.0.5).

Also, iperf uses multiple threads. If you have 4 CPU cores, using two threads on client and server will result in approximately doubled performance.
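As a rough sketch of what a larger receive buffer could look like on the client side (the 131072-byte size mirrors the iperf buffer mentioned above; sockfd and error() are assumed from the question's code):

// Read in 128 KiB chunks instead of 1 KiB, cutting the number
// of read() syscalls by up to a factor of 128.
char buf[131072];
long long total = 0;
const long long target = 4LL*1024*1024*1024;  // 4 GiB, as in the question

while (total < target)
{
    ssize_t n = read(sockfd, buf, sizeof(buf));
    if (n < 0) error("ERROR reading from socket");
    if (n == 0) break;  // peer closed the connection
    total += n;         // read() may return less than requested
}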
If you really want to get maximum performance, use mmap + splice/sendfile, and for localhost communication use Unix domain stream sockets (AF_LOCAL).
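A minimal sketch of the AF_LOCAL setup this suggests, on the client side; the socket path /tmp/throughput.sock is a made-up example:

#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main()
{
    // Unix domain stream socket instead of TCP over loopback
    int fd = socket(AF_LOCAL, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_LOCAL;
    strncpy(addr.sun_path, "/tmp/throughput.sock", sizeof(addr.sun_path) - 1);

    if (connect(fd, (struct sockaddr*)&addr, sizeof(addr)) < 0)
    {
        perror("connect");
        return 1;
    }

    // fd can now be used with read()/write() exactly like a TCP socket
    close(fd);
    return 0;
}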