Socket performance spikes

Posted on 2024-08-27 04:27:26

We are seeing random latency spikes in a high-throughput transaction processing system that uses sockets for IPC.

Below is the setup used for the run:

  1. The client opens and closes a new connection for every transaction, and there are 4 exchanges between the server and the client.
  2. We have disabled TIME_WAIT by setting the socket linger (SO_LINGER) option via setsockopt, as we thought the spikes were caused by sockets waiting in TIME_WAIT.
  3. No processing is done for the transaction. Only messages are passed.
  4. The OS is CentOS 5.4.
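
As a minimal sketch of item 2, the option can be set with setsockopt and then read back with getsockopt to confirm it took effect. This assumes a Linux-style struct linger made of two C ints; the helper names set_abortive_close and get_linger are hypothetical and not part of the original scripts.

```python
import socket
import struct


def set_abortive_close(sock):
    """Enable SO_LINGER with a zero timeout on sock."""
    # Assumption: struct linger is two C ints (l_onoff, l_linger), as on Linux.
    lingeropt = struct.pack('ii', 1, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, lingeropt)


def get_linger(sock):
    """Return (l_onoff, l_linger) as currently set on sock."""
    size = struct.calcsize('ii')
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, size)
    return struct.unpack('ii', raw)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_abortive_close(sock)
linger_state = get_linger(sock)  # l_onoff should be non-zero, l_linger zero
print(linger_state)
sock.close()
```

Applying the helper to each socket returned by accept() would avoid relying on whether accepted sockets inherit the option from the listening socket.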

The average round-trip time is around 3 milliseconds, but sometimes it ranges from 100 milliseconds to a couple of seconds.

Steps used for execution, measurement, and output

  1. Starting the server

    $ python sockServerLinger.py > /dev/null &

  2. Starting the client to post 1 million transactions to the server. It logs the time taken by each transaction to the client.log file.

    $ python sockClient.py 1000000 > client.log

  3. Once the run finishes, the following command lists the transactions that took longer than 100 milliseconds, in the format <line_number>:<execution_time>.

    $ awk '$1+0 > 0.1 { print NR ":" $1 }' client.log | less
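
The same measurement can also be sketched in Python, which makes it easy to add a count and a worst case. This is only an illustration: the function name summarize, the 0.1 second default threshold, and the sample values below are assumptions, not data from the actual run.

```python
def summarize(lines, threshold=0.1):
    """Return (count, slow, worst) for an iterable of log lines.

    count is the number of lines that parsed as a time in seconds,
    slow is a list of (line_number, time) pairs above the threshold,
    worst is the largest time seen, or None if nothing parsed.
    """
    times = []
    slow = []
    for lineno, line in enumerate(lines, 1):
        try:
            elapsed = float(line.strip())
        except ValueError:
            continue  # skip non-numeric lines such as error messages
        times.append(elapsed)
        if elapsed > threshold:
            slow.append((lineno, elapsed))
    worst = max(times) if times else None
    return len(times), slow, worst


# Hypothetical sample in the same shape as client.log.
sample = ["0.0031", "in exec::::::::::::: None", "0.25", "3.002", "9.5e-05"]
count, slow, worst = summarize(sample)
for lineno, elapsed in slow:
    print("%d:%s" % (lineno, elapsed))
```

For the real log, passing an open file works the same way, for example summarize(open("client.log")).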

Below is the example code for Server and Client.

Server

# File: sockServerLinger.py
import socket
import struct
import traceback

host = ''
port = 9999

# SO_LINGER with l_onoff=1 and l_linger=0 asks the kernel to abort the
# connection on close() instead of performing the normal shutdown.
l_onoff = 1
l_linger = 0
lingeropt = struct.pack('ii', l_onoff, l_linger)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, lingeropt)
s.bind((host, port))
s.listen(1)

while 1:
    try:
        clientsock, clientaddr = s.accept()
        print("Got connection from %s" % (clientsock.getpeername(),))
        # First request/response pair: echo what the client sent.
        data = clientsock.recv(1024*1024*10)
        numsent = clientsock.send(data)
        # Second request/response pair: reply with the first payload again.
        data1 = clientsock.recv(1024*1024*10)
        numsent = clientsock.send(data)
        # Drain the socket until the client closes it (recv returns no data).
        ret = 1
        while ret > 0:
            data1 = clientsock.recv(1024*1024*10)
            ret = len(data1)
        clientsock.close()
    except KeyboardInterrupt:
        raise
    except Exception:
        # print_exc() writes the traceback itself and returns None.
        traceback.print_exc()
        continue

Client

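# File: sockClient.py

import socket
import sys
import time
import traceback

total = int(sys.argv[1])  # number of transactions to run
i = 0
while 1:
    try:
        st = time.time()
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Keep retrying until the connection is established.
        while s.connect_ex(('127.0.0.1', 9999)) != 0:
            continue
        # Two request/response pairs, i.e. 4 exchanges per transaction.
        numsent = s.send(b"asd" * 1000)
        response = s.recv(6000)
        numsent = s.send(b"asd" * 1000)
        response = s.recv(6000)
        s.close()
        i += 1
        if i == total:
            break
    except KeyboardInterrupt:
        raise
    except Exception:
        print("in exec:::::::::::::")
        traceback.print_exc()
        continue
    # Elapsed time for this transaction, in seconds.
    print(time.time() - st)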

Comments (1)

何必那么矫情 2024-09-03 04:27:26

Here is one possibility that comes to mind:

1) Because you are using SOCK_STREAM, you are using the TCP protocol.
2) As a reliable protocol, TCP will resend packets that have timed out, to ensure that everything eventually arrives.
3) TCP uses a dynamic timeout value that is calculated from the current estimate of the round-trip time (RTT).
4) When a TCP connection first starts up, it doesn't know what the RTT is, so it uses a very large initial timeout value, on the order of seconds (the standard initial retransmission timeout was 3 seconds under RFC 2988 and is 1 second under RFC 6298).
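
Points 3 and 4 can be illustrated with the retransmission timer computation described in RFC 6298. This is a simplified sketch rather than what any particular kernel does: it ignores clock granularity and the upper bound on the timer, the class name RtoEstimator is hypothetical, and the 0.2 second floor used in the example is an assumption (the RFC itself recommends a 1 second minimum).

```python
class RtoEstimator:
    """Simplified RFC 6298 retransmission timeout (RTO) estimator, in seconds."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance
    K = 4          # variance multiplier

    def __init__(self, initial_rto=1.0, min_rto=1.0):
        self.srtt = None        # smoothed RTT, unknown until the first sample
        self.rttvar = None      # RTT variance estimate
        self.rto = initial_rto  # timeout used before any RTT sample exists
        self.min_rto = min_rto

    def update(self, rtt):
        """Feed one RTT measurement and return the new RTO."""
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variance is updated before the smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.min_rto, self.srtt + self.K * self.rttvar)
        return self.rto


# A brand-new connection has no RTT sample, so it starts at initial_rto.
est = RtoEstimator(initial_rto=1.0, min_rto=0.2)  # 0.2 s floor is an assumption
rto_before = est.rto
rto_after = est.update(0.003)  # one 3 ms sample, like the average in the question
print(rto_before, rto_after)
```

With a 3 ms RTT the first sample pulls the computed timeout down to the configured floor, but a packet lost during the handshake is still governed by the initial value, which is hundreds of times larger than the normal round trip.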

So... if one of the early TCP handshake packets is dropped, your socket could be waiting around a long time before it decides that the packet didn't arrive and resends it. This would happen randomly and relatively rarely, but certainly many times in a million connections.

Try using socket.settimeout() with a relatively short value, and retry immediately with a new connection attempt if the connection times out. This way you fake having a shorter initial RTT estimate.
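
A minimal sketch of that suggestion follows. The function name connect_with_retry, the 50 ms timeout, and the attempt limit are assumptions to be tuned for the real system; each retry uses a fresh socket, since reusing one after a failed connect is not reliable.

```python
import socket


def connect_with_retry(address, timeout=0.05, max_attempts=10):
    """Connect to address, retrying with a fresh socket after each timeout.

    Returns (sock, attempts_used). Raises socket.timeout if every attempt
    times out; other connection errors are raised immediately.
    """
    for attempt in range(1, max_attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)  # short timeout applies to connect()
        try:
            sock.connect(address)
        except socket.timeout:
            sock.close()
            continue  # give up on this attempt and try again at once
        except OSError:
            sock.close()
            raise
        sock.settimeout(None)  # back to blocking mode for send/recv
        return sock, attempt
    raise socket.timeout("no connection after %d attempts" % max_attempts)
```

In the client above, this would stand in for the socket creation and the connect_ex loop, for example s, attempts = connect_with_retry(('127.0.0.1', 9999)). Logging attempts alongside the elapsed time would show whether the slow transactions line up with retried connections.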
