Socket performance spikes
We are seeing random latency spikes in a high-throughput transaction processing system that uses sockets for IPC.
Below is the setup used for the run:
- The client opens and closes a new connection for every transaction, and there are 4 exchanges between the server and the client.
- We have disabled TIME_WAIT by setting the socket linger option (SO_LINGER) via setsockopt, as we thought the spikes were caused by sockets waiting in TIME_WAIT.
- No processing is done for the transaction. Only messages are passed.
- OS: CentOS 5.4
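The linger setting from the list above can be checked in isolation. This is a minimal sketch, assuming a Linux-like platform where `struct linger` is two native ints. The helper names `set_abortive_close` and `get_linger` are made up for illustration and are not part of the original scripts. The option is written with setsockopt; getsockopt only reads it back.

```python
import socket
import struct

LINGER_FORMAT = 'ii'  # struct linger { int l_onoff; int l_linger; }

def set_abortive_close(sock):
    """Enable SO_LINGER with a zero timeout on sock (hypothetical helper)."""
    # l_onoff=1, l_linger=0: close() resets the connection instead of
    # going through the normal FIN sequence, so no TIME_WAIT on this side.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FORMAT, 1, 0))

def get_linger(sock):
    """Return (l_onoff, l_linger) as currently set on sock."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    return struct.unpack(LINGER_FORMAT, raw)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_abortive_close(s)
    print(get_linger(s))
    s.close()
```

Reading the value back confirms the option was accepted by the kernel, which is worth checking before attributing or ruling out TIME_WAIT as the cause.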
The average round-trip time is around 3 milliseconds, but sometimes the round-trip time ranges from 100 milliseconds to a couple of seconds.
Steps used for execution, measurement, and output
Starting the server
$ python sockServerLinger.py > /dev/null &
Starting the client to post 1 million transactions to the server. It logs the time for each transaction in the client.log file.
$ python sockClient.py 1000000 > client.log
Once the execution finishes, the following command shows the execution times greater than 100 milliseconds in the format <line_number>:<execution_time>.
$ grep -nE '^(0\.[1-9]|[1-9][0-9]*\.)[0-9]*$' client.log | less
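As an alternative to matching on text, the log can be filtered numerically. This is a minimal sketch, assuming client.log holds one elapsed time in seconds per line, as the client prints it. The name `find_spikes` and the 0.1 second default are illustrative and not part of the original scripts.

```python
import sys

def find_spikes(lines, threshold=0.1):
    """Yield (line_number, seconds) for every timing above threshold.

    Lines that do not parse as a single float (for example error
    output that ended up in the log) are skipped.
    """
    for number, line in enumerate(lines, start=1):
        try:
            seconds = float(line.strip())
        except ValueError:
            continue
        if seconds > threshold:
            yield number, seconds

if __name__ == "__main__":
    # Usage (hypothetical script name): python find_spikes.py client.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            for number, seconds in find_spikes(log):
                print("%d:%s" % (number, seconds))
```

Comparing parsed numbers avoids depending on how the floats happen to be formatted, including values printed in scientific notation.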
Below is the example code for Server and Client.
Server
# File: sockServerLinger.py
import socket, traceback, time
import struct

host = ''
port = 9999

# l_onoff=1 with l_linger=0: close() resets the connection, so the
# closing side does not leave the socket in TIME_WAIT.
l_onoff = 1
l_linger = 0
lingeropt = struct.pack('ii', l_onoff, l_linger)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, lingeropt)
s.bind((host, port))
s.listen(1)

while 1:
    try:
        clientsock, clientaddr = s.accept()
        print "Got connection from", clientsock.getpeername()
        # Exchanges 1 and 2: receive the first message and echo it back.
        data = clientsock.recv(1024*1024*10)
        #print "asdasd",data
        numsent = clientsock.send(data)
        # Exchanges 3 and 4: receive the second message and reply
        # (the first payload is sent again).
        data1 = clientsock.recv(1024*1024*10)
        numsent = clientsock.send(data)
        # Drain the socket until the client closes it (recv returns '').
        ret = 1
        while ret > 0:
            data1 = clientsock.recv(1024*1024*10)
            ret = len(data1)
        clientsock.close()
    except KeyboardInterrupt:
        raise
    except:
        traceback.print_exc()
        continue
Client
# File: sockClient.py
import socket, traceback, sys
import time

i = 0
while 1:
    try:
        st = time.time()
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Keep retrying until the connection succeeds.
        while (s.connect_ex(('127.0.0.1', 9999)) != 0):
            continue
        numsent = s.send("asd"*1000)
        response = s.recv(6000)
        numsent = s.send("asd"*1000)
        response = s.recv(6000)
        i += 1
        if i == int(sys.argv[1]):
            break
    except KeyboardInterrupt:
        raise
    except:
        # Keep error output on stderr so client.log holds only timings.
        print >>sys.stderr, "in exec:::::::::::::"
        traceback.print_exc()
        continue
    # Elapsed time for this transaction, in seconds.
    print time.time() - st
Comments (1)
Here is one possibility that comes to mind:
1) Because you are using SOCK_STREAM, you are using the TCP protocol.
2) As a reliable protocol, TCP will resend packets that have timed out, to ensure that everything eventually arrives.
3) TCP uses a dynamic timeout value that is calculated based on what the current round-trip time (RTT) is estimated to be.
4) When a TCP connection first starts up, it doesn't know what the RTT is, so it uses a very large initial timeout value, on the order of seconds (RFC 6298 recommends 1 second; the older RFC 2988 recommended 3 seconds).
So... if one of the early TCP handshake packets is dropped, your socket could be waiting around a long time before it decides that the packet didn't get there and it resends it. This would happen randomly, relatively rarely, but certainly many times in a million connections.
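To make the dynamic timeout concrete, here is a rough sketch of the retransmission timeout (RTO) calculation as described in RFC 6298. It is an illustration under stated assumptions, not what any particular kernel does: clock granularity is ignored, the class name `RtoEstimator` is made up, and real TCP stacks may use different initial and minimum values.

```python
ALPHA = 1.0 / 8    # gain for the smoothed RTT (SRTT), per RFC 6298
BETA = 1.0 / 4     # gain for the RTT variance (RTTVAR), per RFC 6298
K = 4              # variance multiplier, per RFC 6298
INITIAL_RTO = 1.0  # seconds, used before any RTT sample exists

class RtoEstimator(object):
    """Tracks SRTT/RTTVAR and derives the retransmission timeout."""

    def __init__(self, min_rto=1.0):
        self.min_rto = min_rto  # RFC 6298 floor; stacks may use less
        self.srtt = None
        self.rttvar = None
        self.rto = INITIAL_RTO

    def on_rtt_sample(self, r):
        """Update the estimate with a measured round-trip time r (seconds)."""
        if self.srtt is None:
            # First measurement: seed both estimators from it.
            self.srtt = r
            self.rttvar = r / 2.0
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        self.rto = max(self.min_rto, self.srtt + K * self.rttvar)
        return self.rto

    def on_timeout(self):
        """Back off: each expiry of the timer doubles the RTO."""
        self.rto *= 2
        return self.rto
```

A fresh connection has no sample yet, so a lost handshake packet waits for the full initial RTO, and a second loss waits twice as long again. That is far above a 3 millisecond round trip, which fits the size of the reported spikes.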
Try using socket.settimeout() with a relatively short value, and retry it immediately if the connection times out. This way you fake having a shorter initial RTT estimate.
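The suggestion above could look roughly like the following. This is a minimal sketch, not a drop-in replacement for the client: the function name `connect_with_retry`, the 50 millisecond timeout, and the attempt limit are all assumptions chosen for illustration. Like the original connect_ex loop, it retries on any connection error, not only on timeouts.

```python
import socket

def connect_with_retry(address, timeout=0.05, max_attempts=5):
    """Connect to address, abandoning attempts that exceed timeout.

    Each retry uses a fresh socket, so a new handshake is started
    instead of waiting for the kernel to retransmit the old one.
    """
    last_error = socket.error("no connection attempts were made")
    for _ in range(max_attempts):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            s.connect(address)
        except (socket.timeout, socket.error) as exc:
            # A socket whose connect failed should not be reused.
            s.close()
            last_error = exc
            continue
        s.settimeout(None)  # back to blocking mode for the exchanges
        return s
    raise last_error

if __name__ == "__main__":
    # Assumes the server from the question is listening on port 9999.
    try:
        sock = connect_with_retry(('127.0.0.1', 9999))
        sock.close()
    except socket.error as exc:
        print("could not connect: %s" % exc)
```

The timeout needs to sit comfortably above the normal round-trip time, otherwise healthy connections get abandoned too. With a 3 millisecond average, a value in the tens of milliseconds is a plausible starting point to experiment with.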