Python socket buffering
Let's say I want to read a line from a socket, using the standard socket module:
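```python
def read_line(s):
    # Python 3: recv() returns bytes, so compare with bytes literals.
    ret = b''
    while True:
        c = s.recv(1)  # one recv() call per byte
        if c == b'\n' or c == b'':  # end of line, or b'' when the peer closed
            break
        ret += c
    return ret
```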
What exactly happens in s.recv(1)? Will it issue a system call each time? I guess I should add some buffering, anyway:
The socket docs say: "For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096." (http://docs.python.org/library/socket.html#socket.socket.recv)
But it doesn't seem easy to write efficient and thread-safe buffering. What if I use file.readline()?
```python
# does this work well, is it efficiently buffered?
s.makefile().readline()
```
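A short Python 3 sketch of the makefile() approach, assuming a connected socket; socket.socketpair() stands in for a real connection so no network is needed. In Python 3, makefile("rb") returns an io.BufferedReader, so readline() fills an internal buffer from the socket in chunks instead of calling recv(1) per byte. Note that readline() keeps the trailing newline.

```python
import socket

# socketpair() returns two connected sockets, standing in for a real connection.
left, right = socket.socketpair()
right.sendall(b"first line\nsecond line\n")

# makefile("rb") wraps the socket in an io.BufferedReader: readline() pulls
# data from the socket in chunks of up to io.DEFAULT_BUFFER_SIZE bytes and
# then scans that buffer for b"\n", rather than calling recv(1) per byte.
reader = left.makefile("rb")
first = reader.readline()   # b"first line\n" (the newline is kept)
second = reader.readline()  # b"second line\n"

reader.close()  # closes the file object only; the socket stays open
left.close()
right.close()
```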
If you are concerned with performance and control the socket completely (you are not passing it into a library, for example), then try implementing your own buffering in Python. Python's string.find, string.split and similar operations can be amazingly fast; in Python 3, where recv() returns bytes, use the corresponding bytes.find and bytes.split methods.
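A minimal sketch of such a buffer, assuming Python 3 and newline-terminated lines. The helper name iter_lines and the 4096-byte chunk size are illustrative choices (the size follows the recv() documentation's suggestion), not the answerer's original code. It issues one recv() per chunk and cuts complete lines out of the accumulated buffer with bytes.split(); the usage part uses socket.socketpair() in place of a real connection.

```python
import socket


def iter_lines(sock, bufsize=4096):
    """Yield lines (bytes, newline included) from sock.

    Hypothetical helper: one recv() call per chunk of up to bufsize bytes,
    then bytes.split() to cut complete lines out of the accumulated buffer.
    """
    buffer = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # b"" means the peer closed the connection
            break
        buffer += chunk
        # All pieces except the last are complete lines; the last piece is an
        # unfinished line (possibly b"") that waits for the next chunk.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            yield line + b"\n"
    if buffer:  # trailing data with no final newline
        yield buffer


# Usage: socketpair() stands in for a real connection.
left, right = socket.socketpair()
right.sendall(b"alpha\nbeta\ngam")
right.close()  # lets recv() on the other end eventually return b""
lines = list(iter_lines(left))
left.close()
print(lines)  # [b'alpha\n', b'beta\n', b'gam']
```

Because the whole buffer is re-split after every chunk, one very long line without a newline costs repeated copying and scanning, which is why this works best when lines are not too huge.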
If you expect the payload to consist of lines that are not too huge, such a buffer should run pretty fast and avoid jumping through too many layers of function calls unnecessarily. I'd be interested in knowing how this compares to file.readline() or using socket.recv(1).
The recv() call is handled directly by calling the C library function. It will block waiting for the socket to have data; in reality, it just lets the recv() system call block.
file.readline() is an efficient buffered implementation. It is not thread-safe, because it presumes it's the only one reading the file (for example, it buffers upcoming input). If you are using the file object, every time read() is called with a positive argument, the underlying code will recv() only the amount of data requested, unless it's already buffered. It would be buffered if:
- you had called readline(), which reads a full buffer, and
- the end of the line came before the end of the buffer,
thus leaving data in the buffer. Otherwise the buffer is generally not overfilled.
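The "only one reader" caveat can be seen directly. In this Python 3 sketch (socket.socketpair() standing in for a real connection), two lines are sent at once; after the first readline(), the second line typically sits in the file object's buffer rather than in the kernel, so select() on the raw socket reports nothing to read. Code that mixed select() or recv() on the socket with reads through the file object could therefore miss that data.

```python
import select
import socket

left, right = socket.socketpair()
right.sendall(b"line1\nline2\n")  # both lines reach the receiver's kernel buffer

reader = left.makefile("rb")
first = reader.readline()  # the buffered read typically takes both lines at once

# The second line now sits in `reader`'s user-space buffer, not in the kernel,
# so select() on the raw socket (timeout 0 = just poll) sees nothing to read.
readable, _, _ = select.select([left], [], [], 0)

second = reader.readline()  # served from the buffer without a new recv()

reader.close()
left.close()
right.close()
```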
The goal of the question is not clear. If you need to see whether data is available before reading, you can select() on the socket or set it to non-blocking mode with s.setblocking(False). Then, if there is no waiting data, a read raises an exception rather than blocking (BlockingIOError in Python 3; socket.error with errno EAGAIN/EWOULDBLOCK in Python 2). An empty result from recv() means the peer has closed the connection.
Are you reading one file or socket with multiple threads? I would put a single worker on reading the socket and feeding received items into a queue for handling by other threads.
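A minimal Python 3 sketch of both availability checks, using socket.socketpair() in place of a real connection: select() with a zero timeout polls without blocking, and recv() on a non-blocking socket with no queued data raises BlockingIOError instead of waiting. The 4096-byte read size and the 1-second timeout are arbitrary illustrative values.

```python
import select
import socket

left, right = socket.socketpair()

# select() with a timeout of 0 polls: it returns at once with the ready sockets.
before, _, _ = select.select([left], [], [], 0)  # [] since nothing was sent yet

# In non-blocking mode, recv() with no queued data raises instead of waiting.
left.setblocking(False)
try:
    left.recv(4096)
    got_data = True
except BlockingIOError:
    got_data = False  # "no data yet"; a b"" return would mean the peer closed

right.sendall(b"ready\n")
# Wait up to 1 second for data, then read what is there.
readable, _, _ = select.select([left], [], [], 1.0)
data = left.recv(4096) if readable else b""

left.close()
right.close()
```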
I suggest consulting the Python socket module source and the C source that makes the system calls.
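A sketch of the single-reader design suggested in this answer, assuming Python 3: one thread owns the socket and its makefile() buffer, so the buffer's single-reader assumption holds, and it pushes each line onto a queue.Queue for other threads (here only the main thread consumes). The function name reader_thread and the None end-of-stream sentinel are hypothetical conventions for this example; socket.socketpair() stands in for a real connection.

```python
import queue
import socket
import threading


def reader_thread(sock, out_queue):
    """Hypothetical worker: the only code that reads from sock."""
    with sock.makefile("rb") as f:  # buffered reads are safe with one reader
        for line in f:              # iterates until the peer closes the connection
            out_queue.put(line)
    out_queue.put(None)             # sentinel: no more lines


left, right = socket.socketpair()
lines_q = queue.Queue()
t = threading.Thread(target=reader_thread, args=(left, lines_q), daemon=True)
t.start()

right.sendall(b"job 1\njob 2\n")
right.close()  # EOF for the reader thread

received = []
while True:
    item = lines_q.get()  # blocks until the reader hands over a line
    if item is None:
        break
    received.append(item)

t.join()
left.close()
```

Other threads never touch the socket; they only call lines_q.get(), and queue.Queue does its own locking.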