Python socket buffering

Posted on 2024-07-19 11:42:26

Let's say I want to read a line from a socket, using the standard socket module:

def read_line(s):
    ret = b''

    while True:
        c = s.recv(1)  # recv() returns bytes in Python 3

        if c == b'\n' or c == b'':  # newline, or the peer closed the connection
            break
        else:
            ret += c

    return ret

What exactly happens in s.recv(1)? Will it issue a system call each time? I guess I should add some buffering, anyway:

For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096.

http://docs.python.org/library/socket.html#socket.socket.recv

But it doesn't seem easy to write efficient and thread-safe buffering. What if I use file.readline()?

# does this work well, is it efficiently buffered?
s.makefile().readline()
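
For reference, here is a minimal Python 3 sketch of the makefile() approach. It assumes socket.socketpair() as a stand-in for a real connection so that it runs on its own, and the helper name read_lines_via_makefile is hypothetical, not part of the socket module.

```python
import socket


def read_lines_via_makefile(sock):
    """Return every line readable from sock until the peer closes it."""
    # 'rb' gives a buffered binary reader, so lines are served from an
    # in-memory buffer rather than through one recv() call per byte.
    with sock.makefile('rb') as reader:
        return [line for line in reader]


# socketpair() stands in for a real client/server connection (assumption).
left, right = socket.socketpair()
left.sendall(b'first line\nsecond line\n')
left.close()  # closing the sender lets the reader see end-of-stream

lines = read_lines_via_makefile(right)
right.close()
print(lines)
```

Each line keeps its trailing newline, as with a regular file object.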

Comments (3)

太傻旳人生 2024-07-26 11:42:26

If you are concerned with performance and control the socket completely
(you are not passing it into a library for example) then try implementing
your own buffering in Python -- Python string.find and string.split and such can
be amazingly fast.

def linesplit(sock):
    # sock is a connected socket; recv() returns bytes in Python 3
    buffer = sock.recv(4096)
    buffering = True
    while buffering:
        if b"\n" in buffer:
            (line, buffer) = buffer.split(b"\n", 1)
            yield line + b"\n"
        else:
            more = sock.recv(4096)
            if not more:
                buffering = False
            else:
                buffer += more
    if buffer:
        yield buffer

If you expect the payload to consist of lines
that are not too huge, that should run pretty fast,
and avoid jumping through too many layers of function
calls unnecessarily. I'd be interested in knowing
how this compares to file.readline() or using socket.recv(1).
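
One rough way to make that comparison is sketched below. It is not a rigorous benchmark: it assumes Python 3, uses socket.socketpair() in place of a real network connection, and keeps the payload small enough to fit in the socket buffer. The helper names (read_line_bytewise, read_lines_chunked, timed_read) are made up for this example.

```python
import socket
import time


def read_line_bytewise(sock):
    """Read one line with one recv() call per byte."""
    out = bytearray()
    while True:
        c = sock.recv(1)
        if c in (b'\n', b''):
            return bytes(out)
        out += c


def read_lines_chunked(sock, bufsize=4096):
    """Yield lines (without the newline) using chunked recv() calls."""
    buf = b''
    while True:
        while b'\n' in buf:
            line, buf = buf.split(b'\n', 1)
            yield line
        more = sock.recv(bufsize)
        if not more:
            break
        buf += more
    if buf:
        yield buf


def timed_read(reader, payload):
    """Push payload through a socketpair and time how long reader takes."""
    tx, rx = socket.socketpair()
    tx.sendall(payload)
    tx.close()
    start = time.perf_counter()
    result = reader(rx)
    elapsed = time.perf_counter() - start
    rx.close()
    return result, elapsed


payload = b'hello\n' * 200  # small on purpose so sendall() cannot block

slow, t_slow = timed_read(
    lambda s: [read_line_bytewise(s) for _ in range(200)], payload)
fast, t_fast = timed_read(lambda s: list(read_lines_chunked(s)), payload)
print(f'recv(1): {t_slow:.6f}s  chunked: {t_fast:.6f}s')
```

Both readers return the same lines; the timings depend on the machine and should only be read as a rough indication.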

紫轩蝶泪 2024-07-26 11:42:26

The recv() call is handled directly by calling the C library function, so yes, each s.recv(1) issues a system call.

It will block waiting for the socket to have data. In reality it will just let the recv() system call block.

file.readline() is an efficient buffered implementation. It is not threadsafe, because it presumes it's the only one reading the file. (For example by buffering upcoming input.)

If you are using the file object, every time read() is called with a positive argument, the underlying code will recv() only the amount of data requested, unless it's already buffered.

It would be buffered if:

  • you had called readline(), which reads a full buffer

  • the end of the line was before the end of the buffer

Thus leaving data in the buffer. Otherwise the buffer is generally not overfilled.
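
The effect of readline() leaving data in the buffer can be observed with a small experiment. This is only a sketch under some assumptions: Python 3, where makefile('rb') returns an io.BufferedReader, and socket.socketpair() standing in for a real connection. After the first readline(), the second line normally sits in the file object's buffer and not in the socket.

```python
import select
import socket

tx, rx = socket.socketpair()
tx.sendall(b'one\ntwo\n')

reader = rx.makefile('rb')
first = reader.readline()

# Ask the OS whether the raw socket still has unread data (timeout 0 = poll).
readable, _, _ = select.select([rx], [], [], 0)
socket_has_data = bool(readable)

# The second line is still available, served from the file object's buffer.
second = reader.readline()

reader.close()
tx.close()
rx.close()
print(first, second, socket_has_data)
```

This is also why mixing reader.readline() with direct rx.recv() calls on the same socket is risky: the raw socket no longer sees bytes that the file object has already pulled in.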

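The goal of the question is not clear. If you need to see whether data is available before reading, you can use select() or set the socket to non-blocking mode with s.setblocking(False). Then, if no data is waiting, recv() raises an error (socket.error with EWOULDBLOCK/EAGAIN; BlockingIOError in Python 3) instead of blocking. An empty result still means the peer has closed the connection.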
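
A minimal sketch of both checks, assuming Python 3 and a socket.socketpair() in place of a real connection. Note that in Python 3 a non-blocking recv() with nothing queued raises BlockingIOError.

```python
import select
import socket

tx, rx = socket.socketpair()

# 1) Poll with select(): a zero timeout returns immediately.
ready_before, _, _ = select.select([rx], [], [], 0)

# 2) Non-blocking mode: recv() raises instead of waiting when nothing is queued.
rx.setblocking(False)
try:
    rx.recv(4096)
    would_block = False
except BlockingIOError:
    would_block = True

# Once the peer sends something, select() reports the socket as readable.
tx.sendall(b'ping\n')
ready_after, _, _ = select.select([rx], [], [], 1.0)
data = rx.recv(4096) if ready_after else b''

tx.close()
rx.close()
print(ready_before, would_block, data)
```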

Are you reading one file or socket with multiple threads? I would put a single worker on reading the socket and feeding received items into a queue for handling by other threads.
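
The single-reader arrangement could look roughly like the sketch below. It assumes Python 3, socket.socketpair() as a stand-in connection, and a hypothetical sentinel object _DONE to tell the consumer that the stream has ended.

```python
import queue
import socket
import threading

_DONE = object()  # hypothetical sentinel marking end-of-stream


def socket_reader(sock, out_queue):
    """Single reader thread: split the stream into lines and enqueue them."""
    with sock.makefile('rb') as reader:
        for line in reader:
            out_queue.put(line.rstrip(b'\n'))
    out_queue.put(_DONE)


tx, rx = socket.socketpair()
lines = queue.Queue()
worker = threading.Thread(target=socket_reader, args=(rx, lines))
worker.start()

tx.sendall(b'job 1\njob 2\njob 3\n')
tx.close()  # end-of-stream lets the reader thread finish

received = []
while True:
    item = lines.get()
    if item is _DONE:
        break
    received.append(item)

worker.join()
rx.close()
print(received)
```

Only the worker touches the socket, so the buffered reader needs no locking; queue.Queue handles the hand-off between threads. With several consumer threads, each one would need its own sentinel.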

I suggest consulting the source of the Python socket module and the C source that makes the system calls.

乱世争霸 2024-07-26 11:42:26

def buffered_readlines(pull_next_chunk, buf_size=4096):
  """
  pull_next_chunk is a callable that accepts one positional argument max_len,
  e.g. sock.recv or a file object's read method, and returns a str or bytes
  of up to max_len long, or an empty one when nothing is left to read.

  >>> for line in buffered_readlines(sock.recv, 16384):
  ...   print(line)
  ...
  >>> # the following code won't read whole file into memory
  ... # before splitting it into lines like .readlines method
  ... # of file does. Also it won't block until FIFO-file is closed
  ...
  >>> for line in buffered_readlines(open('huge_file').read):
  ...   # process it on per-line basis
  ...   ...
  >>>
  """
  chunks = []  # pieces of a line that spans several chunks
  while True:
    chunk = pull_next_chunk(buf_size)
    # Match the separator to the chunk type: bytes from sockets, str from text files.
    newline, empty = (b'\n', b'') if isinstance(chunk, bytes) else ('\n', '')
    if not chunk:
      # End of input: flush whatever is left as the final line.
      if chunks:
        yield empty.join(chunks)
      break
    if newline not in chunk:
      chunks.append(chunk)
      continue
    chunk = chunk.split(newline)
    # The first piece completes the line carried over from earlier chunks.
    if chunks:
      yield empty.join(chunks + [chunk[0]])
    else:
      yield chunk[0]
    for line in chunk[1:-1]:
      yield line
    # The last piece is an incomplete line (or empty); carry it over.
    if chunk[-1]:
      chunks = [chunk[-1]]
    else:
      chunks = []