Python socket buffering

Posted 2024-07-19 11:42:26

Let's say I want to read a line from a socket, using the standard socket module:

def read_line(s):
    ret = b''

    while True:
        c = s.recv(1)  # returns bytes; b'' means the peer closed the connection

        if c == b'\n' or c == b'':
            break
        else:
            ret += c

    return ret

What exactly happens in s.recv(1)? Will it issue a system call each time? I guess I should add some buffering, anyway:

For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096.

http://docs.python.org/library/socket.html#socket.socket.recv

But it doesn't seem easy to write efficient and thread-safe buffering. What if I use file.readline()?

# does this work well, is it efficiently buffered?
s.makefile().readline()

太傻旳人生 2024-07-26 11:42:26

If you are concerned with performance and control the socket completely
(for example, you are not passing it into a library), then try implementing
your own buffering in Python. Python's find and split methods and the like can
be amazingly fast.

def linesplit(sock):
    # recv() returns bytes in Python 3, so the delimiter must be bytes too
    buffer = sock.recv(4096)
    buffering = True
    while buffering:
        if b"\n" in buffer:
            (line, buffer) = buffer.split(b"\n", 1)
            yield line + b"\n"
        else:
            more = sock.recv(4096)
            if not more:
                # empty result: the peer closed the connection
                buffering = False
            else:
                buffer += more
    if buffer:
        yield buffer

If you expect the payload to consist of lines
that are not too huge, that should run pretty fast
and avoid jumping through too many layers of function
calls unnecessarily. I'd be interested in knowing
how this compares to file.readline() or using socket.recv(1).
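
One rough way to compare the approaches is to count recv() calls rather than time them. The sketch below is only an illustration: CountingSocket and count_recv_calls are hypothetical names, socket.socketpair() stands in for a real connection, and it assumes that each Python-level recv() call corresponds to one recv system call.

```python
import socket

class CountingSocket:
    """Hypothetical wrapper that counts recv() calls on a real socket."""
    def __init__(self, sock):
        self.sock = sock
        self.calls = 0

    def recv(self, bufsize):
        self.calls += 1
        return self.sock.recv(bufsize)

def count_recv_calls(payload, bufsize):
    """Drain payload from a socket pair; return how many recv() calls it took."""
    a, b = socket.socketpair()  # in-process stand-in for a network peer
    try:
        a.sendall(payload)
        a.close()  # lets the reader observe end-of-stream
        counted = CountingSocket(b)
        while counted.recv(bufsize):
            pass  # keep reading until recv() returns b''
        return counted.calls
    finally:
        b.close()

if __name__ == "__main__":
    payload = b"x" * 999 + b"\n"
    print("recv(1):   ", count_recv_calls(payload, 1))
    print("recv(4096):", count_recv_calls(payload, 4096))
```

Reading one byte at a time needs one call per byte plus a final call that returns b''. With a 4096-byte buffer the count is far smaller, though the exact number depends on how the data happens to arrive.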

紫轩蝶泪 2024-07-26 11:42:26

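The recv() call is handled directly by calling the C library function, so each s.recv(1) results in its own system call.

It will block waiting for the socket to have data. In reality, it just lets the recv() system call block.

file.readline() is an efficient buffered implementation. It is not thread-safe, because it presumes it is the only one reading the file (for example, by buffering upcoming input).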
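
As a minimal sketch of the buffered file-object approach: the example below uses socket.socketpair() only so that it runs without a network peer, and it uses the Python 3 form makefile("rb"), which returns a buffered binary file object. The helper name demo_readline is hypothetical.

```python
import socket

def demo_readline():
    # socketpair() gives two connected sockets in one process (illustration only)
    a, b = socket.socketpair()
    try:
        a.sendall(b"first line\nsecond line\npartial")
        a.close()  # closing the writer lets the reader see end-of-stream
        # "rb" yields a buffered binary file object wrapping the socket
        with b.makefile("rb") as f:
            lines = [f.readline(), f.readline(), f.readline(), f.readline()]
    finally:
        b.close()
    return lines

if __name__ == "__main__":
    for line in demo_readline():
        print(line)
```

readline() keeps the trailing newline, returns an unterminated final line as-is, and returns b"" once the stream has ended. Because the file object holds its own buffer, only one thread should read from it.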

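If you are using the file object, every time read() is called with a positive argument, the underlying code will recv() only the amount of data requested, unless it is already buffered.

It would be buffered if:

you had called readline(), which reads a full buffer, and
the end of the line came before the end of the buffer,

thus leaving data in the buffer. Otherwise the buffer is generally not overfilled.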

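The goal of the question is not clear. If you need to see whether data is available before reading, you can use select() or put the socket into non-blocking mode with s.setblocking(False). In non-blocking mode, a read with no data waiting raises an exception (BlockingIOError in Python 3) instead of blocking; an empty result from recv() means the peer has closed the connection.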
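
Both ways of checking for data can be sketched as follows. The helper names poll_then_read and nonblocking_read are hypothetical, and socket.socketpair() is used only to make the example self-contained.

```python
import select
import socket

def poll_then_read(sock, timeout=0.0):
    """Return available bytes, or None if nothing is readable within timeout."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recv(4096)  # b'' here would mean the peer closed the connection

def nonblocking_read(sock):
    """Return available bytes, or None if a read would block."""
    sock.setblocking(False)
    try:
        return sock.recv(4096)
    except BlockingIOError:
        return None  # no data waiting right now

if __name__ == "__main__":
    a, b = socket.socketpair()
    print(poll_then_read(b))       # nothing has been sent yet
    print(nonblocking_read(b))     # likewise, a read would block
    a.sendall(b"hello\n")
    print(poll_then_read(b, 1.0))  # data is now waiting
    a.close()
    b.close()
```

Returning None for "no data yet" keeps that case distinct from b'', which signals that the connection was closed.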

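Are you reading one file or socket with multiple threads? I would have a single worker read the socket and feed the received items into a queue for other threads to handle.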
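
A minimal sketch of that single-reader design, assuming newline-delimited messages. The names reader and collect, and the use of None as an end-of-stream sentinel, are illustrative choices rather than anything prescribed above.

```python
import queue
import socket
import threading

def reader(sock, lines):
    """Sole reader of sock: split incoming bytes into lines and enqueue them."""
    buf = b""
    while True:
        data = sock.recv(4096)
        if not data:
            break  # peer closed the connection
        buf += data
        # everything before the last newline is complete; the rest stays buffered
        *complete, buf = buf.split(b"\n")
        for line in complete:
            lines.put(line)
    if buf:
        lines.put(buf)  # trailing data without a final newline
    lines.put(None)  # sentinel: no more lines will arrive

def collect(sock):
    """Start the reader thread and gather its lines until the sentinel."""
    lines = queue.Queue()
    t = threading.Thread(target=reader, args=(sock, lines), daemon=True)
    t.start()
    out = list(iter(lines.get, None))  # blocks on get() until None is seen
    t.join()
    return out

if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"one\ntwo\nthree")
    a.close()
    print(collect(b))
    b.close()
```

Since queue.Queue is thread-safe, any number of consumer threads can call get() on it while the buffering logic stays confined to one thread.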

I suggest consulting the source of the Python socket module and the C source that makes the system calls.

乱世争霸 2024-07-26 11:42:26

def buffered_readlines(pull_next_chunk, buf_size=4096):
  """
  pull_next_chunk is a callable that accepts one positional argument max_len,
  e.g. sock.recv or open(path, 'rb').read, and returns a bytes object of up to
  max_len bytes, or an empty one when nothing is left to read.
  Lines are yielded without their trailing newline.

  >>> for line in buffered_readlines(sock.recv, 16384):
  ...   print(line)
  ...
  >>> # the following code won't read the whole file into memory
  ... # before splitting it into lines, as the .readlines method
  ... # of a file does. Also it won't block until a FIFO file is closed
  ...
  >>> for line in buffered_readlines(open('huge_file', 'rb').read):
  ...   pass  # process it on a per-line basis
  ...
  """
  chunks = []  # pieces of the current, still incomplete line
  while True:
    chunk = pull_next_chunk(buf_size)
    if not chunk:
      # end of input: flush whatever is left as the final line
      if chunks:
        yield b''.join(chunks)
      break
    if b'\n' not in chunk:
      chunks.append(chunk)
      continue
    chunk = chunk.split(b'\n')
    # the first piece completes the line buffered so far
    if chunks:
      yield b''.join(chunks + [chunk[0]])
    else:
      yield chunk[0]
    for line in chunk[1:-1]:
      yield line
    # the last piece is the start of the next line (possibly empty)
    if chunk[-1]:
      chunks = [chunk[-1]]
    else:
      chunks = []
