A Python buffer that can be truncated from the left?

Posted 2024-07-24 10:36:45 · 538 words · 5 views · 0 comments

Right now, I am buffering bytes using strings, StringIO, or cStringIO. However, I often need to remove bytes from the left side of the buffer. A naive approach would rebuild the entire buffer on every removal. Is there an optimal way to do this if left-truncation is a very common operation? Ideally, the truncated bytes should actually be reclaimed by Python's garbage collector.

Any sort of algorithm for this (keep the buffer in small pieces?), or an existing implementation, would really help.

Edit:

I tried to use Python 2.7's memoryview for this, but sadly, the data outside the "view" isn't GCed when the original reference is deleted:

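# Requires Python 2.7+ (memoryview, xrange, print statement).
# This ends up using ~2 GB of memory, not ~50 MB.
smalls = []
for i in xrange(10):
    big = memoryview('z' * (200 * 1000 * 1000))  # 200 MB backing string
    small = big[195 * 1000 * 1000:]              # view of the last 5 MB
    del big                                      # the backing string stays alive
    smalls.append(small)
    print '.',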
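
The retention described in the edit can be observed on a small scale. The following is a minimal sketch, written in Python 3 syntax rather than the Python 2 of the question, with sizes scaled down for illustration. It assumes the `memoryview.obj` attribute (Python 3.3+), which exposes the object a view is backed by. Copying the slice with `bytes()` is shown as one possible way to detach it, at the cost of a copy.

```python
# A slice of a memoryview still refers to the whole backing object.
big = memoryview(b"z" * 2000)   # stand-in for the 200 MB buffer
small = big[1950:]              # view of the last 50 bytes
del big                         # drops one name, not the backing bytes

view_len = len(small)           # size of the visible window
backing_len = len(small.obj)    # size of the object kept alive by the view

# Copying the window produces an independent object of just that size.
detached = bytes(small)
del small                       # now nothing refers to the large backing object
```

After the last line, only `detached` remains, so the large backing object is no longer referenced.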

Comments (2)

一刻暧昧 2024-07-31 10:36:45

A deque will be efficient if left-removal operations are frequent: unlike a list, string, or buffer, removing from either end of a deque is O(1). It will cost more memory than a string, however, because you'll be storing each character as its own string object rather than in a packed sequence.

Alternatively, you could create your own implementation (e.g., a linked list of fixed-size string or buffer objects), which may store the data more compactly.
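
One way to combine the two suggestions above is to keep whole chunks, rather than single characters, in a `collections.deque`. The sketch below is only an illustration in Python 3 syntax; the class name `ChunkBuffer` and its methods `append`, `discard`, and `getvalue` are hypothetical and not taken from any library. Chunks are whatever sizes the caller appends, not fixed-size.

```python
from collections import deque


class ChunkBuffer:
    """Hypothetical byte buffer that supports cheap removal from the left."""

    def __init__(self):
        self._chunks = deque()  # each element is an immutable bytes chunk
        self._size = 0          # total number of buffered bytes

    def __len__(self):
        return self._size

    def append(self, data):
        """Add bytes to the right end of the buffer."""
        if data:
            self._chunks.append(bytes(data))
            self._size += len(data)

    def discard(self, n):
        """Drop up to n bytes from the left; return how many were dropped."""
        if n <= 0:
            return 0
        n = min(n, self._size)
        remaining = n
        while remaining:
            head = self._chunks[0]
            if len(head) <= remaining:
                # The whole chunk goes away and can be freed.
                self._chunks.popleft()
                remaining -= len(head)
            else:
                # Only part of the head chunk is dropped: copy its tail.
                self._chunks[0] = head[remaining:]
                remaining = 0
        self._size -= n
        return n

    def getvalue(self):
        """Join the remaining chunks into a single bytes object."""
        return b"".join(self._chunks)


buf = ChunkBuffer()
buf.append(b"hello ")
buf.append(b"world")
dropped = buf.discard(8)
```

With this layout, a truncation copies at most the tail of one chunk, and fully consumed chunks are no longer referenced by the buffer.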

鸠书 2024-07-31 10:36:45

Build your buffer as a list of characters or lines and slice the list. Only join as string on output. This is pretty efficient for most types of 'mutable string' behaviour.

The GC will collect the truncated bytes because they are no longer referenced in the list.
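
As a small illustration of the list approach described above (Python 3 syntax, with made-up sample data), the buffer is kept as a list of lines, truncated with a slice deletion, and joined only when output is needed. Note that `del lines[:k]` shifts the remaining list entries, so its cost grows with the number of items left, not with the number of bytes.

```python
# Buffer held as a list of lines instead of one large string.
lines = ["alpha\n", "beta\n", "gamma\n", "delta\n"]

# Left-truncate in place: the first two line objects are no longer
# referenced by the list afterwards.
del lines[:2]

# Join into a single string only when producing output.
output = "".join(lines)
```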

UPDATE: To modify the head of the list, you can simply reverse the list. This sounds inefficient, but `list.reverse()` works in place inside the interpreter, so it is fast in practice, although it still takes time proportional to the length of the list.

From http://effbot.org/zone/python-list.htm:

"Reversing is fast, so temporarily reversing the list can often speed things up if you need to remove and insert a bunch of items at the beginning of the list:"

L.reverse()
# append/insert/pop/delete at far end
L.reverse()
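
The reverse/pop/reverse pattern quoted above could be wrapped as follows. This is a minimal sketch in Python 3 syntax, and the helper name `drop_from_front` is hypothetical. It pays for two O(n) reversals, so it is only worthwhile when several items are removed from the head in one batch.

```python
def drop_from_front(items, count):
    """Remove up to `count` items from the front of `items` in place.

    Returns the removed items in their original order.
    """
    items.reverse()                       # head of the list is now at the far end
    removed = [items.pop() for _ in range(min(count, len(items)))]
    items.reverse()                       # restore the original order
    return removed


letters = list("abcdef")
removed = drop_from_front(letters, 2)
```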