Python string memory usage on FreeBSD
I'm observing a strange memory usage pattern with Python strings on
FreeBSD. Consider the following session. The idea is to create a list
holding some strings, so that the cumulative number of characters in
the list is 100MB.
l = []
for i in xrange(100000):
    l.append(str(i) * (1000 / len(str(i))))
This uses around 100MB of memory as expected and 'del l' will clear that.
l = []
for i in xrange(20000):
    l.append(str(i) * (5000 / len(str(i))))
This uses 165MB of memory. I really don't understand where the
additional memory usage is coming from. [Both lists are the same size.]
Python 2.6.4 on FreeBSD 7.2. On Linux and Windows, both snippets use
only around 100MB of memory.
Update: I'm measuring memory using 'ps aux', executed via os.system after the above code snippets. Also, the two snippets were executed separately.
Update 2: It looks like FreeBSD mallocs memory in powers of 2, so allocating 5KB actually allocates 8KB. I'm not sure, though.
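The power-of-two hypothesis in the update can be checked with some quick arithmetic. This is a sketch of that back-of-the-envelope calculation, not a verified description of FreeBSD's allocator, and it ignores the per-string object header CPython adds on top of the character payload:

```python
# Assumption (from the update above): malloc rounds each request up to the
# next power of two, so a ~5000-byte string lands in an 8192-byte slot,
# while a ~1000-byte string fits exactly in a 1024-byte slot.

def next_pow2(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

# Second snippet: 20000 strings of ~5000 characters each.
alloc_5k = 20000 * next_pow2(5000)    # 20000 * 8192 bytes

# First snippet: 100000 strings of ~1000 characters each.
alloc_1k = 100000 * next_pow2(1000)   # 100000 * 1024 bytes

print(alloc_5k / 1e6)  # ~163.8 MB, close to the observed 165MB
print(alloc_1k / 1e6)  # ~102.4 MB, close to the observed 100MB
```

The rounded totals line up well with the reported 165MB vs. 100MB figures, which is consistent with (but does not prove) the power-of-two rounding theory.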
3 Answers
In my opinion, this is probably fragmentation in memory. First of all, memory chunks bigger than 255 bytes are allocated with malloc in CPython. You can refer to
Improving Python's Memory Allocator
For performance reasons, most memory allocators, malloc included, return aligned addresses. For example, you will never get an address like 0x00000003:
it is not aligned to 4 bytes, and it would be very slow for the computer to access that memory. Therefore, all addresses you get from malloc should be
0x00000000, 0x00000004, 0x00000008, and so on. 4-byte alignment is only the basic common rule; the real alignment policy varies by OS.
And the memory usage you are talking about should be RSS (not sure). For most OSes, the page size of virtual memory is 4K. For what you allocate, you need 2 pages to store one 5000-byte chunk. Let's look at an example illustrating how memory gets wasted. We assume the alignment is 256 bytes here.
The result is many fragments in memory that can't be used but still occupy the memory space of a page. I'm not sure what FreeBSD's alignment policy is, but I think the extra usage is caused by something like this. To use memory efficiently in Python, you can pre-allocate one big bytearray and carve it into chunks of a well-chosen size (you have to test to find the best size; it depends on the OS).
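The per-chunk waste this answer describes can be sketched numerically. The 256-byte alignment and the page-per-chunk accounting are the answer's assumptions, not verified FreeBSD behaviour:

```python
# Model: each large chunk is rounded up to the alignment boundary, and the
# pages it touches cannot be shared, so the tail of the last page is wasted.

PAGE = 4096   # common virtual memory page size

def footprint(payload, align):
    aligned = -(-payload // align) * align   # round up to alignment
    pages = -(-aligned // PAGE)              # whole pages needed
    return aligned, pages * PAGE

aligned, used = footprint(5000, 256)
print(aligned)       # 5120: 120 bytes of alignment padding
print(used)          # 8192: two pages per chunk
print(used - 5000)   # 3192 bytes effectively wasted per 5000-byte string
```

Under this model each 5000-byte string drags along roughly 3KB of unusable space, which is the kind of overhead that would inflate 100MB of payload toward the observed 165MB.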
The answer may be in this saga. I think that you're witnessing some unavoidable memory manager overhead.
As @Hossein says, try executing both code snippets in one run, and then swap them.
I think that all memory addresses in FreeBSD have to be aligned to a power of two, so all of Python's memory pools end up somewhat fragmented in memory rather than contiguous.
Try using some other tool to see if you can spot anything interesting.
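One such alternative to shelling out to 'ps aux' is the standard library's resource module, which reports the process's own peak RSS. This is a sketch of that measuring approach; note that the unit of ru_maxrss is platform-dependent (kilobytes on Linux and FreeBSD, bytes on macOS):

```python
import resource

# Peak resident set size before building the list.
before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# The second snippet from the question (written for Python 3 here).
l = [str(i) * (5000 // len(str(i))) for i in range(20000)]

# Peak RSS after: the difference approximates the list's footprint.
after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS grew by about %d kB" % (after - before))
```

Measuring from inside the process avoids the extra noise of forking a shell and lets you compare the two snippets in a single run, as the previous answer suggests.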