Python string memory usage on FreeBSD
I'm observing a strange memory usage pattern with Python strings on
FreeBSD. Consider the following session. The idea is to create a list
holding some strings, so that the cumulative number of characters in
the list is 100MB.
l = []
for i in xrange(100000):
    l.append(str(i) * (1000 / len(str(i))))
This uses around 100MB of memory as expected and 'del l' will clear that.
l = []
for i in xrange(20000):
    l.append(str(i) * (5000 / len(str(i))))
This uses 165MB of memory. I really don't understand where the
additional memory usage is coming from. [Both lists are the same size.]
Python 2.6.4 on FreeBSD 7.2. On Linux and Windows, both snippets use
only around 100MB of memory.
Update: I'm measuring memory using 'ps aux', executed via os.system after the above code snippets. Also, the two snippets were executed separately.
Update 2: It looks like FreeBSD mallocs memory in powers of 2, so allocating 5KB actually allocates 8KB. I'm not sure, though.
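The power-of-two hypothesis in the update can be checked with some quick arithmetic. This is a sketch of that back-of-the-envelope calculation, not a verified description of FreeBSD's allocator, and it ignores the per-string object header CPython adds on top of the character payload:

```python
# Assumption (from the update above): malloc rounds each request up to the
# next power of two, so a ~5000-byte string lands in an 8192-byte slot,
# while a ~1000-byte string fits exactly in a 1024-byte slot.

def next_pow2(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

# Second snippet: 20000 strings of ~5000 characters each.
alloc_5k = 20000 * next_pow2(5000)    # 20000 * 8192 bytes

# First snippet: 100000 strings of ~1000 characters each.
alloc_1k = 100000 * next_pow2(1000)   # 100000 * 1024 bytes

print(alloc_5k / 1e6)  # ~163.8 MB, close to the observed 165MB
print(alloc_1k / 1e6)  # ~102.4 MB, close to the observed 100MB
```

The rounded totals line up well with the reported 165MB vs. 100MB figures, which is consistent with (but does not prove) the power-of-two rounding theory.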
3 Answers
In my opinion, this is probably fragmentation in memory. First of all, memory chunks bigger than 255 bytes are allocated with malloc in CPython. You can refer to
Improving Python's Memory Allocator
For performance reasons, most memory allocators, malloc included, return aligned addresses. For example, you will never get an address like 0x00000003:
it is not aligned to 4 bytes, and it would be very slow for the computer to access that memory. Therefore, all addresses you get from malloc should be
0x00000000, 0x00000004, 0x00000008, and so on. 4-byte alignment is only the basic common rule; the real alignment policy varies by OS.
And the memory usage you are talking about should be RSS (not sure). For most OSes, the page size of virtual memory is 4K. For what you allocate, you need 2 pages to store one 5000-byte chunk. Let's look at an example illustrating how memory gets wasted. We assume the alignment is 256 bytes here.
The result is many fragments in memory that can't be used but still occupy the memory space of a page. I'm not sure what FreeBSD's alignment policy is, but I think the extra usage is caused by something like this. To use memory efficiently in Python, you can pre-allocate one big bytearray and carve it into chunks of a well-chosen size (you have to test to find the best size; it depends on the OS).
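The per-chunk waste this answer describes can be sketched numerically. The 256-byte alignment and the page-per-chunk accounting are the answer's assumptions, not verified FreeBSD behaviour:

```python
# Model: each large chunk is rounded up to the alignment boundary, and the
# pages it touches cannot be shared, so the tail of the last page is wasted.

PAGE = 4096   # common virtual memory page size

def footprint(payload, align):
    aligned = -(-payload // align) * align   # round up to alignment
    pages = -(-aligned // PAGE)              # whole pages needed
    return aligned, pages * PAGE

aligned, used = footprint(5000, 256)
print(aligned)       # 5120: 120 bytes of alignment padding
print(used)          # 8192: two pages per chunk
print(used - 5000)   # 3192 bytes effectively wasted per 5000-byte string
```

Under this model each 5000-byte string drags along roughly 3KB of unusable space, which is the kind of overhead that would inflate 100MB of payload toward the observed 165MB.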
The answer may be in this saga. I think that you're witnessing some unavoidable memory manager overhead.
As @Hossein says, try executing both code snippets in one run, and then swap them.
I think that all memory addresses in FreeBSD have to be aligned to a power of two, so all of Python's memory pools end up somewhat fragmented in memory rather than contiguous.
Try using some other tool to see if you can spot anything interesting.
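One such alternative to shelling out to 'ps aux' is the standard library's resource module, which reports the process's own peak RSS. This is a sketch of that measuring approach; note that the unit of ru_maxrss is platform-dependent (kilobytes on Linux and FreeBSD, bytes on macOS):

```python
import resource

# Peak resident set size before building the list.
before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# The second snippet from the question (written for Python 3 here).
l = [str(i) * (5000 // len(str(i))) for i in range(20000)]

# Peak RSS after: the difference approximates the list's footprint.
after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS grew by about %d kB" % (after - before))
```

Measuring from inside the process avoids the extra noise of forking a shell and lets you compare the two snippets in a single run, as the previous answer suggests.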