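Why is the memory usage of child processes (Python multiprocessing) so different when sharing a ctypes.Structure that holds a string versus sharing just the string?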

Posted on 2025-01-06 01:39:02

The following code uses multiprocessing's Array to share a large array of unicode strings across processes. If I use c_wchar_p as the type, each child process's memory usage is about one quarter of the memory used in the parent process (the ratio changes if I change the number of entries in the Array).

However, if I use a ctypes.Structure with a single c_wchar_p field, the child processes' memory usage is constant and very low, while the parent process's memory usage doubles.

```python
import ctypes
import multiprocessing
import resource
import time

a = None  # set in main(); forked children inherit it

class Record(ctypes.Structure):
    # A single pointer field: each entry stores an address, not the characters.
    _fields_ = [('value', ctypes.c_wchar_p)]
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return '(%s)' % (self.value,)

def child(i):
    while True:
        # ru_maxrss is the peak RSS: bytes on macOS, kilobytes on Linux.
        print "%ik memory used in child %i: %s" % (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, i, a[i])
        time.sleep(1)
        # Index every entry. For a c_wchar_p array, a[j] follows the pointer
        # and builds a new unicode object; for a Record array, a[j] only wraps
        # the struct in place and does not read the string.
        for j in xrange(len(a)):
            c = a[j]

def main():
    global a
    # uncomment this line and comment the next to switch
    #a = multiprocessing.Array(ctypes.c_wchar_p, [u'unicode %r!' % i for i in xrange(1000000)], lock=False)
    a = multiprocessing.Array(Record, [Record(u'unicode %r!' % i) for i in xrange(1000000)], lock=False)
    for i in xrange(5):
        # Python 2 on Unix forks, so each child starts with a copy-on-write view of the parent's heap.
        p = multiprocessing.Process(target=child, args=(i + 1,))
        p.start()
    while True:
        print "%ik memory used in parent: %s" % (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, a[0])
        time.sleep(1)

if __name__ == '__main__':
    main()
```

Using c_wchar_p results in this output:

```
363224k memory used in parent: unicode 0!
72560k memory used in child 5: unicode 5!
72556k memory used in child 3: unicode 3!
72536k memory used in child 1: unicode 1!
72568k memory used in child 4: unicode 4!
72576k memory used in child 2: unicode 2!
```

Using Record results in this output:

```
712508k memory used in parent: (unicode 0!)
1912k memory used in child 1: (unicode 1!)
1908k memory used in child 2: (unicode 2!)
1904k memory used in child 5: (unicode 5!)
1904k memory used in child 4: (unicode 4!)
1908k memory used in child 3: (unicode 3!)
```

Why?
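A caveat when reading these numbers: ru_maxrss reports the peak resident set size so far, not current usage, and its unit depends on the platform (kilobytes on Linux, bytes on macOS), so `ru_maxrss / 1024` yields kilobytes only on macOS. A minimal Python 3 sketch of a unit-normalizing helper follows; the name `max_rss_kib` is hypothetical, and treating every non-macOS platform as reporting kilobytes is an assumption.

```python
import resource
import sys


def max_rss_kib(who=resource.RUSAGE_SELF):
    """Return the peak resident set size of `who`, in KiB.

    ru_maxrss is a high-water mark: it never decreases, so it shows the
    largest footprint reached so far rather than current usage.
    """
    peak = resource.getrusage(who).ru_maxrss
    if sys.platform == 'darwin':
        # macOS reports ru_maxrss in bytes.
        return peak // 1024
    # Assumption: other Unix platforms (Linux, the BSDs) report kilobytes.
    return peak


if __name__ == '__main__':
    print('%ik peak memory used' % max_rss_kib())
```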

Comments (1)

哽咽笑 2025-01-13 01:39:02

I don't know about the increase in memory usage, but I don't think the code really does what you intend.

If you modify a[i] in your parent process, the child processes don't get the same value.
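To see why, here is a minimal Python 3 sketch that reads the raw address held in one slot of a shared c_wchar_p array (the helper names `stored_pointer` and `points_into` are hypothetical). When a string is assigned, ctypes allocates a wide-character buffer in the assigning process's private memory and stores only its address in the shared block, so the address falls outside the shared memory. Other processes see just that number, and whether it points at anything valid in their own address space depends on how and when they were created.

```python
import ctypes
import multiprocessing


def stored_pointer(arr, i):
    """Return the raw address held in slot i of a c_wchar_p array (None if NULL)."""
    slots = ctypes.cast(arr, ctypes.POINTER(ctypes.c_void_p))
    return slots[i]


def points_into(arr, addr):
    """True if addr lies inside the memory block that backs arr."""
    start = ctypes.addressof(arr)
    return start <= addr < start + ctypes.sizeof(arr)


if __name__ == '__main__':
    a = multiprocessing.Array(ctypes.c_wchar_p, 4, lock=False)
    a[0] = 'unicode 0!'
    addr = stored_pointer(a, 0)
    # The shared block only holds 4 pointers; the characters live elsewhere.
    print(ctypes.sizeof(a), a[0], points_into(a, addr))
```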

It's best not to pass pointers (which is exactly what the _p types are) between processes. As the multiprocessing docs put it:

Although it is possible to store a pointer in shared memory remember that this will refer to a location in the address space of a specific process. However, the pointer is quite likely to be invalid in the context of a second process and trying to dereference the pointer from the second process may cause a crash.
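If every process should read the same strings, one option (a sketch, not something the answer above prescribes) is to put the characters themselves into shared memory, for example with a fixed-size c_wchar array inside the structure instead of a c_wchar_p. The 32-character limit (`MAX_CHARS`) and the helper names `make_shared_records` and `show` are assumptions for illustration; assigning a longer string raises ValueError. This is Python 3 code.

```python
import ctypes
import multiprocessing

MAX_CHARS = 32  # assumption: no string is longer than this


class Record(ctypes.Structure):
    # A fixed-size character array instead of c_wchar_p: the text itself
    # lives inside the struct, and therefore inside the shared block.
    _fields_ = [('value', ctypes.c_wchar * MAX_CHARS)]


def make_shared_records(strings):
    """Copy each string into a shared, lock-free array of Records."""
    arr = multiprocessing.Array(Record, len(strings), lock=False)
    for i, s in enumerate(strings):
        arr[i].value = s  # raises ValueError if len(s) > MAX_CHARS
    return arr


def show(arr, i, ready):
    ready.wait()  # wait until the parent has updated the entry
    print('child sees:', arr[i].value)


if __name__ == '__main__':
    records = make_shared_records(['unicode %r!' % i for i in range(5)])
    ready = multiprocessing.Event()
    p = multiprocessing.Process(target=show, args=(records, 3, ready))
    p.start()
    records[3].value = 'changed after start'  # written into shared memory
    ready.set()
    p.join()
```

Because the array is passed to the child as a Process argument, this works with both the fork and spawn start methods; the trade-off is a fixed per-entry size, even for short strings.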
