Why does child process memory usage (Python multiprocessing) differ so much between sharing a ctypes.Structure that holds a string and sharing the string alone?
The following code uses multiprocessing's Array to share a large array of unicode strings across processes. If I use c_wchar_p as the type, each child process's memory usage is about one quarter of the memory used in the parent process (the amount changes if I change the number of entries in the Array).
However, if I use a ctypes.Structure with a single c_wchar_p field, the child processes' memory usage is constant and very low, while the parent process's memory usage doubles.
import ctypes
import multiprocessing
import random
import resource
import time

a = None

class Record(ctypes.Structure):
    _fields_ = [('value', ctypes.c_wchar_p)]

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return '(%s)' % (self.value,)

def child(i):
    while True:
        print "%ik memory used in child %i: %s" % (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, i, a[i])
        time.sleep(1)
        for j in xrange(len(a)):
            c = a[j]

def main():
    global a
    # uncomment this line and comment the next to switch
    #a = multiprocessing.Array(ctypes.c_wchar_p, [u'unicode %r!' % i for i in xrange(1000000)], lock=False)
    a = multiprocessing.Array(Record, [Record(u'unicode %r!' % i) for i in xrange(1000000)], lock=False)
    for i in xrange(5):
        p = multiprocessing.Process(target=child, args=(i + 1,))
        p.start()
    while True:
        print "%ik memory used in parent: %s" % (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, a[0])
        time.sleep(1)

if __name__ == '__main__':
    main()
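A note on the measurement itself: ru_maxrss is a peak value, and its unit depends on the platform. Dividing by 1024, as the script above does, gives KiB where the kernel reports bytes (macOS) but MiB where it reports KiB (Linux). The helper below is only a sketch; the name peak_rss_kib is hypothetical, and the platform check is an assumption covering macOS and Linux only.

```python
import resource
import sys

def peak_rss_kib():
    """Return this process's peak resident set size in KiB (hypothetical helper)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Assumption: macOS reports ru_maxrss in bytes, Linux reports it in KiB.
    if sys.platform == 'darwin':
        return peak // 1024
    return peak

print('%d KiB peak RSS' % peak_rss_kib())
```

Because the value is a high-water mark, it never goes down, so it shows the most memory a process has touched rather than what it holds right now.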
Using c_wchar_p results in this output:
363224k memory used in parent: unicode 0!
72560k memory used in child 5: unicode 5!
72556k memory used in child 3: unicode 3!
72536k memory used in child 1: unicode 1!
72568k memory used in child 4: unicode 4!
72576k memory used in child 2: unicode 2!
Using Record results in this output:
712508k memory used in parent: (unicode 0!)
1912k memory used in child 1: (unicode 1!)
1908k memory used in child 2: (unicode 2!)
1904k memory used in child 5: (unicode 5!)
1904k memory used in child 4: (unicode 4!)
1908k memory used in child 3: (unicode 3!)
Why?
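One difference between the two variants that may be relevant, though it is not a confirmed explanation of the numbers above: indexing an array of c_wchar_p follows the stored pointer and builds a Python string, whereas indexing an array of Record only returns a Record object and leaves the pointer alone until .value is read. The child loop in the script does c = a[j], so it reads every string in the first variant and none in the second. The sketch below uses plain (non-shared) ctypes arrays, which is an assumption made to keep it small; written for Python 3.

```python
import ctypes

class Record(ctypes.Structure):
    # Same layout as in the question: one pointer-sized field.
    _fields_ = [('value', ctypes.c_wchar_p)]

# Plain ctypes arrays are enough to show how element access behaves.
strings = (ctypes.c_wchar_p * 2)(u'unicode 0!', u'unicode 1!')
records = (Record * 2)(Record(u'unicode 0!'), Record(u'unicode 1!'))

# Each slot stores an address, not the characters themselves.
slot_is_pointer_sized = (
    ctypes.sizeof(ctypes.c_wchar_p) == ctypes.sizeof(ctypes.c_void_p)
)

# Indexing the c_wchar_p array dereferences the pointer right away.
s = strings[1]

# Indexing the Record array yields a Record; the string memory is only
# read when the .value attribute is accessed.
r = records[1]

print(type(s).__name__, type(r).__name__)
```

In both variants the characters themselves sit in ordinary memory of the process that created the array; the array holds only addresses.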
Comments (1)
I don't know about the increase in memory usage, but I don't think the code is really doing what you intend it to do.
If you modify a[i] in your parent process, the child processes don't get the same value. It's best not to pass pointers (which is exactly what the _p types are) between processes. The multiprocessing docs give the same warning about storing pointers in shared memory.
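One way to avoid sharing pointers is to store the characters inline, using a fixed-size c_wchar array as the field type so the string data itself lives in the shared block. This is a minimal sketch under stated assumptions, not the approach from the question: InlineRecord, MAX_CHARS and make_shared are hypothetical names, the capacity of 32 characters is arbitrary, and the code is written for Python 3.

```python
import ctypes
import multiprocessing

MAX_CHARS = 32  # hypothetical fixed capacity per string

class InlineRecord(ctypes.Structure):
    # The characters are stored inside the structure, so they end up
    # inside the shared memory block rather than behind a pointer.
    _fields_ = [('value', ctypes.c_wchar * MAX_CHARS)]

def make_shared(strings):
    """Copy strings into a shared array of InlineRecord (hypothetical helper)."""
    for s in strings:
        if len(s) > MAX_CHARS:
            raise ValueError('string longer than %d characters' % MAX_CHARS)
    arr = multiprocessing.Array(InlineRecord, len(strings), lock=False)
    for i, s in enumerate(strings):
        arr[i].value = s  # copies the characters into the shared buffer
    return arr

shared = make_shared([u'unicode %r!' % i for i in range(3)])
print(shared[1].value)
```

The trade-off is a fixed size per entry: every record costs MAX_CHARS wide characters regardless of the string's length. In exchange, a write such as shared[1].value = u'changed' goes into the shared buffer itself, so processes that received the array when they were started read the updated characters instead of a stale address.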