Circular reference objects not getting garbage collected
I have a handy little class that I use a lot in my code:
class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self
The nice thing about it is that you can access the attributes using either dictionary key syntax or the usual object style:
myStructure = Structure(name="My Structure")
print myStructure["name"]
print myStructure.name
Today I noticed that my application's memory consumption was increasing slightly in a situation where I would have expected it to decrease. It seems to me that the instances generated from the Structure class are not garbage collected. To illustrate this, here is a small snippet:
import gc
class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self
structures = [Structure(name="__{0}".format(str(value))) for value in range(4096)]
print "Structure name: ", structures[16].name
print "Structure name: ", structures[16]["name"]
del structures
gc.collect()
print "Structures count: ", len([obj for obj in gc.get_objects() if type(obj) is Structure])
With the following output:
Structure name: __16
Structure name: __16
Structures count: 4096
As you can see, the Structure instance count is still 4096.
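Counting objects through gc.get_objects() looks at everything the collector tracks. A narrower check is to hold a weak reference to a single instance and see whether it dies after gc.collect(). The following is only a sketch of that idea: the names CyclicStructure, PlainStructure and is_collected are made up for illustration and are not part of the code above, and the result for the self-referencing class depends on the interpreter version, so no expected value is given for it.

```python
import gc
import weakref


class CyclicStructure(dict):
    """Same idea as the class in the question: the instance is its own __dict__."""

    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self


class PlainStructure(dict):
    """Hypothetical comparison class without the self-reference."""


def is_collected(factory):
    """Create one object, drop it, run the collector, report whether it is gone."""
    obj = factory()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj                 # drop the only strong reference held here
    gc.collect()            # force a full collection, as in the question
    return ref() is None


# The plain instance is freed by reference counting alone.
plain_gone = is_collected(lambda: PlainStructure(name="plain"))

# Whether the self-referencing instance is reclaimed is exactly what is in
# question here; inspect the value on your own interpreter.
cyclic_gone = is_collected(lambda: CyclicStructure(name="cyclic"))
```

If cyclic_gone comes back False, the instance survived an explicit collection, which would match the count of 4096 reported above.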
I commented out the line creating the handy self reference:
import gc
class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        # self.__dict__ = self
structures = [Structure(name="__{0}".format(str(value))) for value in range(4096)]
# print "Structure name: ", structures[16].name
print "Structure name: ", structures[16]["name"]
del structures
gc.collect()
print "Structures count: ", len([obj for obj in gc.get_objects() if type(obj) is Structure])
Now that the circular reference is removed, the output makes sense:
Structure name: __16
Structures count: 0
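To see why the self reference matters, it can help to look at reference counts directly. The sketch below (class names are again hypothetical) compares an instance with and without the `self.__dict__ = self` line. The absolute numbers returned by sys.getrefcount() vary between interpreters, so only the difference is looked at.

```python
import sys


class CyclicStructure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self  # the instance now holds a reference to itself


class PlainStructure(dict):
    pass


plain = PlainStructure(name="plain")
cyclic = CyclicStructure(name="cyclic")

# sys.getrefcount() also counts the temporary reference created by the call
# itself, so only the difference between the two values is meaningful.
plain_refs = sys.getrefcount(plain)
cyclic_refs = sys.getrefcount(cyclic)
extra_refs = cyclic_refs - plain_refs
```

Because the self-referencing instance always keeps at least one reference to itself, its count cannot reach 0 when the last outside name is deleted. Reference counting alone therefore never frees it, and reclaiming it is left to the cycle collector.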
I pushed the tests a bit further using Meliae to analyze the memory consumption:
import gc
import pprint
from meliae import scanner
from meliae import loader
class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self
structures = [Structure(name="__{0}".format(str(value))) for value in range(4096)]
print "Structure name: ", structures[16].name
print "Structure name: ", structures[16]["name"]
del structures
gc.collect()
print "Structures count: ", len([obj for obj in gc.get_objects() if type(obj) is Structure])
scanner.dump_all_objects("Test_001.json")
om = loader.load("Test_001.json")
summary = om.summarize()
print summary
structures = om.get_all("Structure")
if structures:
    pprint.pprint(structures[0].c)
Generating the following output:
Structure name: __16
Structure name: __16
Structures count: 4096
loading... line 5001, 5002 objs, 0.6 / 1.8 MiB read in 0.2s
loading... line 10002, 10003 objs, 1.1 / 1.8 MiB read in 0.3s
loading... line 15003, 15004 objs, 1.7 / 1.8 MiB read in 0.5s
loaded line 16405, 16406 objs, 1.8 / 1.8 MiB read in 0.5s
checked 1 / 16406 collapsed 0
checked 16405 / 16406 collapsed 157
compute parents 0 / 16249
compute parents 16248 / 16249
set parents 16248 / 16249
collapsed in 0.2s
Total 16249 objects, 58 types, Total size = 3.2MiB (3306183 bytes)
Index  Count   %     Size   %  Cum    Max  Kind
    0   4096  25  1212416  36   36    296  Structure
    1    390   2   536976  16   52  49432  dict
    2   5135  31   417550  12   65  12479  str
    3     82   0   290976   8   74  12624  module
    4    235   1   212440   6   80    904  type
    5    947   5   121216   3   84    128  code
    6   1008   6   120960   3   88    120  function
    7   1048   6    83840   2   90     80  wrapper_descriptor
    8    654   4    47088   1   92     72  builtin_function_or_method
    9    562   3    40464   1   93     72  method_descriptor
   10    517   3    37008   1   94    216  tuple
   11    139   0    35832   1   95   2280  set
   12    351   2    30888   0   96     88  weakref
   13    186   1    23200   0   97   1664  list
   14     63   0    21672   0   97    344  WeakSet
   15     21   0    18984   0   98    904  ABCMeta
   16    197   1    14184   0   98     72  member_descriptor
   17    188   1    13536   0   99     72  getset_descriptor
   18    284   1     6816   0   99     24  int
   19     14   0     5296   0   99   2280  frozenset
[Structure(4312707312 296B 2refs 2par),
type(4298634592 904B 4refs 100par 'Structure')]
The memory usage is 3.2 MiB. Removing the self-referencing line leads to the following output:
Structure name: __16
Structures count: 0
loading... line 5001, 5002 objs, 0.6 / 1.4 MiB read in 0.1s
loading... line 10002, 10003 objs, 1.1 / 1.4 MiB read in 0.3s
loaded line 12308, 12309 objs, 1.4 / 1.4 MiB read in 0.4s
checked 12 / 12309 collapsed 0
checked 12308 / 12309 collapsed 157
compute parents 0 / 12152
compute parents 12151 / 12152
set parents 12151 / 12152
collapsed in 0.1s
Total 12152 objects, 57 types, Total size = 2.0MiB (2093714 bytes)
Index  Count   %    Size   %  Cum    Max  Kind
    0    390   3  536976  25   25  49432  dict
    1   5134  42  417497  19   45  12479  str
    2     82   0  290976  13   59  12624  module
    3    235   1  212440  10   69    904  type
    4    947   7  121216   5   75    128  code
    5   1008   8  120960   5   81    120  function
    6   1048   8   83840   4   85     80  wrapper_descriptor
    7    654   5   47088   2   87     72  builtin_function_or_method
    8    562   4   40464   1   89     72  method_descriptor
    9    517   4   37008   1   91    216  tuple
   10    139   1   35832   1   92   2280  set
   11    351   2   30888   1   94     88  weakref
   12    186   1   23200   1   95   1664  list
   13     63   0   21672   1   96    344  WeakSet
   14     21   0   18984   0   97    904  ABCMeta
   15    197   1   14184   0   98     72  member_descriptor
   16    188   1   13536   0   98     72  getset_descriptor
   17    284   2    6816   0   99     24  int
   18     14   0    5296   0   99   2280  frozenset
   19     22   0    2288   0   99    104  classobj
This confirms that the Structure instances have been destroyed and that the memory usage dropped to 2.0 MiB.
Any idea how I could ensure that this class gets properly garbage collected? All of this was run on Python 2.7.2 (Darwin), by the way.
Cheers,
Thomas
Comments (1)
You can implement your Structure class more straightforwardly by using __getattr__ and __setattr__ to route attribute access to the underlying dict.
Cycles are garbage collected in Python, but only periodically (unlike regular reference-counted objects, which are collected as soon as their reference count drops to 0).
Avoiding the cycle, as a Structure class built on __getattr__ and __setattr__ does, means you'll get better gc behavior. You may also want to look at collections.namedtuple as an alternative: it doesn't do exactly what you've implemented, but perhaps it suits your ends.
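The answer does not include code, so the following is only one possible reading of it: a minimal sketch of a Structure class that forwards attribute access to the dict itself and never assigns `self.__dict__`. The `__delattr__` method and the `Record` namedtuple are additions made here for illustration and are not part of the original suggestion.

```python
from collections import namedtuple


class Structure(dict):
    """dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as keys() or items() still resolve as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Store attributes as dict items; the instance never refers to itself.
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name)


# Hypothetical namedtuple alternative: fixed field names, immutable values,
# attribute access only (no key syntax).
Record = namedtuple("Record", ["name"])
```

With this version, `Structure(name="My Structure").name` and `["name"]` return the same value, and no reference cycle is created, so instances are freed as soon as their reference count drops to 0. One trade-off is that a key with the same name as a dict method (for example "items") is reachable only through key syntax, because normal attribute lookup finds the method first.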