I'm reading https://devguide.python.org/garbage_collector/ (and https://docs.python.org/3.8/library/gc.html) to understand the mechanism behind GC in Python. The concept of generation thresholds is somewhat unclear to me. The mentioned article (and many others) says that
"Generations are collected when the number of objects that they contain reaches some predefined threshold."
The default thresholds are (700, 10, 10). Fine.
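For reference, the thresholds and the current counters can be read from a running interpreter. This is only a minimal sketch using `gc.get_threshold()` and `gc.get_count()`; the printed values depend on the Python version and on whatever the interpreter has already allocated, so I am not assuming any particular output here.

```python
import gc

# gc.get_threshold() returns the (threshold0, threshold1, threshold2) tuple.
thresholds = gc.get_threshold()
print("thresholds:", thresholds)

# gc.get_count() returns the current per-generation counters as a tuple.
counts = gc.get_count()
print("counts:", counts)
```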
I'm running the following code to better understand the mechanism, and the output contradicts what I understand to be the definition of the thresholds:
import gc

class A():
    pass

lst = []     # keeps every A() instance alive
gc_lst = []  # interleaves loop indices with GC callback reports

# GC callback: invoked with phase "start"/"stop" and an info dict.
def strt(p, inf):
    s = "phase = %s\ninf = %s\ngc.get_count() = %s\ngc.get_stats() = %s\nlen(gc.get_objects()) = %s"\
        %(p, inf, gc.get_count(), gc.get_stats(), [len(gc.get_objects(0)),len(gc.get_objects(1)),len(gc.get_objects(2))])
    gc_lst.append(s)

gc.callbacks.append(strt)

# Snapshot of the GC state before the allocation loop.
s = "gc.get_count() = %s\ngc.get_stats() = %s\nlen(gc.get_objects() for each gen) = %s"\
    %(gc.get_count(), gc.get_stats(), [len(gc.get_objects(0)),len(gc.get_objects(1)),len(gc.get_objects(2))])
print(s)

# Allocate 8000 tracked objects and record the index of each one.
for k in range(8000):
    lst.append(A())
    gc_lst.append(k)

for k in gc_lst:
    print(k)
The output:
0
1
2
...
...
...
5413
5414
phase = start
inf = {'generation': 0, 'collected': 0, 'uncollectable': 0}
gc.get_count() = (701, 8, 1)
gc.get_stats() = [{'collections': 19, 'collected': 65, 'uncollectable': 0}, {'collections': 1, 'collected': 33, 'uncollectable': 0}, {'collections': 0, 'collected': 0, 'uncollectable': 0}]
len(gc.get_objects()) = [703, 5475, 5069]
phase = stop
inf = {'generation': 0, 'collected': 0, 'uncollectable': 0}
gc.get_count() = (0, 9, 1)
gc.get_stats() = [{'collections': 20, 'collected': 65, 'uncollectable': 0}, {'collections': 1, 'collected': 33, 'uncollectable': 0}, {'collections': 0, 'collected': 0, 'uncollectable': 0}]
len(gc.get_objects()) = [2, 6176, 5069]
5415
5416
...
6114
6115
phase = start
inf = {'generation': 0, 'collected': 0, 'uncollectable': 0}
gc.get_count() = (701, 9, 1)
gc.get_stats() = [{'collections': 20, 'collected': 65, 'uncollectable': 0}, {'collections': 1, 'collected': 33, 'uncollectable': 0}, {'collections': 0, 'collected': 0, 'uncollectable': 0}]
len(gc.get_objects()) = [703, 6176, 5069]
phase = stop
inf = {'generation': 0, 'collected': 0, 'uncollectable': 0}
gc.get_count() = (0, 10, 1)
gc.get_stats() = [{'collections': 21, 'collected': 65, 'uncollectable': 0}, {'collections': 1, 'collected': 33, 'uncollectable': 0}, {'collections': 0, 'collected': 0, 'uncollectable': 0}]
len(gc.get_objects()) = [2, 6877, 5069]
6116
6117
...
6815
6816
phase = start
inf = {'generation': 0, 'collected': 0, 'uncollectable': 0}
gc.get_count() = (701, 10, 1)
gc.get_stats() = [{'collections': 21, 'collected': 65, 'uncollectable': 0}, {'collections': 1, 'collected': 33, 'uncollectable': 0}, {'collections': 0, 'collected': 0, 'uncollectable': 0}]
len(gc.get_objects()) = [703, 6877, 5069]
phase = stop
inf = {'generation': 0, 'collected': 0, 'uncollectable': 0}
gc.get_count() = (0, 11, 1)
gc.get_stats() = [{'collections': 22, 'collected': 65, 'uncollectable': 0}, {'collections': 1, 'collected': 33, 'uncollectable': 0}, {'collections': 0, 'collected': 0, 'uncollectable': 0}]
len(gc.get_objects()) = [2, 7578, 5069]
6817
6818
...
7516
7517
phase = start
inf = {'generation': 1, 'collected': 0, 'uncollectable': 0}
gc.get_count() = (701, 11, 1)
gc.get_stats() = [{'collections': 22, 'collected': 65, 'uncollectable': 0}, {'collections': 1, 'collected': 33, 'uncollectable': 0}, {'collections': 0, 'collected': 0, 'uncollectable': 0}]
len(gc.get_objects()) = [703, 7578, 5069]
phase = stop
inf = {'generation': 1, 'collected': 0, 'uncollectable': 0}
gc.get_count() = (0, 0, 2)
gc.get_stats() = [{'collections': 22, 'collected': 65, 'uncollectable': 0}, {'collections': 2, 'collected': 33, 'uncollectable': 0}, {'collections': 0, 'collected': 0, 'uncollectable': 0}]
len(gc.get_objects()) = [2, 0, 13341]
7518
7519
...
7998
7999
From this output, the thresholds appear to count collections rather than objects in a generation. The youngest generation is collected every 700 objects. The next generation is collected after every 10 collections of the youngest generation (every 7000 objects), and following this logic the oldest generation should be collected every 70000 objects (neglecting the fix for the quadratic time issue, https://bugs.python.org/issue4074). What am I missing in the definition?
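To make this reading concrete, here is a minimal toy model of the counter bookkeeping that the `gc.get_count()` values above seem to follow. It is a sketch under my own assumptions, not CPython's implementation: the function name `simulate` is hypothetical, no objects are ever freed, a collection is assumed to fire when a counter strictly exceeds its threshold (as the `(701, ...)` and `(701, 11, 1)` readings suggest), and the extra condition on the oldest generation from issue4074 is ignored.

```python
def simulate(allocations, thresholds=(700, 10, 10)):
    """Toy model of the per-generation counters; not CPython's real code."""
    counts = [0, 0, 0]        # mirrors what gc.get_count() would report
    collections = [0, 0, 0]   # how often each generation was collected
    for _ in range(allocations):
        counts[0] += 1        # one new tracked object in generation 0
        if counts[0] <= thresholds[0]:
            continue
        # Collect the oldest generation whose counter exceeds its threshold.
        gen = max(i for i in range(3) if counts[i] > thresholds[i])
        collections[gen] += 1
        if gen + 1 < 3:
            counts[gen + 1] += 1   # the next older generation counts this run
        for i in range(gen + 1):
            counts[i] = 0          # collected generations start over
    return counts, collections


counts, collections = simulate(8000)
print(counts, collections)
```

In this model the counters of generations 1 and 2 count collections of the younger generation, not objects, which is the behaviour I am asking about.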