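More details on garbage collection: https://pythonav.com/wiki/detail/6/88/
Memory Management
The Python interpreter (CPython) is written in C, and every operation in Python is ultimately carried out by the underlying C code, so understanding memory management at this level means reading the Python source.
1. Two important structs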
include/object.h
    #define _PyObject_HEAD_EXTRA            \
        struct _object *_ob_next;           \
        struct _object *_ob_prev;

    #define PyObject_HEAD       PyObject ob_base;
    #define PyObject_VAR_HEAD   PyVarObject ob_base;

    typedef struct _object {
        _PyObject_HEAD_EXTRA            // used to build a doubly linked list
        Py_ssize_t ob_refcnt;           // reference counter
        struct _typeobject *ob_type;    // data type
    } PyObject;

    typedef struct {
        PyObject ob_base;       // the embedded PyObject
        Py_ssize_t ob_size;     /* Number of items in variable part */
    } PyVarObject;
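The source above is the foundation of Python's memory management. It contains:
- 2 structs
  - PyObject, which has 3 members (counting the _PyObject_HEAD_EXTRA macro as one).
    - _PyObject_HEAD_EXTRA, used to build a doubly linked list. It expands to the two pointers only in a Py_TRACE_REFS build and is empty otherwise.
    - ob_refcnt, the reference counter.
    - *ob_type, the data type.
  - PyVarObject, which has 4 members (3 of them inside ob_base).
    - ob_base, an embedded PyObject struct, that is, the three PyObject members.
    - ob_size, the number of items the object holds.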
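- 3 macros
  - PyObject_HEAD, which stands for a PyObject struct.
  - PyObject_VAR_HEAD, which stands for a PyVarObject struct.
  - _PyObject_HEAD_EXTRA, which stands for the prev/next pointers used to build the doubly linked list.
Every object of every Python type is built on the PyObject or PyVarObject struct. As a rule, a fixed-size object embeds PyObject (float, for example), while an object whose inline storage varies in length embeds PyVarObject, because it needs ob_size to record the number of items (int, bytes, list, tuple, and type objects in the definitions below). Note that dict and set, as shown below, start with PyObject_HEAD and keep their own element counts (ma_used and used).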

    typedef struct {
        PyObject_HEAD
        double ob_fval;
    } PyFloatObject;

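    // longintrepr.h
    struct _longobject {
        PyObject_VAR_HEAD
        digit ob_digit[1];
    };

    // longobject.h
    /* Long (arbitrary precision) integer object interface */
    typedef struct _longobject PyLongObject; /* Revealed in longintrepr.h */

    /*
      1. Python 3 has no separate long type, only int; internally, Python 3's
         int is implemented on top of the old long.
      2. Python 3 places no limit on the size of an int: the value is not held
         in a C long but in a variable-length array of digits (ob_digit),
         somewhat like a string of digits.
    */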

    typedef struct {
        PyObject_VAR_HEAD
        Py_hash_t ob_shash;
        char ob_sval[1];

        /* Invariants:
         *     ob_sval contains space for 'ob_size+1' elements.
         *     ob_sval[ob_size] == 0.
         *     ob_shash is the hash of the string or -1 if not computed yet.
         */
    } PyBytesObject;

    typedef struct {
        PyObject_VAR_HEAD
        /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
        PyObject **ob_item;

        /* ob_item contains space for 'allocated' elements.  The number
         * currently in use is ob_size.
         * Invariants:
         *     0 <= ob_size <= allocated
         *     len(list) == ob_size
         *     ob_item == NULL implies ob_size == allocated == 0
         * list.sort() temporarily sets allocated to -1 to detect mutations.
         *
         * Items must normally not be NULL, except during construction when
         * the list is not yet visible outside the function that builds it.
         */
        Py_ssize_t allocated;
    } PyListObject;

    typedef struct {
        PyObject_VAR_HEAD
        PyObject *ob_item[1];

        /* ob_item contains space for 'ob_size' elements.
         * Items must normally not be NULL, except during construction when
         * the tuple is not yet visible outside the function that builds it.
         */
    } PyTupleObject;

    typedef struct {
        PyObject_HEAD
        Py_ssize_t ma_used;
        PyDictKeysObject *ma_keys;
        PyObject **ma_values;
    } PyDictObject;

    typedef struct {
        PyObject_HEAD

        Py_ssize_t fill;            /* Number active and dummy entries*/
        Py_ssize_t used;            /* Number active entries */

        /* The table contains mask + 1 slots, and that's a power of 2.
         * We store the mask instead of the size because the mask is more
         * frequently needed.
         */
        Py_ssize_t mask;

        /* The table points to a fixed-size smalltable for small tables
         * or to additional malloc'ed memory for bigger tables.
         * The table pointer is never NULL which saves us from repeated
         * runtime null-tests.
         */
        setentry *table;
        Py_hash_t hash;             /* Only used by frozenset objects */
        Py_ssize_t finger;          /* Search finger for pop() */

        setentry smalltable[PySet_MINSIZE];
        PyObject *weakreflist;      /* List of weak references */
    } PySetObject;

    typedef struct _typeobject {
        PyObject_VAR_HEAD
        const char *tp_name; /* For printing, in format "<module>.<name>" */
        Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

        /* Methods to implement standard operations */
        ...
    } PyTypeObject;
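The difference between fixed-size and variable-size layouts can be observed from Python itself. The following is a minimal sketch, assuming CPython, where `__basicsize__` and `__itemsize__` expose `tp_basicsize` and `tp_itemsize` and `sys.getsizeof` reports the total size; the variable names are only illustrative.

```python
import sys

# tp_itemsize is 0 for fixed-size layouts and non-zero when items are stored inline.
float_itemsize = float.__itemsize__   # PyObject_HEAD plus one double
tuple_itemsize = tuple.__itemsize__   # one PyObject* per element
bytes_itemsize = bytes.__itemsize__   # one char per element

# Each extra tuple element costs exactly one inline slot.
tuple_growth = sys.getsizeof((1, 2, 3)) - sys.getsizeof(())

# An int grows with its magnitude because its digits are stored inline.
int_grows = sys.getsizeof(2 ** 1000) > sys.getsizeof(1)

print(float_itemsize, tuple_itemsize, bytes_itemsize, tuple_growth, int_grows)
```

The absolute numbers depend on the platform and interpreter version, which is why the sketch compares sizes rather than hard-coding them.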
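2. Memory management
Taking the float and list types as examples, we trace how the Python source executes in order to understand the memory management mechanism.
2.1 The float type
Scenario 1: creating a float object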
val = 3.14
Creating a float object this way makes the interpreter run the following code in order.
    /* Special free list
       free_list is a singly-linked list of available PyFloatObjects, linked
       via abuse of their ob_type members.
    */
    static PyFloatObject *free_list = NULL;
    static int numfree = 0;

    PyObject *
    PyFloat_FromDouble(double fval)
    {
        PyFloatObject *op = free_list;
        if (op != NULL) {
            // A cached block is available: pop it off the free list.
            free_list = (PyFloatObject *) Py_TYPE(op);
            numfree--;
        } else {
            // Step 1: allocate memory sized for a float object.
            op = (PyFloatObject*) PyObject_MALLOC(sizeof(PyFloatObject));
            if (!op)
                return PyErr_NoMemory();
        }
        // Step 2: initialize the memory allocated for the float object.
        /* Inline PyObject_New */
        (void)PyObject_INIT(op, &PyFloat_Type);
        // Step 3: store the value in the float object.
        op->ob_fval = fval;
        // Step 4: return the address (reference/pointer) of the new float object.
        return (PyObject *) op;
    }
Step 1: allocate memory according to the size the float type needs.

    static PyMemAllocatorEx _PyObject = {
    #ifdef PYMALLOC_DEBUG
        &_PyMem_Debug.obj, PYDBG_FUNCS
    #else
        NULL, PYOBJ_FUNCS
    #endif
    };

    void *
    PyObject_Malloc(size_t size)
    {
        /* see PyMem_RawMalloc() */
        if (size > (size_t)PY_SSIZE_T_MAX)
            return NULL;
        // Allocate the memory.
        return _PyObject.malloc(_PyObject.ctx, size);
    }

    Customize Memory Allocators
    ===========================

    .. versionadded:: 3.4

    .. c:type:: PyMemAllocatorEx

       Structure used to describe a memory block allocator. The structure has
       four fields:

       +----------------------------------------------------------+---------------------------------------+
       | Field                                                    | Meaning                               |
       +==========================================================+=======================================+
       | ``void *ctx``                                            | user context passed as first argument |
       +----------------------------------------------------------+---------------------------------------+
       | ``void* malloc(void *ctx, size_t size)``                 | allocate a memory block               |
       +----------------------------------------------------------+---------------------------------------+
       | ``void* calloc(void *ctx, size_t nelem, size_t elsize)`` | allocate a memory block initialized   |
       |                                                          | with zeros                            |
       +----------------------------------------------------------+---------------------------------------+
       | ``void* realloc(void *ctx, void *ptr, size_t new_size)`` | allocate or resize a memory block     |
       +----------------------------------------------------------+---------------------------------------+
       | ``void free(void *ctx, void *ptr)``                      | free a memory block                   |
       +----------------------------------------------------------+---------------------------------------+

       .. versionchanged:: 3.5
          The :c:type:`PyMemAllocator` structure was renamed to
          :c:type:`PyMemAllocatorEx` and a new ``calloc`` field was added.
Step 2: initialize the type and the reference count in the newly allocated memory.

    /* Macros trading binary compatibility for speed. See also pymem.h.
       Note that these macros expect non-NULL object pointers.*/
    #define PyObject_INIT(op, typeobj) \
        ( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )

    /* Head of circular doubly-linked list of all objects.  These are linked
     * together via the _ob_prev and _ob_next members of a PyObject, which
     * exist only in a Py_TRACE_REFS build.
     */
    static PyObject refchain = {&refchain, &refchain};

    /* Insert op at the front of the list of all objects.  If force is true,
     * op is added even if _ob_prev and _ob_next are non-NULL already.  If
     * force is false and _ob_prev or _ob_next are non-NULL, do nothing.
     * force should be true if and only if op points to freshly allocated,
     * uninitialized memory, or you've unlinked op from the list and are
     * relinking it into the front.
     * Note that objects are normally added to the list via _Py_NewReference,
     * which is called by PyObject_Init.  Not all objects are initialized that
     * way, though; exceptions include statically allocated type objects, and
     * statically allocated singletons (like Py_True and Py_None).
     */
    void
    _Py_AddToAllObjects(PyObject *op, int force)
    {
        if (force || op->_ob_prev == NULL) {
            op->_ob_next = refchain._ob_next;
            op->_ob_prev = &refchain;
            refchain._ob_next->_ob_prev = op;
            refchain._ob_next = op;
        }
    }

    void
    _Py_NewReference(PyObject *op)
    {
        _Py_INC_REFTOTAL;
        // Initialize the reference counter of the new object to 1.
        op->ob_refcnt = 1;
        // Add the pointer to the new object to the doubly linked list refchain.
        _Py_AddToAllObjects(op, 1);
        _Py_INC_TPALLOCS(op);
    }
So every float object is linked into the refchain doubly linked list when it is created. As the source comment notes, this list exists only in a Py_TRACE_REFS build.
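Both points can be checked from Python. This is a small sketch, assuming CPython: `sys.getsizeof` should report `sizeof(PyFloatObject)` for a float, and `sys.getobjects` is only compiled into Py_TRACE_REFS builds, so its presence is used here as a hint that refchain is maintained. The names `float_size` and `has_refchain` are illustrative.

```python
import sys

# A float is a fixed-size object, so one instance occupies exactly tp_basicsize bytes.
float_size = sys.getsizeof(3.14)

# sys.getobjects() only exists in Py_TRACE_REFS builds, so its presence tells us
# whether this interpreter keeps the refchain list of all objects.
has_refchain = hasattr(sys, "getobjects")

print(float_size, float.__basicsize__, has_refchain)
```

On an ordinary release build `has_refchain` is expected to be False, which means `_Py_AddToAllObjects` is compiled out there.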
Scenario 2: referencing a float object
    val = 7.8
    data = val
This step is simple: creating a new reference to an object increments its reference counter by 1.

    /*
    The macros Py_INCREF(op) and Py_DECREF(op) are used to increment or decrement
    reference counts.  Py_DECREF calls the object's deallocator function when
    the refcount falls to 0; for
    objects that don't contain references to other objects or heap memory
    this can be the standard function free().  Both macros can be used
    wherever a void expression is allowed.  The argument must not be a
    NULL pointer.  If it may be NULL, use Py_XINCREF/Py_XDECREF instead.
    The macro _Py_NewReference(op) initialize reference counts to 1, and
    in special builds (Py_REF_DEBUG, Py_TRACE_REFS) performs additional
    bookkeeping appropriate to the special build.
    */
    #define Py_INCREF(op) (                         \
        _Py_INC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
        ((PyObject *)(op))->ob_refcnt++)
Scenario 3: destroying a float object
    val = 3.14
    # Explicitly delete the name
    del val

    """
    An explicit del triggers the destruction of the object (once no other reference remains).
    Local variables are destroyed the same way when a function finishes, e.g.:

    def func():
        val = 2.22

    func()
    """
Destroying the object runs the following code in order:
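The effect of Py_INCREF and Py_DECREF can be watched with `sys.getrefcount`. A minimal sketch, assuming CPython; the float is built at run time with `float("7.8")` so that no compiled constant holds an extra reference, and the names `before`, `after`, and `restored` are illustrative.

```python
import sys

val = float("7.8")              # built at run time, so only `val` refers to it
before = sys.getrefcount(val)   # includes the temporary argument reference
data = val                      # a second name for the same object (Py_INCREF)
after = sys.getrefcount(val)
del data                        # dropping the name decrements the counter (Py_DECREF)
restored = sys.getrefcount(val)

print(before, after, restored)
```

Only the differences are meaningful here: the absolute values include the temporary reference held by the call itself.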

    /*
    The macros Py_INCREF(op) and Py_DECREF(op) are used to increment or decrement
    reference counts.  Py_DECREF calls the object's deallocator function when
    the refcount falls to 0; for
    objects that don't contain references to other objects or heap memory
    this can be the standard function free().  Both macros can be used
    wherever a void expression is allowed.  The argument must not be a
    NULL pointer.  If it may be NULL, use Py_XINCREF/Py_XDECREF instead.
    The macro _Py_NewReference(op) initialize reference counts to 1, and
    in special builds (Py_REF_DEBUG, Py_TRACE_REFS) performs additional
    bookkeeping appropriate to the special build.
    */
    #define Py_DECREF(op)                                   \
        do {                                                \
            PyObject *_py_decref_tmp = (PyObject *)(op);    \
            if (_Py_DEC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
            --(_py_decref_tmp)->ob_refcnt != 0)             \
                _Py_CHECK_REFCNT(_py_decref_tmp)            \
            else                                            \
                _Py_Dealloc(_py_decref_tmp);                \
        } while (0)

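    void
    _Py_Dealloc(PyObject *op)
    {
        // Step 1: look up the float type's tp_dealloc, which releases the memory.
        destructor dealloc = Py_TYPE(op)->tp_dealloc;
        // Step 2: remove the object from the refchain doubly linked list.
        // Note that this runs before tp_dealloc is actually called.
        _Py_ForgetReference(op);
        (*dealloc)(op);
    }
Step 1: call the float type's tp_dealloc to release the memory.
In principle this step would free the object's memory right away, but float keeps an internal cache, so it actually works like this:
- If float's cache already holds 100 or more blocks, `del val` (on the last reference) really frees the object's memory.
- If the cache holds fewer than 100 blocks, `del val` does not free the memory; the object is pushed onto the singly linked free_list so that later float objects can reuse it.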

    /* Special free list
       free_list is a singly-linked list of available PyFloatObjects, linked
       via abuse of their ob_type members.
    */
    #ifndef PyFloat_MAXFREELIST
    #define PyFloat_MAXFREELIST    100
    #endif
    static int numfree = 0;
    static PyFloatObject *free_list = NULL;

    PyTypeObject PyFloat_Type = {
        PyVarObject_HEAD_INIT(&PyType_Type, 0)
        "float",
        sizeof(PyFloatObject),
        0,
        // tp_dealloc is set to float_dealloc
        (destructor)float_dealloc,                  /* tp_dealloc */
        0,                                          /* tp_print */
        0,                                          /* tp_getattr */
        0,                                          /* tp_setattr */
        0,                                          /* tp_reserved */
        ...
    };

    static void
    float_dealloc(PyFloatObject *op)
    {
        // Check whether the object is exactly a float.
        if (PyFloat_CheckExact(op)) {
            // Check whether the cache already holds 100 or more blocks.
            if (numfree >= PyFloat_MAXFREELIST)  {
                // The cache is full: really free the object's memory.
                PyObject_FREE(op);
                return;
            }
            // Otherwise increment the cache count and push the block onto the
            // singly linked free_list so a later float can reuse it.
            numfree++;
            Py_TYPE(op) = (struct _typeobject *)free_list;
            free_list = op;
        }
        else
            Py_TYPE(op)->tp_free((PyObject *)op);
    }

    """
    Knowing how float caches freed objects explains why the two memory
    addresses below can turn out to be identical (typically seen in an
    interactive session; address reuse is an implementation detail).
    """
    v1 = 3.8
    print(id(v1))  # address, e.g. 140454027861640
    del v1

    v2 = 88.7
    print(id(v2))  # address, e.g. 140454027861640

    void
    PyObject_Free(void *ptr)
    {
        // Works like the allocation shown earlier.
        _PyObject.free(_PyObject.ctx, ptr);
    }
Step 2: remove the object from the refchain doubly linked list.

    /* Head of circular doubly-linked list of all objects.  These are linked
     * together via the _ob_prev and _ob_next members of a PyObject, which
     * exist only in a Py_TRACE_REFS build.
     */
    static PyObject refchain = {&refchain, &refchain};

    void
    _Py_ForgetReference(PyObject *op)
    {
    #ifdef SLOW_UNREF_CHECK
        PyObject *p;
    #endif
        if (op->ob_refcnt < 0)
            Py_FatalError("UNREF negative refcnt");
        if (op == &refchain ||
            op->_ob_prev->_ob_next != op || op->_ob_next->_ob_prev != op) {
            fprintf(stderr, "* ob\n");
            _PyObject_Dump(op);
            fprintf(stderr, "* op->_ob_prev->_ob_next\n");
            _PyObject_Dump(op->_ob_prev->_ob_next);
            fprintf(stderr, "* op->_ob_next->_ob_prev\n");
            _PyObject_Dump(op->_ob_next->_ob_prev);
            Py_FatalError("UNREF invalid object");
        }
    #ifdef SLOW_UNREF_CHECK
        for (p = refchain._ob_next; p != &refchain; p = p->_ob_next) {
            if (p == op)
                break;
        }
        if (p == &refchain) /* Not found */
            Py_FatalError("UNREF unknown object");
    #endif
        op->_ob_next->_ob_prev = op->_ob_prev;
        op->_ob_prev->_ob_next = op->_ob_next;
        op->_ob_next = op->_ob_prev = NULL;
        _Py_INC_TPFREES(op);
    }
In summary: when a float object is created, memory is allocated for it, its reference counter is initialized to 1, and (in a Py_TRACE_REFS build) it is added to the doubly linked list named refchain. When a new reference to it is created, Py_INCREF adds 1 to its reference counter. When it is destroyed, it is unlinked from refchain and float_dealloc checks how many blocks free_list already caches: if the cache has reached 100, the memory is really freed; otherwise the object is not freed but pushed onto the singly linked free_list for later objects to reuse.
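The caching behaviour summarized above can be modelled in a few lines of Python. This is only a toy sketch of the idea, not CPython's allocator: the class `FloatFreeList`, its methods, and the integer "slots" that stand in for memory addresses are all hypothetical.

```python
class FloatFreeList:
    """Toy model of float's free list; not CPython's real allocator."""

    def __init__(self, max_free=100):
        self.max_free = max_free   # plays the role of PyFloat_MAXFREELIST
        self.free = []             # cached slots, most recently freed last
        self.next_slot = 0         # stands in for a fresh PyObject_MALLOC address

    def allocate(self):
        if self.free:              # reuse a cached slot first
            return self.free.pop()
        self.next_slot += 1        # otherwise "malloc" a new slot
        return self.next_slot

    def release(self, slot):
        if len(self.free) >= self.max_free:
            return False           # cache full: the slot is really freed
        self.free.append(slot)     # cache the slot for the next allocation
        return True


pool = FloatFreeList(max_free=2)
a = pool.allocate()
pool.release(a)
b = pool.allocate()                # gets back the slot that `a` just released
print(a == b)
```

Reusing the most recently released slot mirrors why a new float can end up at the address of one that was just deleted.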
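Garbage Collection
Python's garbage collection relies mainly on reference counting, with mark-and-sweep and generational collection as supplements.
1. Reference counting
Every object maintains a value that records how many references point to it; when that count drops to 0, the object is reclaimed automatically. In the Python source this counter is the ob_refcnt field of PyObject.
Reading the reference count, with example code:
    import sys

    # Create a string object in memory; its reference count starts at 1.
    nick_name = '武沛齊'

    # Expected 1, prints 2: passing nick_name to getrefcount adds a temporary
    # reference. The temporary reference is dropped when getrefcount returns,
    # so the real count is still 1.
    # (Exact numbers vary across CPython versions, for example because the
    # compiled code also references the literal; the +1/-1 pattern is what matters.)
    print(sys.getrefcount(nick_name))

    # real_name now points to the same string object, so the count becomes 2.
    real_name = nick_name

    # Expected 2, prints 3: again nick_name is passed as an argument, which adds
    # a temporary reference. After getrefcount returns, the real count is 2.
    print(sys.getrefcount(nick_name))

    # Delete the name real_name, which decrements the object's count by 1.
    del real_name

    # Expected 1, prints 2, because of the temporary argument reference.
    print(sys.getrefcount(nick_name))

    # ############ getrefcount docstring ############
    '''
    def getrefcount(p_object): # real signature unknown; restored from __doc__
        """
        getrefcount(object) -> integer

        Return the reference count of object.  The count returned is generally
        one higher than you might expect, because it includes the (temporary)
        reference as an argument to getrefcount().
        """
        return 0
    '''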
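2. Circular references
Reference counting alone handles most of Python's garbage collection, but it has one obvious flaw: circular references.
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    import gc
    import objgraph  # third-party package (pip install objgraph)


    class Foo(object):
        def __init__(self):
            self.data = None


    # Create two objects; each has a reference count of 1.
    obj1 = Foo()
    obj2 = Foo()

    # Make the two objects reference each other; each count becomes 2.
    obj1.data = obj2
    obj2.data = obj1

    # Delete the names; each count drops back to 1.
    del obj1
    del obj2

    # Turn off the cycle collector. Python combines reference counting with
    # mark-and-sweep and generational collection to resolve circular references;
    # disabling the collector lets us inspect the objects that were never freed.
    gc.disable()

    # Because of the cycle, the counts of the two objects never reach 0, so
    # reference counting alone cannot reclaim them.
    # objgraph therefore still reports 2 Foo instances in memory.
    print(objgraph.count('Foo'))
Note: gc.collect() triggers a garbage collection manually.
Left unhandled, circular references keep objects in memory forever, so memory use keeps growing and eventually leaks.
To solve this, Python adds mark-and-sweep and generational collection on top of reference counting.
So in most cases there is no need to worry about circular references any more.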
Reference cycles involving lists, tuples, instances, classes, dictionaries, and functions are found.
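Python GC source documentation: http://www.arctrix.com/nas/python/gc/
3. Mark-and-sweep & generational collection
To deal with circular references, Python tracks objects of the container types (lists, tuples, instances, classes, dictionaries, and functions): each one is added to a doubly linked list when it is created and carries prev/next pointers that link it into that list. Strictly speaking, in CPython the collector's list is threaded through a separate PyGC_Head header stored in front of each tracked object; the _ob_next and _ob_prev fields shown below belong to the Py_TRACE_REFS list covered earlier, which is built in the same way.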
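The cycle collector's effect can be observed with the standard library alone. The following is a minimal sketch: the class `Node` and the helper `make_cycle` are hypothetical names, and a weak reference is used as a probe because it does not keep its target alive.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_cycle():
    """Build two objects that reference each other; return a weak reference to one."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)   # a weak reference does not keep `a` alive


gc.disable()                          # rule out an automatic collection
probe = make_cycle()                  # the local names a and b are gone now
alive_before = probe() is not None    # reference counting alone cannot free the pair
found = gc.collect()                  # run the cycle detector by hand
alive_after = probe() is not None
gc.enable()

print(alive_before, found, alive_after)
```

`gc.collect()` returns the number of unreachable objects it found, which may include garbage unrelated to this example.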
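    /* Nothing is actually declared to be a PyObject, but every pointer to
     * a Python object can be cast to a PyObject*.  This is inheritance built
     * by hand.  Similarly every pointer to a variable-size Python object can,
     * in addition, be cast to PyVarObject*.
     */
    typedef struct _object {
        _PyObject_HEAD_EXTRA    /* doubly linked list */
        Py_ssize_t ob_refcnt;
        struct _typeobject *ob_type;
    } PyObject;

    typedef struct {
        PyObject ob_base;
        Py_ssize_t ob_size; /* Number of items in variable part */
    } PyVarObject;

    /* Define pointers to support a doubly-linked list of all live heap objects. */
    #define _PyObject_HEAD_EXTRA            \
        struct _object *_ob_next;           \
        struct _object *_ob_prev;
As objects are created, more and more of them accumulate on this doubly linked list.
- When the number of allocations minus deallocations since the last collection exceeds 700 (the default threshold0), the interpreter runs a garbage collection.
- When the code calls gc.collect() explicitly, the interpreter runs a garbage collection.
    import gc

    gc.collect()
During a collection the interpreter walks every object on the list. For each reference held by another object on the list, it subtracts 1 from a working copy of the target's reference count, so references that exist only inside cycles cancel out. It then splits the objects in two: those whose adjusted count is 0 and that cannot be reached from any surviving object are collectable and are freed, while the rest are not collectable and are moved to another doubly linked list (the next generation in generational collection).
About generational collection (generations):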
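The procedure described above can be sketched on a tiny object graph. This is a simplified model under stated assumptions, not CPython's implementation: objects are plain names, `refcounts` and `edges` are hypothetical inputs, and the function `find_unreachable` is made up for illustration.

```python
def find_unreachable(refcounts, edges):
    """Return tracked objects kept alive only by references from other tracked objects.

    refcounts: name -> total reference count
    edges:     name -> names of the tracked objects it references
    """
    # Work on a copy so the real counters stay untouched.
    gc_refs = dict(refcounts)

    # Subtract every reference that originates inside the tracked set.
    for targets in edges.values():
        for target in targets:
            gc_refs[target] -= 1

    # Anything still above zero is referenced from outside: it is a root.
    reachable = set()
    stack = [name for name, count in gc_refs.items() if count > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(edges.get(name, ()))   # whatever a live object references is live

    return set(refcounts) - reachable


# a <-> b form an orphaned cycle; c is held by a variable and references d.
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
print(sorted(find_unreachable(refcounts, edges)))
```

Note that `d` also ends up with an adjusted count of 0, yet it survives because it is reachable from `c`; this is why the second pass from the roots is needed.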
The GC classifies objects into three generations depending on how many collection sweeps they have survived. New objects are placed in the youngest generation (generation 0). If an object survives a collection it is moved into the next older generation. Since generation 2 is the oldest generation, objects in that generation remain there after a collection. In order to decide when to run, the collector keeps track of the number object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts. Initially only generation 0 is examined. If generation 0 has been examined more than threshold1 times since generation 1 has been examined, then generation 1 is examined as well. Similarly, threshold2 controls the number of collections of generation 1 before collecting generation 2.
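    # By default the three thresholds are (700, 10, 10); they can be changed.
    # Signature: gc.set_threshold(threshold0[, threshold1[, threshold2]])
    import gc

    gc.set_threshold(700, 10, 10)
Official documentation: https://docs.python.org/3/library/gc.html
References: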
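The thresholds can be inspected and adjusted at run time with the `gc` module. A short sketch using only `gc.get_threshold`, `gc.get_count`, and `gc.set_threshold`; the value 1000 is an arbitrary choice for illustration, and the names `original`, `counts`, and `raised` are hypothetical.

```python
import gc

original = gc.get_threshold()    # the current thresholds as a tuple
counts = gc.get_count()          # the collector's current counters

gc.set_threshold(1000)           # change only threshold0; the others are left alone
raised = gc.get_threshold()

gc.set_threshold(*original)      # restore the previous settings
print(original, counts, raised)
```

Raising threshold0 makes automatic collections less frequent, which trades memory held by uncollected cycles for less collector overhead.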
http://www.wklken.me/posts/2015/09/29/python-source-gc.html
https://yq.aliyun.com/users/yqzdoezsuvujg/album?spm=a2c4e.11155435.0.0.d07467451AwRxO