Python Memory Management


Python's memory management rests on three mechanisms: reference counting, garbage collection, and memory pools.

I. Reference Counting

1. Variables and Objects

In sum, variables are created when assigned, can reference any type of object, and must
be assigned before they are referenced. This means that you never need to declare names
used by your script, but you must initialize names before you can update them; counters,
for example, must be initialized to zero before you can add to them.
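  • A variable is created when it is first assigned, and it can point to (reference) an object of any type.
    • Everything in Python is an object; at the core of each one is a C struct, PyObject.
  • A variable must be assigned before it is referenced.
    • For example, a counter must be initialized to 0 before you can increment it.
  • Every object carries two header fields: a type designator and a reference counter.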
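
The assign-before-use rule can be seen in a small sketch. The two function names are made up for illustration; the first tries to increment a counter that was never initialized and reports the exception Python raises, the second initializes it first.

```python
def increment_uninitialized():
    """Update a local name that was never assigned (hypothetical demo)."""
    try:
        counter += 1  # 'counter' has no value yet, so reading it fails
    except NameError as exc:  # UnboundLocalError is a subclass of NameError
        return type(exc).__name__
    return "no error"


def increment_initialized():
    """Initialize the counter first, then update it."""
    counter = 0   # the assignment creates the name
    counter += 1  # now the update works
    return counter


print(increment_uninitialized())  # UnboundLocalError
print(increment_initialized())    # 1
```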

The relationship is illustrated as follows:

 Names and objects after running the assignment a = 3. Variable a becomes a reference to
the object 3. Internally, the variable is really a pointer to the object’s memory space created by running
the literal expression 3.

These links from variables to objects are called references in Python—that is, a reference
is a kind of association, implemented as a pointer in memory. Whenever the variables
are later used (i.e., referenced), Python automatically follows the variable-to-object
links. This is all simpler than the terminology may imply. In concrete terms:
  • Variables are entries in a system table, with spaces for links to objects.
  • Objects are pieces of allocated memory, with enough space to represent the values for which they stand.
  • References are automatically followed pointers from variables to objects.
  • objects have two header fields, a type designator and a reference counter.

In Python, things work more simply.
Names have no types; as stated earlier, types live with objects, not names. In the preceding
listing, we’ve simply changed a to reference different objects. Because variables
have no type, we haven’t actually changed the type of the variable a; we’ve simply made
the variable reference a different type of object. In fact, again, all we can ever say about
a variable in Python is that it references a particular object at a particular point in time. 

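  Variable names have no type; types belong to objects (a variable only references an object, so the type travels with the object). In Python, a variable is a reference to one particular object at one particular point in time.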
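
A short sketch of this point: the same name is rebound to three different objects, and the type reported each time is that of the object it currently references. The list name `seen` is only a helper for the demo.

```python
a = 3
seen = [type(a).__name__]      # the object 3 is an int

a = "spam"                     # rebind the name to a str object
seen.append(type(a).__name__)

a = 1.23                       # rebind the name to a float object
seen.append(type(a).__name__)

print(seen)  # ['int', 'str', 'float']
```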

2. Shared References

>>> a = 3
>>> b = a
>>>
>>> id(a)
1747479616
>>> id(b)
1747479616
>>>
>>> hex(id(a))
'0x68286c40'
>>> hex(id(b))
'0x68286c40'
>>> 

  

This scenario in Python—with multiple names referencing the same object—is usually
called a shared reference (and sometimes just a shared object). Note that the names a
and b are not linked to each other directly when this happens; in fact, there is no way
to ever link a variable to another variable in Python. 
Rather, both variables point to the same object via their references.
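
A minimal sketch of a shared reference, using a list so that the sharing is observable: an in-place change made through one name is visible through the other, while rebinding one name leaves the other untouched. The variable names are illustrative only.

```python
a = [1, 2, 3]
b = a                       # b references the same list object as a
shared = a is b             # True: one object, two names

b.append(4)                 # in-place change through b ...
seen_through_a = list(a)    # ... is visible through a

a = "spam"                  # rebinding a does not affect b
print(shared, seen_through_a, b)  # True [1, 2, 3, 4] [1, 2, 3, 4]
```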

1. id() is a Python built-in function that returns an object's identity; in CPython this is the object's memory address.

>>> help(id)
Help on built-in function id in module builtins:

id(obj, /)
    Return the identity of an object.
    
    This is guaranteed to be unique among simultaneously existing objects.
    (CPython uses the object's memory address.)

2. Comparing What References Point To

  Use is to compare identity: is tests whether two references point to the same object.

Integers

>>> a = 256
>>> b = 256
>>> a is b
True
>>> c = 257
>>> d = 257
>>> c is d
False
>>> 

Short strings

>>> e = "Explicit"
>>> f = "Explicit"
>>> e is f
True
>>> 

Long strings

>>> g = "Beautiful is better"
>>> h = "Beautiful is better"
>>> g is h
False
>>> 

Lists

>>> lst1 = [1, 2, 3]
>>> lst2 = [1, 2, 3]
>>> lst1 is lst2
False
>>> 

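The results show that, in CPython:

  1. Python caches small integers and short strings, so each such value exists only once in memory and both references point to the same object; an assignment statement

    then creates only a new reference, not a new object;

  2. Python does not cache long strings, lists, or other objects; several equal objects can coexist, and each assignment statement above creates a new object.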
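
The difference between equality (==) and identity (is) can be sketched without relying on literals. The values below are built at run time with int(), so the compiler cannot merge them; the identity result for 256 is a CPython implementation detail (its small-integer cache), not a language guarantee.

```python
small_a, small_b = int("256"), int("256")
big_a, big_b = int("257"), int("257")
list_a, list_b = [1, 2, 3], [1, 2, 3]

print(small_a is small_b)                  # True in CPython (cached small int)
print(big_a == big_b)                      # True: equal values
print(big_a is big_b)                      # identity is not guaranteed here
print(list_a == list_b, list_a is list_b)  # True False
```

In everyday code, compare values with == and reserve is for checks such as `x is None`.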

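How it works:

# Two optimizations: caching within a code block, and the small data pool.

# Code blocks
All Python code runs as code blocks. A module (one file), a function body, and a class definition are each a code block,
and different files are different code blocks. In the interactive interpreter, each command you type is a separate code block.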

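# Caching within a code block
When Python processes a statement in a code block that creates an object, it checks whether an equal value already exists in that block and, if so, reuses it.
In other words, the values created within one code block are recorded in a table;
each new value is looked up in that table first,
and if an equal entry is found, the existing object is reused.
So when a file is run (a single code block), two variables given equal values end up pointing to the same object:
if the mechanism applies, only one copy exists in memory, that is, their ids are equal.

Note:
# This mechanism applies only within a single code block.
# It applies to these types: int, str, bool.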
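
The effect of compiling two assignments as one code block can be sketched with compile() and exec(). This relies on CPython behavior (equal constants in one code object are stored once) and is not guaranteed by the language; the namespace names are illustrative only.

```python
# One code block: both assignments are compiled together.
source = "c = 257\nd = 257\nsame = c is d\n"
one_block = {}
exec(compile(source, "<one-block>", "exec"), one_block)
print(one_block["same"])  # True in CPython: c and d share one constant

# Two code blocks: each exec() call compiles its string separately,
# much like typing two commands at the interactive prompt.
two_blocks = {}
exec("c = 257", two_blocks)
exec("d = 257", two_blocks)
print(two_blocks["c"] is two_blocks["d"])  # implementation-dependent
```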


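# Small data pool (also called interning, string interning, or string caching)
An optimization that works across different code blocks.
# Types it applies to: str, bool, int
int: -5 to 256
str: strings that meet certain conditions are placed in the pool.
bool: both values.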
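
Strings that are not pooled automatically can be interned explicitly with sys.intern(). A minimal sketch, assuming CPython: the two strings are built at run time, so they start out as separate objects, and interning maps both to one shared object.

```python
import sys

# Built at run time, so each call produces its own string object.
g = " ".join(["Beautiful", "is", "better"])
h = " ".join(["Beautiful", "is", "better"])
print(g == h)  # True: same characters
print(g is h)  # False: two distinct objects

# sys.intern() returns the single pooled copy for equal strings.
gi, hi = sys.intern(g), sys.intern(h)
print(gi is hi)  # True
```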


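# Summary:
Within one code block, the code-block cache applies.
Across different code blocks, the small data pool applies.

# Benefits:
1. Saves memory.
2. Improves performance.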

  The wtfpython repository on GitHub has detailed examples.

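3. Inspecting an Object's Reference Count

  In Python, every object keeps a count of the references that point to it: its reference count.

  To inspect an object's reference count, use sys.getrefcount().

 When a variable is reassigned, what happens to the value it used to reference? In the example below, s is reassigned to the string 'apple'; where does 6 go?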

>>> s = 6
>>> s = 'apple'

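The answer: when a variable is reassigned, the space of the object it previously referenced may be reclaimed (garbage collected), provided no other variable or object still references it.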

The answer is that in Python, whenever a name is assigned to a new object, the space
held by the prior object is reclaimed if it is not referenced by any other name or object.
This automatic reclamation of objects’ space is known as garbage collection, and makes
life much simpler for programmers of languages like Python that support it.

Ordinary references

>>> import sys
>>> 
>>> a = "simple"
>>> sys.getrefcount(a)
2
>>> b = a
>>> sys.getrefcount(a)
3
>>> sys.getrefcount(b)
3
>>> 

  Note: passing a reference to getrefcount() as an argument itself creates a temporary reference, so the result is 1 higher than you might expect.
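
Because the absolute number depends on the object and the interpreter version, it is often clearer to look at how the count changes. A minimal sketch, assuming CPython and a freshly created list:

```python
import sys

data = []
baseline = sys.getrefcount(data)    # includes the temporary argument reference

alias = data                        # one more reference to the same list
with_alias = sys.getrefcount(data)

del alias                           # drop that reference again
after_del = sys.getrefcount(data)

print(with_alias - baseline, after_del - baseline)  # 1 0
```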

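III. Garbage Collection

  As a Python program creates more and more objects and they occupy more and more memory, garbage collection is started to clear out the objects that are no longer in use.

1. How It Works

  When an object's reference count drops to 0, no reference points to it any more, and the object becomes garbage to be reclaimed.

For example, when a newly created object is assigned to a name, its reference count becomes 1. If that reference is deleted, the count drops to 0 and the object can be garbage collected.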

Internally, Python accomplishes this feat by keeping a counter in every object that keeps
track of the number of references currently pointing to that object. As soon as (and
exactly when) this counter drops to zero, the object’s memory space is automatically
reclaimed. In the preceding listing, we’re assuming that each time x is assigned to a new
object, the prior object’s reference counter drops to zero, causing it to be reclaimed.

The most immediately tangible benefit of garbage collection is that it means you can
use objects liberally without ever needing to allocate or free up space in your script.
Python will clean up unused space for you as your program runs. In practice, this
eliminates a substantial amount of bookkeeping code required in lower-level languages
such as C and C++.

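2. Understanding del

  del removes a reference and so decreases the object's reference count by 1. If the count reaches 0, the program can no longer reach or use the object in any way, and CPython frees the memory it occupies immediately. The periodic collector is needed only for objects that keep each other alive through reference cycles.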
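
A small sketch of what del does, assuming CPython: the object is finalized at the moment its last reference disappears, not at the first del. The class name `Tracked` and the `events` list are made up for the demo.

```python
events = []


class Tracked:
    def __del__(self):
        # Called by CPython when the reference count reaches 0.
        events.append("finalized")


obj = Tracked()
alias = obj                    # two references to one object

del obj                        # count drops from 2 to 1: object still alive
after_first_del = list(events)

del alias                      # count drops to 0: object is freed right away
after_second_del = list(events)

print(after_first_del, after_second_del)  # [] ['finalized']
```

Other Python implementations may delay finalization, so code should not depend on this timing.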

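Notes

  1. While a collection is running, Python cannot do other work, so frequent collections significantly reduce performance;

  2. Python starts a collection automatically only under certain conditions (with little garbage there is no need to collect);

  3. At run time, Python keeps track of the number of object allocations and object deallocations.

  When the difference between the two exceeds a threshold, a collection starts.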

>>> import gc
>>> 
>>> gc.get_threshold()  # gc module function that returns the collection thresholds
(700, 10, 10)
>>> 

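Reading the thresholds:

  700 is the threshold that triggers a collection;

  every 10 collections of generation 0 are accompanied by 1 collection of generation 1, and every 10 collections of generation 1 by 1 collection of generation 2;

You can also start a collection manually: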

>>> gc.collect()       # start a collection manually
52
>>> gc.set_threshold(666, 8, 9)  # gc module function that sets the collection thresholds
>>> 
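
The same calls can be wrapped in a script that restores the original settings afterwards, which is usually safer than leaving changed thresholds in place. This is a sketch; the exact tuple returned by gc.get_threshold() varies between CPython versions, so only the first value is relied on.

```python
import gc

original = gc.get_threshold()   # remember the current thresholds
gc.set_threshold(666, 8, 9)     # lower the generation 0 threshold
changed = gc.get_threshold()

gc.set_threshold(*original)     # put the original values back
restored = gc.get_threshold()

print(original, changed, restored)
print(gc.get_count())           # current per-generation counters
```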

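What is generational collection

  • Python divides all objects into three generations: 0, 1, and 2;
  • every newly created object starts in generation 0;
  • an object that survives a collection of its generation is moved to the next generation.
Generational collection is a classic space-for-time trade-off and is also a key technique in Java. Put simply: the longer an object has existed, the less likely it is to be garbage, and the less often it should be examined.
This reduces the extra work caused by mark-and-sweep. Objects are divided into several generations, each kept as a linked list (a collection); the older the objects in a generation,
the longer the interval between mark-and-sweep passes over it.
As the output above shows, Python has three generations. The first threshold counts objects: by default, a generation 0 collection is triggered when allocations minus deallocations exceed 700. The other two thresholds count collections of the next-younger generation, 10 each by default.
Collecting a generation also collects every younger one: a generation 0 collection examines only generation 0, a generation 1 collection examines generations 0 and 1, and a generation 2 collection examines all three.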

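Mark and sweep

Mark and sweep, as the name suggests, first marks objects (garbage detection) and then sweeps away the garbage (garbage reclamation).
Initially every object is marked white. The root objects (which are never deleted) are identified and marked black (meaning the object is live).
Objects referenced by live objects are marked gray (reachable, but the objects they reference have not been checked yet); once the objects a gray object references have been checked, the gray object is marked black.
This repeats until no gray nodes remain. The objects still white at the end are the ones to be swept.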
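
The coloring scheme can be sketched on a toy object graph. This is a generic illustration of tri-color marking, not CPython's collector code; the function name, the dictionary representation of the heap, and the choice to queue roots as gray before blackening them are all assumptions made for the example.

```python
def mark_and_sweep(graph, roots):
    """Return the names of unreachable (white) objects.

    graph maps each object name to the list of names it references.
    """
    white = set(graph)   # not visited yet
    gray = []            # reachable, but outgoing references not scanned yet
    black = set()        # reachable and fully scanned

    for root in roots:
        if root in white:
            white.remove(root)
            gray.append(root)

    while gray:
        node = gray.pop()
        for child in graph[node]:
            if child in white:      # first time we reach this object
                white.remove(child)
                gray.append(child)
        black.add(node)             # all of node's references are scanned

    return white                    # whatever stayed white is garbage


heap = {
    "root": ["a"],
    "a": ["b"],
    "b": [],
    "x": ["y"],   # x and y reference each other ...
    "y": ["x"],   # ... but nothing reachable references them
}
print(sorted(mark_and_sweep(heap, ["root"])))  # ['x', 'y']
```

Note that the cycle between x and y is found even though each of them still has a nonzero reference count.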

How can we deal with the memory leaks that circular references may cause?

More on Python Garbage Collection
Technically speaking, Python’s garbage collection is based mainly upon reference counters, as described here; however, it also has a component that detects and reclaims objects with cyclic references in time. This component can be disabled if you’re sure that your code doesn’t create cycles, but it is enabled by default.
Circular references are a classic issue in reference count garbage collectors. Because references are implemented as pointers, it’s possible for an object to reference itself, or reference another object that does. For example, exercise 3 at the end of Part I and its solution in Appendix D show how to create a cycle easily by embedding a reference to a list within itself (e.g., L.append(L)). The same phenomenon can occur for assignments to attributes of objects created from user-defined classes. Though relatively rare, because the reference counts for such objects never drop to zero, they must be treated specially.
For more details on Python’s cycle detector, see the documentation for the gc module in Python’s library manual. The best news here is that garbage-collection-based memory management is implemented for you in Python, by people highly skilled at the task.

  The answer:

  1. Weak references: use ref from the weakref module
  2. Explicitly set one of the references to None
import gc
import objgraph  # third-party package: pip install objgraph
import sys
import weakref


def quote_demo():
    class Person:
        pass

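    p = Person()  # one reference: p
    print(sys.getrefcount(p))  # 2: p plus the temporary argument reference

    def log(obj):
        # Higher inside the call (3 or 4, depending on the CPython version):
        # the call holds extra references, released when the function returns.
        print(sys.getrefcount(obj))

    log(p)

    p2 = p  # a second reference to the same object
    print(sys.getrefcount(p))  # 3
    del p2
    print(sys.getrefcount(p))  # 3 - 1 = 2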


def circle_quote():
    # Circular reference
    class Dog:
        pass

    class Person:
        pass

    p = Person()
    d = Dog()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))

    p.pet = d
    d.master = p

    # After deleting p and d, are the underlying objects released?
    del p
    del d

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


def solve_cirecle_quote():
    # 1. Define two classes whose finalizers report when they run
    class Person:
        def __del__(self):
            print("Person object released")

    class Dog:
        def __del__(self):
            print("Dog object released")

    p = Person()
    d = Dog()

    p.pet = d
    d.master = p

    p.pet = None  # break the cycle by setting one reference to None
    del p
    del d

    gc.collect()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


def sovle_circle_quote_with_weak_ref():
    # 1. Define two classes whose finalizers report when they run
    class Person:
        def __del__(self):
            print("Person object released")

    class Dog:
        def __del__(self):
            print("Dog object released")

    p = Person()
    d = Dog()

    p.pet = d
    d.master = weakref.ref(p)

    del p
    del d

    gc.collect()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


if __name__ == "__main__":
    quote_demo()
    circle_quote()
    solve_cirecle_quote()
    sovle_circle_quote_with_weak_ref()
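
The same two situations can also be checked with the standard library alone, using a weak reference as a probe instead of objgraph. This is a sketch assuming CPython; the class and function names are made up, and the automatic collector is switched off during the first demo so that only the explicit gc.collect() call can free the cycle.

```python
import gc
import weakref


class Node:
    pass


def cycle_survives_del():
    """Return (alive after del, alive after gc.collect()) for a cycle."""
    was_enabled = gc.isenabled()
    gc.disable()                       # keep the automatic collector out of the way
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a        # strong references in both directions
        probe = weakref.ref(a)         # does not keep 'a' alive
        del a, b
        alive_after_del = probe() is not None
        gc.collect()                   # the cycle detector reclaims the pair
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


def weak_backref_frees_at_once():
    """Return whether the object is still alive after del, with a weak back-reference."""
    a, b = Node(), Node()
    a.other = b
    b.other = weakref.ref(a)           # weak back-reference: no strong cycle
    probe = weakref.ref(a)
    del a, b
    return probe() is not None


print(cycle_survives_del())            # (True, False)
print(weak_backref_frees_at_once())    # False
```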

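IV. Memory Pools

  Python distinguishes large and small memory requests (the boundary is 256 bytes; CPython 3.3 and later use 512 bytes):

  1. Large requests are served by malloc
  2. Small requests are served from the memory pool
  3. Python's memory layers (a pyramid)

  Layer +3: the top layer, where users operate on Python objects directly

  Layers +1 and +2: the memory pool, implemented by Python's allocator functions such as PyMem_Malloc

    • A request of 1 to 256 bytes is served by the pool allocator, which obtains its memory through malloc,
    • but each such call allocates one large 256 KB chunk; freed blocks are not handed back with free but stay in the pool for the next request

  Layer 0: large requests -----> a request larger than 256 bytes goes straight to malloc, and the memory is released with free.

  Layers -1 and -2: handled by the operating system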
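
The idea of a pool can be sketched in pure Python. This is a toy model, not CPython's allocator: the class `BlockPool`, the function `allocate`, and the constant `SMALL_LIMIT` are hypothetical, and the 256-byte limit is simply the figure used in the text. One large arena is allocated up front, small requests receive a block from it, freed blocks go back on a free list for reuse, and large requests bypass the pool.

```python
SMALL_LIMIT = 256  # bytes; assumed boundary, taken from the text


class BlockPool:
    """Toy pool: carve one arena into fixed-size blocks and reuse freed ones."""

    def __init__(self, block_size, arena_size):
        self.block_size = block_size
        self.arena = bytearray(arena_size)   # single large up-front allocation
        # Offsets of all free blocks inside the arena.
        self.free_offsets = list(range(0, arena_size - block_size + 1, block_size))

    def alloc(self):
        """Return the offset of a free block, or None if the arena is full."""
        return self.free_offsets.pop() if self.free_offsets else None

    def free(self, offset):
        """Put a block back on the free list; the arena itself is kept."""
        self.free_offsets.append(offset)


def allocate(nbytes, pool):
    """Route small requests to the pool and large ones to a fresh buffer."""
    if nbytes <= SMALL_LIMIT:
        return ("pool", pool.alloc())
    return ("malloc", bytearray(nbytes))     # stands in for a direct malloc call


pool = BlockPool(block_size=SMALL_LIMIT, arena_size=4 * SMALL_LIMIT)
kind, first = allocate(100, pool)
pool.free(first)                             # block returns to the pool
_, second = allocate(200, pool)              # and is handed out again
print(kind, first == second)                 # pool True
print(allocate(1000, pool)[0])               # malloc
```

Reusing blocks this way avoids a system call for every small object, which is the motivation the section describes.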

 
       

