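Original links:
Python's garbage collection mechanism and circular references - libochou - 博客園 (cnblogs) https://www.cnblogs.com/libochou/p/10150048.html
[Repost] Circular references in Java garbage collection - kkmm - 博客園 (cnblogs) https://www.cnblogs.com/lihaozy/archive/2013/06/08/3125974.html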
3. Recycling techniques — Memory Management Reference 4.0 documentation
https://www.memorymanagement.org/mmref/recycle.html
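Python's garbage collection mechanism and circular references
Reference counting
By default, Python (CPython) uses reference counting as its garbage collection mechanism. The algorithm was first proposed by George E. Collins in 1960 and is still used by many programming languages more than 50 years later. The principle is that every object maintains an ob_refcnt field that records how many references currently point to it. Each time a new reference points to the object, ob_refcnt is incremented by 1; each time a reference to the object goes away, ob_refcnt is decremented by 1. As soon as the count reaches 0, the object is reclaimed and the memory it occupies is released. One drawback is the extra space needed to store the count, but that is a minor issue. The main problem is that reference counting cannot reclaim objects involved in circular references, which is why many languages, Java among them, do not use it as their garbage collection mechanism.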
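The counting described above can be observed from Python itself. The following is a minimal sketch, assuming CPython; sys.getrefcount is a standard-library call, and the number it returns includes a temporary reference held by the call, so only the differences between readings are meaningful. The variable names are illustrative.

```python
import sys

obj = object()                      # one reference: the name `obj`
base = sys.getrefcount(obj)         # also counts the temporary reference made by the call

alias = obj                         # a second name now refers to the same object
after_alias = sys.getrefcount(obj)

del alias                           # removing the name releases that reference again
after_del = sys.getrefcount(obj)

print(base, after_alias, after_del)
```

The absolute values depend on the interpreter version, so the sketch compares readings rather than relying on a particular number.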
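What is a circular reference? Objects A and B refer to each other while nothing else refers to either of them. Their reference counts are both 1, yet they clearly should be reclaimed. For example:
a = {}  # object A has a reference count of 1
After the del statements have run, no variable refers to object A or object B any more, but each object still holds a reference to the other. Neither object can be reached through any variable, so to the GC they are inactive (garbage) objects, yet their reference counts never drop to zero. If reference counting alone managed these two objects, they would never be reclaimed; they would stay in memory indefinitely and cause a memory leak (memory that is not released after it is no longer needed).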
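The situation can be reproduced in full with the standard gc and weakref modules. This is a minimal sketch, assuming CPython; the Node class and the variable names are hypothetical, and a small class is used instead of dict because plain dict objects do not support weak references.

```python
import gc
import weakref

class Node:
    """A small container object; `other` may point at another Node."""
    def __init__(self):
        self.other = None

gc.disable()                  # switch off automatic cycle collection for the demo
a, b = Node(), Node()
a.other, b.other = b, a       # A and B now reference each other
probe = weakref.ref(a)        # a weak reference does not keep A alive

del a, b                      # the names are gone, but each count is still 1
alive_after_del = probe() is not None

unreachable = gc.collect()    # the cycle detector finds and frees the pair
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect, unreachable)
```

With automatic collection disabled, the pair survives the del statements, which is the leak that pure reference counting would cause; the explicit gc.collect() call then reclaims it.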
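To solve the circular reference problem, Python adds two more GC mechanisms: mark-sweep and generational collection.
Mark-sweep
Mark-sweep is a garbage collection algorithm based on tracing (tracing GC). It has two phases: in the mark phase the GC marks all active objects; in the sweep phase it reclaims the unmarked, inactive objects. So how does the GC decide which objects are active and which are not?
Objects are connected by references (pointers) and form a directed graph in which objects are the nodes and references are the edges. Starting from the root objects, the collector traverses the graph along the edges. Reachable objects are marked as active; unreachable objects are the inactive ones to be cleared. The root objects are global variables, the call stack, and registers.
In the figure above, treat the small black circle as a global variable, that is, as the root object. Starting from it, object 1 is directly reachable and is marked; objects 2 and 3 are indirectly reachable and are marked as well; objects 4 and 5 are unreachable. Objects 1, 2, and 3 are therefore active, while 4 and 5 are inactive and will be reclaimed by the GC.
Mark-sweep serves as Python's auxiliary garbage collection technique and mainly handles container objects such as list, dict, tuple, and class instances, because strings and numeric objects cannot create circular references. Python organizes these container objects in a doubly linked list. However, this simple, brute-force algorithm has an obvious drawback: before it can free inactive objects it must scan the entire heap sequentially, and it has to scan every object even when only a small fraction of them is still active.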
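Whether a given object is handled by this auxiliary collector can be checked with gc.is_tracked from the standard library. A minimal sketch, assuming CPython; the `tracked` dictionary is just an illustrative way to collect the answers.

```python
import gc

tracked = {
    "int": gc.is_tracked(42),        # a number cannot refer to other objects
    "str": gc.is_tracked("text"),    # neither can a string
    "list": gc.is_tracked([]),       # a list is a container, so the collector tracks it
}
print(tracked)
```

Some containers may be left untracked by the interpreter as an optimization when they cannot take part in a cycle, so the sketch only checks the clear-cut cases.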
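Generational collection
Generational collection trades space for time. Python divides memory into sets according to how long objects have survived, and each set is called a generation. There are three generations: the young generation (generation 0), the middle generation (generation 1), and the old generation (generation 2), each backed by its own linked list. The longer the objects in a generation have survived, the less often that generation is collected. Newly created objects are all placed in the young generation. When the number of objects in the young generation's list reaches its threshold, garbage collection is triggered: objects that can be reclaimed are reclaimed, and the survivors are moved to the middle generation, and so on. Objects in the old generation have survived the longest and may live for the whole lifetime of the program. Generational collection is built on top of mark-sweep and, like mark-sweep, is an auxiliary technique that handles container objects.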
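The generation thresholds and counters are exposed by the standard gc module. A minimal sketch, assuming CPython; the default thresholds and the exact meaning of each counter vary between interpreter versions, so no particular values are assumed here.

```python
import gc

thresholds = gc.get_threshold()   # one threshold per generation
counts = gc.get_count()           # current counters for generations 0, 1 and 2
print("thresholds:", thresholds)
print("counts:", counts)

# Collect only the youngest generation; the return value is the
# number of unreachable objects that were found.
found = gc.collect(0)
print("unreachable objects found in generation 0:", found)
```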
References:
- http://www.memorymanagement.org/mmref/recycle.html#tracing-collectors
- 《垃圾回收的算法與實現》 (Garbage Collection: Algorithms and Implementation)
- 《Python源碼剖析》 (Python Source Code Analysis)
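Circular references in Java garbage collection
How reference counting works: a reference count is maintained for every object in memory.
When a new reference points to an object, the object's count is incremented by one; when a reference to the object is destroyed, the count is decremented by one; when the count reaches zero, the memory occupied by the object is reclaimed.
Drawback: the count has to be updated every time a reference to an object is created or destroyed. This bookkeeping cost (referred to here as the footprint) is high for reference counting and noticeably affects overall program performance. For this reason many modern languages do not use reference counting as their garbage collection algorithm.
Reference counting also has a fatal flaw: when a program contains circular references, the algorithm cannot detect them, and the objects in the cycle become memory that can never be reclaimed, which leads to a memory leak.
An example: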
class A {
    public B b;
}

class B {
    public A a;
}

public class Main {
    public static void main(String[] args) {
        A a = new A();
        B b = new B();
        a.b = b;
        b.a = a;
    }
}
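Under reference counting, at the end of the function the objects a and b each have a count of 2: one reference from the local variable and one from the other object's field.
When the variable a goes away first, the count of a drops to 1. Object a is now waiting for the reference b.a to be released, that is, for b to be reclaimed.
The same holds for b.
Each object is waiting for the other to be reclaimed, so neither can be freed.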
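The counts in this walk-through can be replayed with a toy model. The sketch below is not how the JVM works (Java uses tracing collectors); it only simulates what a naive reference counter would do with the example, and the Counted class and the add_ref/drop_ref helpers are hypothetical names.

```python
class Counted:
    """Toy object with an explicit reference count (for illustration only)."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.fields = {}
        self.freed = False

def add_ref(target):
    target.count += 1

def drop_ref(target):
    target.count -= 1
    if target.count == 0:
        target.freed = True
        # Freeing an object also drops the references held in its fields.
        for child in target.fields.values():
            drop_ref(child)
        target.fields.clear()

a, b = Counted("a"), Counted("b")
add_ref(a)                         # local variable a
add_ref(b)                         # local variable b
a.fields["b"] = b
add_ref(b)                         # a.b = b
b.fields["a"] = a
add_ref(a)                         # b.a = a
counts_at_end = (a.count, b.count)

drop_ref(a)                        # main() returns: local variable a disappears
drop_ref(b)                        # local variable b disappears
counts_after = (a.count, b.count)

print(counts_at_end, counts_after, a.freed, b.freed)
```

Both counts stop at 1, so neither object is ever freed in this model.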
Recycling techniques
There are many ways for automatic memory managers to determine what memory is no longer required. In the main, garbage collection relies on determining which blocks are not pointed to by any program variables. Some of the techniques for doing this are described briefly below, but there are many potential pitfalls, and many possible refinements. These techniques can often be used in combination.
1. Tracing collectors
Automatic memory managers that follow pointers to determine which blocks of memory are reachable from program variables (known as the root set) are known as tracing collectors. The classic example is the mark-sweep collector.
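- Mark-sweep collection
This algorithm runs in two phases. The first phase starts from the root references and marks every object that is referenced; the second phase traverses the whole heap and frees the unmarked objects. The algorithm has to pause the whole application while it runs, and it leaves memory fragmented.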
In a mark-sweep collection, the collector first examines the program variables; any blocks of memory pointed to are added to a list of blocks to be examined. For each block on that list, it sets a flag (the mark) on the block to show that it is still required, and also that it has been processed. It also adds to the list any blocks pointed to by that block that have not yet been marked. In this way, all blocks that can be reached by the program are marked.
In the second phase, the collector sweeps all allocated memory, searching for blocks that have not been marked. If it finds any, it returns them to the allocator for reuse.
Five memory blocks, three of which are reachable from program variables.
In the diagram above, block 1 is directly accessible from a program variable, and blocks 2 and 3 are indirectly accessible. Blocks 4 and 5 cannot be reached by the program. The first step would mark block 1, and remember blocks 2 and 3 for later processing. The second step would mark block 2. The third step would mark block 3, but wouldn’t remember block 2 as it is already marked. The sweep phase would ignore blocks 1, 2, and 3 because they are marked, but would recycle blocks 4 and 5.
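The walk-through of the diagram can be sketched as a small simulation. The heap is modeled as a dictionary from block number to the blocks it points to. The edges from block 1 to blocks 2 and 3 and from block 3 to block 2 follow the description above; the edges between blocks 4 and 5 are an assumption made for illustration, and the function names are hypothetical.

```python
def mark(heap, roots):
    """Return the set of blocks reachable from the roots."""
    marked = set()
    worklist = list(roots)
    while worklist:
        block = worklist.pop()
        if block in marked:
            continue
        marked.add(block)
        # Remember only the blocks that have not been marked yet.
        worklist.extend(ref for ref in heap[block] if ref not in marked)
    return marked

def sweep(heap, marked):
    """Remove every unmarked block from the heap and return the recycled blocks."""
    garbage = [block for block in heap if block not in marked]
    for block in garbage:
        del heap[block]
    return garbage

heap = {1: [2, 3], 2: [], 3: [2], 4: [5], 5: [4]}
roots = [1]                       # the program variable points at block 1
live = mark(heap, roots)
freed = sweep(heap, live)
print(sorted(live), sorted(freed))
```

Blocks 4 and 5 are recycled even if they point at each other, which is what distinguishes a tracing collector from simple reference counting.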
The two drawbacks of simple mark-sweep collection are:
---it must scan the entire memory in use before any memory can be freed;
---it must run to completion or, if interrupted, start again from scratch.
If a system requires real-time or interactive response, then simple mark-sweep collection may be unsuitable as it stands, but many more sophisticated garbage collection algorithms are derived from this technique.
- Copying collection
After many memory blocks have been allocated and recycled, there are two problems that typically occur:
1. the memory in use is widely scattered in memory, causing poor performance in the memory caches or virtual memory systems of most modern computers (known as poor locality of reference);
2. it becomes difficult to allocate large blocks because free memory is divided into small pieces, separated by blocks in use (known as external fragmentation).
One technique that can solve both these problems is copying garbage collection. A copying garbage collector may move allocated blocks around in memory and adjust any references to them to point to the new location. This is a very powerful technique and can be combined with many other types of garbage collection, such as mark-sweep collection.
The disadvantages of copying collection are:
---it is difficult to combine with incremental garbage collection (see below) because all references must be adjusted to remain consistent;
---it is difficult to combine with conservative garbage collection (see below) because references cannot be confidently adjusted;
---extra storage is required while both new and old copies of an object exist;
---copying data takes extra time (proportional to the amount of live data).
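This algorithm divides the memory space into two equal regions and uses only one of them at a time. During garbage collection it traverses the region currently in use and copies the objects that are still in use into the other region. Because it processes only live objects, the copying cost is relatively small, and the copied objects can be laid out compactly, so fragmentation does not occur. The obvious drawback is that it needs twice the memory.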
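A two-region copying collector can be sketched as follows. This is a toy model under stated assumptions: addresses are plain integers, each block is a (payload, references) pair, and the function name copy_collect and the forwarding table are illustrative rather than taken from any particular runtime.

```python
from collections import deque

def copy_collect(from_space, roots):
    """Copy every block reachable from `roots` into a fresh to-space.

    `from_space` maps an address to (payload, [addresses it points to]).
    Returns the new space and the adjusted roots.
    """
    to_space = {}
    forwarding = {}                 # old address -> new address
    scan_queue = deque()

    def evacuate(old):
        if old not in forwarding:
            new = len(to_space)     # next free slot, so live blocks end up contiguous
            forwarding[old] = new
            payload, refs = from_space[old]
            to_space[new] = (payload, list(refs))   # still holds old addresses
            scan_queue.append(new)
        return forwarding[old]

    new_roots = [evacuate(root) for root in roots]
    while scan_queue:
        new = scan_queue.popleft()
        payload, refs = to_space[new]
        # Adjust each reference to point at the new location of its target.
        to_space[new] = (payload, [evacuate(ref) for ref in refs])
    return to_space, new_roots

from_space = {10: ("A", [30]), 20: ("garbage", [10]), 30: ("B", [10]), 40: ("C", [])}
to_space, new_roots = copy_collect(from_space, roots=[10])
print(to_space, new_roots)
```

Only the two live blocks are touched; the unreachable blocks at addresses 20 and 40 are never visited and simply disappear with the old region.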
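- Mark-compact
This algorithm combines the advantages of mark-sweep and copying. It also has two phases: the first marks all referenced objects starting from the roots; the second traverses the whole heap, frees the unmarked objects, and compacts the surviving objects into one part of the heap, laid out in order. It avoids the fragmentation of mark-sweep as well as the space overhead of copying.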
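The two phases can be sketched on a toy heap. The model below is an assumption made for illustration: the heap is a list of slots, a slot is either None (free) or a (payload, references) pair, and references are slot indices. The function name mark_compact is hypothetical.

```python
def mark_compact(heap, roots):
    """Return a compacted copy of `heap` and the adjusted roots."""
    # Phase 1: mark everything reachable from the roots.
    marked, stack = set(), list(roots)
    while stack:
        index = stack.pop()
        if index not in marked:
            marked.add(index)
            stack.extend(heap[index][1])

    # Phase 2: give each live block a new index in address order,
    # then slide the blocks together and rewrite their references.
    new_index = {old: new for new, old in enumerate(sorted(marked))}
    compacted = [None] * len(heap)
    for old, new in new_index.items():
        payload, refs = heap[old]
        compacted[new] = (payload, [new_index[ref] for ref in refs])
    return compacted, [new_index[root] for root in roots]

heap = [("A", [3]), ("dead", [0]), None, ("B", [4]), ("C", []), ("dead2", [1])]
compacted, new_roots = mark_compact(heap, roots=[0])
print(compacted, new_roots)
```

The live blocks keep their relative order and end up at the start of the heap, leaving one contiguous free area behind them without needing a second region.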
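- Incremental collection
A real-time style of garbage collection: collection proceeds while the application is running. According to the original post, the collectors in JDK 5.0 do not use this algorithm, for reasons its author does not know.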
Older garbage collection algorithms relied on being able to start collection and continue working until the collection was complete, without interruption. This makes many interactive systems pause during collection, and makes the presence of garbage collection obtrusive.
Fortunately, there are modern techniques (known as incremental garbage collection) to allow garbage collection to be performed in a series of small steps while the program is never stopped for long. In this context, the program that uses and modifies the blocks is sometimes known as the mutator. While the collector is trying to determine which blocks of memory are reachable by the mutator, the mutator is busily allocating new blocks, modifying old blocks, and changing the set of blocks it is actually looking at.
Incremental collection is usually achieved with either the cooperation of the memory hardware or the mutator; this ensures that, whenever memory in crucial locations is accessed, a small amount of necessary bookkeeping is performed to keep the collector’s data structures correct.
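The bookkeeping performed with the mutator's cooperation can be sketched with a marker that works in small steps plus a write barrier. This is a simplified illustration and not a description of any specific collector: the class name, the grey/black terminology, and the choice of barrier are assumptions brought in for the example.

```python
class IncrementalMarker:
    """Marks a toy heap a few objects at a time (hypothetical sketch)."""

    def __init__(self, heap, roots):
        self.heap = heap              # address -> list of referenced addresses
        self.grey = list(roots)       # discovered but not yet scanned
        self.black = set()            # fully scanned

    def step(self, budget):
        """Scan at most `budget` objects; return True when marking is finished."""
        while self.grey and budget > 0:
            obj = self.grey.pop()
            if obj in self.black:
                continue
            self.black.add(obj)
            self.grey.extend(self.heap[obj])
            budget -= 1
        return not self.grey

    def write(self, source, target):
        """Mutator stores a reference; the barrier keeps the marker consistent."""
        self.heap[source].append(target)
        if source in self.black and target not in self.black:
            self.grey.append(target)  # queue the target so the new edge is not missed

heap = {1: [2], 2: [], 3: []}
marker = IncrementalMarker(heap, roots=[1])
finished_early = marker.step(budget=1)    # scans object 1 only
marker.write(1, 3)                        # mutator links 3 into the already scanned object 1
finished = marker.step(budget=10)         # the collector finishes later
print(finished_early, finished, sorted(marker.black))
```

Without the check in write, object 3 would be attached to an object the collector has already finished with and would wrongly be treated as garbage.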
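- Generational collection
A garbage collection approach derived from analyzing object lifetimes. Objects are divided into a young generation, an old generation, and a permanent generation, and objects with different lifetimes are collected with different algorithms (each one of the approaches above). Java garbage collectors have used this approach since J2SE 1.2. (The permanent generation belongs to older HotSpot JVMs; Java 8 replaced it with Metaspace.)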
- Conservative garbage collection
Although garbage collection was first invented in 1958, many languages have been designed and implemented without the possibility of garbage collection in mind. It is usually difficult to add normal garbage collection to such a system, but there is a technique, known as conservative garbage collection, that can be used.
The usual problem with such a language is that it doesn’t provide the collector with information about the data types, and the collector cannot therefore determine what is a pointer and what isn’t. A conservative collector assumes that anything might be a pointer. It regards any data value that looks like a pointer to or into a block of allocated memory as preventing the recycling of that block.
Note that, because the collector does not know for certain which memory locations contain pointers, it cannot readily be combined with copying garbage collection. Copying collection needs to know where pointers are in order to update them when blocks are moved.
You might think that conservative garbage collection could easily perform quite poorly, leaving a lot of garbage uncollected. In practice, it does quite well, and there are refinements that improve matters further.
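The "anything might be a pointer" rule can be sketched as a scan over untyped words. The addresses, block sizes, and the function name conservative_roots below are made up for illustration.

```python
def conservative_roots(words, blocks):
    """Return the blocks that some word might point to or into.

    `words` is raw memory: integers with no type information.
    `blocks` maps a block's start address to its size in bytes.
    """
    retained = set()
    for word in words:
        for start, size in blocks.items():
            if start <= word < start + size:   # looks like a pointer to or into the block
                retained.add(start)
    return retained

blocks = {0x1000: 64, 0x2000: 32, 0x3000: 16}
# 0x1010 is a genuine interior pointer. 0x3008 may be an ordinary integer
# that merely looks like an address; the collector cannot tell the difference.
stack_words = [7, 0x1010, 0x3008]
kept = conservative_roots(stack_words, blocks)
print(sorted(hex(address) for address in kept))
```

The block at 0x3000 is retained even if the matching word was never a pointer, which is the price of being conservative; the block at 0x2000 matches no word and can be recycled.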
2. Reference counts
A reference count is a count of how many references (that is, pointers) there are to a particular memory block from other blocks. It is used as the basis for some automatic recycling techniques that do not rely on tracing.
2.1. Simple reference counting
In a simple reference counting system, a reference count is kept for each object. This count is incremented for each new reference, and is decremented if a reference is overwritten, or if the referring object is recycled. If a reference count falls to zero, then the object is no longer required and can be recycled.
Reference counting is frequently chosen as an automatic memory management strategy because it seems simple to implement using manual memory management primitives. However, it is hard to implement efficiently because of the cost of updating the counts. It is also hard to implement reliably, because the standard technique cannot reclaim objects connected in a loop. In many cases, it is an inappropriate solution, and it would be preferable to use tracing garbage collection instead.
Reference counting is most useful in situations where it can be guaranteed that there will be no loops and where modifications to the reference structure are comparatively infrequent. These circumstances can occur in some types of database structure and some file systems. Reference counting may also be useful if it is important that objects are recycled promptly, such as in systems with tight memory constraints.
2.2. Deferred reference counting
The performance of reference counting can be improved if not all references are taken into account. In one important technique, known as deferred reference counting, only references from other objects are counted, and references from program variables are ignored. Since most of the references to the object are likely to be from local variables, this can substantially reduce the overhead of keeping the counts up to date. An object cannot be reclaimed as soon as its count has dropped to zero, because there might still be a reference to it from a program variable. Instead, the program variables (including the control stack) are periodically scanned, and any objects which are not referenced from there and which have zero count are reclaimed.
Deferred reference counting cannot normally be used unless it is directly supported by the compiler. It’s more common for modern compilers to support tracing garbage collectors instead, because they can reclaim loops. Deferred reference counting may still be useful for its promptness—but that is limited by the frequency of scanning the program variables.
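A toy model of the deferred scheme: only references between heap objects are counted, zero-count objects are set aside as candidates, and a periodic scan of the program variables decides which candidates can really be reclaimed. The class and method names are hypothetical, and the "stack" is simply a set of object names.

```python
class DeferredHeap:
    """Toy heap that counts only heap-to-heap references (hypothetical names)."""

    def __init__(self):
        self.counts = {}          # object -> references held by other heap objects
        self.fields = {}          # object -> objects it refers to
        self.zero_count = set()   # count is zero, but a variable may still refer to it

    def allocate(self, name):
        self.counts[name] = 0
        self.fields[name] = []
        self.zero_count.add(name)
        return name

    def store(self, source, target):
        """Record a reference from one heap object to another (this one is counted)."""
        self.fields[source].append(target)
        self.counts[target] += 1
        self.zero_count.discard(target)

    def scan(self, stack):
        """Periodic scan: free zero-count objects that no program variable refers to."""
        freed = []
        pending = [obj for obj in self.zero_count if obj not in stack]
        while pending:
            obj = pending.pop()
            self.zero_count.discard(obj)
            for child in self.fields.pop(obj):
                self.counts[child] -= 1
                if self.counts[child] == 0:
                    self.zero_count.add(child)
                    if child not in stack:
                        pending.append(child)
            del self.counts[obj]
            freed.append(obj)
        return freed

heap = DeferredHeap()
x = heap.allocate("x")
y = heap.allocate("y")
heap.store(x, y)                                   # counted: x refers to y
freed_while_in_use = heap.scan(stack={"x", "y"})   # both are still held by local variables
freed_after_return = heap.scan(stack=set())        # the local variables are gone
print(freed_while_in_use, freed_after_return)
```

Assigning x and y to local variables never touches a counter, which is where the saving comes from; the cost is that nothing is reclaimed until the next scan.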
2.3. One-bit reference counting
Another variation on reference counting, known as the one-bit reference count, uses a single bit flag to indicate whether each object has either “one” or “many” references. If a reference to an object with “one” reference is removed, then the object can be recycled. If an object has “many” references, then removing references does not change this, and that object will never be recycled. It is possible to store the flag as part of the pointer to the object, so no additional space is required in each object to store the count. One-bit reference counting is effective in practice because most actual objects have a reference count of one.
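The one-bit scheme can be sketched with a flag per object. In this illustrative model the flag lives in a dictionary rather than in the pointer itself, and the names OneBitHeap, ONE, and MANY are hypothetical.

```python
ONE, MANY = "one", "many"

class OneBitHeap:
    def __init__(self):
        self.flag = {}            # object -> ONE or MANY (a single bit in a real system)

    def allocate(self, name):
        self.flag[name] = ONE     # the reference handed back to the caller
        return name

    def copy_reference(self, name):
        self.flag[name] = MANY    # sticky: the flag never goes back to ONE
        return name

    def drop_reference(self, name):
        if self.flag[name] == ONE:
            del self.flag[name]   # the sole reference is gone: recycle immediately
            return True
        return False              # MANY: the true count is unknown, so keep the object

heap = OneBitHeap()
temp = heap.allocate("temp")
temp_recycled = heap.drop_reference(temp)

shared = heap.allocate("shared")
heap.copy_reference(shared)
first_drop = heap.drop_reference(shared)
second_drop = heap.drop_reference(shared)
print(temp_recycled, first_drop, second_drop, "shared" in heap.flag)
```

The object that was shared once stays allocated even after both references are dropped, so a scheme like this would need some other mechanism to reclaim such objects.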
2.4. Weighted reference counting
Reference counting is often used for tracking inter-process references for distributed garbage collection. This fails to collect objects in separate processes if they have looped references, but tracing collectors are usually too inefficient as inter-process tracing entails much communication between processes. Within a process, tracing collectors are often used for local recycling of memory.
Many distributed collectors use a technique called weighted reference counting, which reduces the level of communication even further. Each time a reference is copied, the weight of the reference is shared between the new and the old copies. Since this operation doesn’t change the total weight of all references, it doesn’t require any communication with the object. Communication is only required when references are deleted.
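The weight-splitting idea can be sketched in a single process. The class names, the initial weight of 64, and the message counter are assumptions made for illustration; in a distributed system each call to receive_weight would be a network message. The sketch does not handle the case where a weight can no longer be split.

```python
class WeightedObject:
    def __init__(self, name, initial_weight=64):
        self.name = name
        self.total_weight = initial_weight   # sum of the weights of all live references
        self.messages = 0                    # messages received from reference holders
        self.reclaimed = False

    def first_reference(self):
        return WeightedRef(self, self.total_weight)

    def receive_weight(self, weight):
        self.messages += 1
        self.total_weight -= weight
        if self.total_weight == 0:
            self.reclaimed = True            # no reference is left anywhere

class WeightedRef:
    def __init__(self, target, weight):
        self.target = target
        self.weight = weight

    def copy(self):
        """Split the weight locally; the object itself is not contacted."""
        if self.weight < 2:
            raise ValueError("weight exhausted; not handled in this sketch")
        half = self.weight // 2
        self.weight -= half
        return WeightedRef(self.target, half)

    def delete(self):
        """Hand this reference's weight back to the object (one message)."""
        self.target.receive_weight(self.weight)
        self.weight = 0

obj = WeightedObject("o")
r1 = obj.first_reference()        # weight 64
r2 = r1.copy()                    # 32 and 32
r3 = r2.copy()                    # 32, 16 and 16
messages_after_copies = obj.messages

r1.delete()
r2.delete()
reclaimed_early = obj.reclaimed
r3.delete()
print(messages_after_copies, reclaimed_early, obj.reclaimed, obj.messages)
```

Copying a reference changes only the two copies involved, so the object hears nothing until a reference is deleted.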
Reference link:
3. Recycling techniques — Memory Management Reference 4.0 documentation
https://www.memorymanagement.org/mmref/recycle.html#recycling-techniques