Erlang process structure -- refc binary


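An Erlang process is a process at the virtual machine level. Each Erlang process consists of a PCB (process control block), a stack, and a private heap.

This is covered in various papers, and there are many write-ups about it online, including but not limited to: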

1, http://fengchj.com/?p=2255

2, http://blog.csdn.net/mycwq/article/details/26613275

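From the available material we can see that, inside the Erlang virtual machine, every process has its own PCB, its own stack, and its own private heap (note that Erlang currently does not support a shared heap). Because of this, Erlang's GC is not "stop the whole world"; it runs separately for each process.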

Each process’ heap is garbage collected independently. Thus when one scheduler is collecting garbage for a process, other schedulers can keep executing other processes.
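
A minimal sketch of the per-process layout described above, using only documented BIFs. The module name proc_heap_sketch and its function names are made up for illustration. The sizes reported by process_info/2 are in words and vary between OTP releases, so no concrete numbers are shown.

```erlang
-module(proc_heap_sketch).
-export([inspect/0, idle/0]).

%% Spawn a helper process, read its private heap and stack sizes,
%% then garbage collect only that process.
inspect() ->
    Pid = spawn(?MODULE, idle, []),
    %% Each item is reported per process; sizes are in machine words.
    Info = erlang:process_info(Pid, [heap_size, stack_size, total_heap_size]),
    %% Collects garbage for Pid only; other processes are not stopped.
    true = erlang:garbage_collect(Pid),
    Pid ! stop,
    Info.

%% The helper just waits for a stop message.
idle() ->
    receive stop -> ok end.
```

Calling proc_heap_sketch:inspect() from the shell returns a list of {Item, Value} pairs for the helper process only, which is the point: heap and stack are properties of a single process.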

 

However, not all data lives on the private heap. Some data is stored in a shared area.

In addition, binaries larger than 64 bytes are stored in a common heap shared by all processes. ETS tables are also stored in a common heap.

Binaries larger than 64 bytes are what people usually call refc binaries. There are plenty of write-ups on this online as well, for example:

1, http://blog.csdn.net/zhongruixian/article/details/9450361
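
A rough way to see the 64-byte boundary from inside a process, sketched under some assumptions. The module name refc_threshold_sketch is hypothetical. erlang:process_info(Pid, binary) is documented as subject to change without notice; the {Id, Size, RefCount} tuple shape assumed here is what current releases return.

```erlang
-module(refc_threshold_sketch).
-export([refc_sizes/0]).

%% Returns the sizes of the refc binaries that the calling process
%% references right after creating one 64-byte and one 65-byte binary.
refc_sizes() ->
    Small = binary:copy(<<1>>, 64),   % at the limit: expected on the process heap
    Big = binary:copy(<<1>>, 65),     % above the limit: expected to be a refc binary
    %% Assumed tuple shape: {Id, Size, RefCount}.
    {binary, Info} = erlang:process_info(self(), binary),
    Sizes = [Size || {_Id, Size, _RefCount} <- Info],
    %% Both binaries are used after the call, so they are still live above.
    {byte_size(Small), byte_size(Big), Sizes}.
```

The 65-byte binary should show up in the returned size list, while the 64-byte one is expected to stay on the private heap and therefore not be listed.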

This is where Erlang's GC can run into trouble, because the Erlang virtual machine collects this shared memory by reference counting.

 

Now, let's look at a complete small program:

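-module(refc_binary_test).

-export([start/0,
         handle_big_binary/1]).

%% Note: test:get_bin_address/1 is a helper that is not shown in this post.
%% erts_debug:get_internal_state/1 is an undocumented debugging call; it
%% typically has to be enabled first with
%% erts_debug:set_internal_state(available_internal_state, true).

%% Plays the role of the process that does the actual work.
start() ->
    Me = erlang:self(),
    erlang:spawn(?MODULE, handle_big_binary, [Me]),
    receive
        {ok, C} ->
            io:format("----- get_bin_address C : ~p~n", [test:get_bin_address(C)]),
            io:format("------- handled ~p~n", [erts_debug:get_internal_state({binary_info, C})]),
            %% Keep this process, and its reference to C, alive.
            timer:sleep(1000000),
            C;
        _ ->
            error
    after 10000 ->
            error
    end.

%% Plays the role of the process that parses the incoming data.
handle_big_binary(Me) ->
    %% A is a 1 MiB refc binary.
    A = binary:copy(<<1>>, 1024*1024),
    io:format("----- get_bin_address A : ~p~n", [test:get_bin_address(A)]),
    io:format("------- resource ~p~n", [erts_debug:get_internal_state({binary_info, A})]),
    %% B is a 1-byte sub-binary that still points into A.
    <<B:1/binary, _/binary>> = A,
    io:format("----- get_bin_address B : ~p~n", [test:get_bin_address(B)]),
    erlang:send(Me, {ok, B}).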

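Anyone familiar with ejabberd will recognize this pattern: the process that runs handle_big_binary/1 corresponds to the ejabberd_receiver module in ejabberd, and start/0 can be seen as the c2s process. In other words, the process attached directly to the TCP socket parses the socket data and, once parsing is done, hands the result to the process that does the actual work.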

3> refc_binary_test:start().
----- get_bin_address A : "bin: size=1048576, ptr=0x18fc0040"
------- resource {refc_binary,1048576,{binary,1048576},0}
----- get_bin_address B : "bin: size=1, ptr=0x18fc0040"
----- get_bin_address C : "bin: size=1, ptr=0x18fc0040"
------- handled {refc_binary,1,{binary,1048576},0}

Here we can see that variable C still holds on to the '{binary,1048576}' data (1048576 is the binary's orig_size), even though the handle_big_binary process ended right after send/2. Now let's magnify this small problem:

4> [proc_lib:spawn(refc_binary_test, start, []) || _ <- lists:seq(1, 1000)].

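You will then see that the beam.smp process is using more than 1 GB of memory, since each of the 1000 processes keeps a 1 MiB binary alive :(.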
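
The retention can also be observed with documented functions only, without test:get_bin_address/1 or erts_debug. The following is a sketch, with a hypothetical module name refc_retention_sketch. It takes a 100-byte slice rather than the 1-byte slice used above, on the assumption that some OTP releases may copy very small slices onto the process heap, which would hide the effect.

```erlang
-module(refc_retention_sketch).
-export([retained/0]).

%% Build a 1 MiB refc binary, keep only a 100-byte slice of it, and
%% report how many bytes that slice keeps referenced.
retained() ->
    Big = binary:copy(<<1>>, 1024 * 1024),
    <<Sub:100/binary, _/binary>> = Big,
    {byte_size(Sub),                      % bytes visible through Sub
     binary:referenced_byte_size(Sub)}.   % bytes of the binary Sub points into
```

refc_retention_sketch:retained() returns {100, 1048576}: the slice is 100 bytes long, but it keeps the whole 1 MiB binary referenced.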

 

So, how do we fix it?

The binary module documents referenced_byte_size/1 as follows:

If a binary references a larger binary (often described as being a sub-binary), it can be useful to get the size of the actual referenced binary. This function can be used in a program to trigger the use of copy/1. By copying a binary, one might dereference the original, possibly large, binary which a smaller binary is a reference to.

store(Binary, GBSet) ->
  NewBin =
      case binary:referenced_byte_size(Binary) of
          Large when Large > 2 * byte_size(Binary) ->
             binary:copy(Binary);
          _ ->
             Binary
      end,
  gb_sets:insert(NewBin,GBSet).

Next, look at the description of binary:copy/2 (copy/1 behaves like copy/2 with N = 1):

This function will always create a new binary, even if N = 1. By using copy/1 on a binary referencing a larger binary, one might free up the larger binary for garbage collection.

Once the sub-binary is copied with binary:copy/1 before it is sent, running refc_binary_test:start() again no longer shows the problem above.
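
The post does not show the changed program, so here is one possible version as a sketch. The module name refc_binary_fix_sketch is made up, the debug printing is left out, and a 100-byte slice is used instead of 1 byte on the assumption that very small slices may be copied onto the process heap by some OTP releases anyway.

```erlang
-module(refc_binary_fix_sketch).
-export([start/0, handle_big_binary/1]).

%% Receiver: reports the size of C and of the binary C refers to.
start() ->
    Me = self(),
    spawn(?MODULE, handle_big_binary, [Me]),
    receive
        {ok, C} ->
            {byte_size(C), binary:referenced_byte_size(C)}
    after 10000 ->
        error
    end.

%% Parser: copies the slice before handing it over.
handle_big_binary(Me) ->
    A = binary:copy(<<1>>, 1024 * 1024),
    <<B:100/binary, _/binary>> = A,
    %% The copy is a fresh 100-byte binary, so the message no longer
    %% refers to the 1 MiB binary A.
    Me ! {ok, binary:copy(B)}.
```

refc_binary_fix_sketch:start() returns {100, 100}, so the receiver no longer keeps the 1 MiB binary alive. The copy costs one extra allocation per message, which is usually a fair trade when the receiver is long-lived.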

 

Summary:

1, refc binaries are stored in shared memory;

2, refc binaries are garbage collected by reference counting;

3, the memory for a refc binary is allocated as one contiguous block.

