[Erlang 0116] What We Talk About When We Talk About Erlang Maps, Part 1

Reposted article. 2014-03-03 23:25. Tags: maps, erlang

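Adding a Maps data type to Erlang did not come out of nowhere: the proposal has been under discussion for two to three years. What stirred things up was Joe Armstrong's recent article "Big changes to Erlang", which used a rather dramatic line: "Records are dead - long live maps !". The line quickly spread through communities in China and abroad, and before long a worried developer asked on Stack Overflow: "Will Erlang R17 still have records?"

To borrow a literary phrase, when we talk about Maps we are really voicing our dissatisfaction with records, and those pain points are exactly what we hope Maps will address. This article lists them one by one and tries to analyze the causes; the next article will look at Maps in more detail.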

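Pain points of records

What pain points do we run into when using records? Does the arrival of Maps improve any of them? Let us start by counting them off:

1. Can a record name be used as a parameter?

In short: does #RecordName{} work?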

7> rd(person,{name,id}).
person
8> #person{}.
#person{name = undefined,id = undefined}
9> P=person.
person
10> #P{}.
* 1: syntax error before: P
10>

Is it possible to use record name as a parameter in Erlang
http://stackoverflow.com/questions/4103731/is-it-possible-to-use-record-name-as-a-parameter-in-erlang
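
Since the record name must be a literal atom at compile time, a common workaround is to dispatch on the atom by hand. The following is a minimal sketch; the module name rec_new and the account record are made up for illustration.

```erlang
-module(rec_new).
-export([new/1]).

-record(person, {name, id}).
-record(account, {no, balance = 0}).   % hypothetical second record

%% One clause per record: the name after '#' is still a literal atom,
%% so the compiler can expand each clause into a fixed tuple.
new(person)  -> #person{};
new(account) -> #account{}.
```

Calling rec_new:new(P) with P bound to person now works at run time, at the cost of maintaining one clause per record.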

2. Can a record field be used as a parameter?

10> N=name.
name
11> #person{N="zen"}.
* 1: field 'N' is not an atom or _ in record person
12>

Modify a record in Erlang by programmatically specifying the field to modify

http://stackoverflow.com/questions/13188449/modify-a-record-in-erlang-by-programmatically-specifying-the-field-to-modify/13188717#13188717

One way around this is the dynarec project, which generates getter and setter entry points for record field values: https://github.com/jcomellas/mlapi/blob/master/src/dynarec.erl
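
For a single record, the same idea can be written by hand. This is only a sketch of the technique, not how dynarec itself is implemented; the module and function names are assumptions. It looks up the field's position with record_info(fields, person) and then uses element/2 and setelement/3 on the underlying tuple.

```erlang
-module(rec_field).
-export([get_field/2, set_field/3]).

-record(person, {name, id}).

%% Position of Field inside the tuple. Element 1 is the record name,
%% so the first field lives at index 2.
index(Field) ->
    index(Field, record_info(fields, person), 2).

index(Field, [Field | _], I) -> I;
index(Field, [_ | Rest], I)  -> index(Field, Rest, I + 1).

%% Read or update a field whose name is only known at run time.
get_field(Field, #person{} = P) ->
    element(index(Field), P).

set_field(Field, Value, #person{} = P) ->
    setelement(index(Field), P, Value).
```

An unknown field name fails with a function_clause error in index/3, which is usually what you want.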

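3. Can we write a.b.c.d.e.f?

Some languages offer a Fluent API (or Fluent Interface) design, whose aim is to make a chain of consecutive operations convenient at the syntax level. With nested records we would love to write a.b.c.d.e.f to simplify the code, but in practice it looks like this: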

Eshell V6.0  (abort with ^G)
1> rd(foo,{a,b,c}).
foo
2>  rd(a,{f,m}).
a
3>  rd(f,{id,name}).
f
4>  #foo{a=#a{f=#f{id=2002,name="zen"},m=1984},b=1234,c=2465}.
#foo{a = #a{f = #f{id = 2002,name = "zen"},m = 1984},
     b = 1234,c = 2465}
5> D=v(4).
#foo{a = #a{f = #f{id = 2002,name = "zen"},m = 1984},
     b = 1234,c = 2465}
6> D#foo.a#a.f#f.name.
"zen"

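The open-source project recbird achieves this effect. The approach is, unsurprisingly, a parse_transform: you add the -compile({parse_transform, recbird}). option to your code.

recbird was written by dcaoyuan; the code is hosted on SourceForge as part of ErlyBird: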

http://sourceforge.net/p/erlybird/code/HEAD/tree/trunk/erlybird/erlang-snippets/recbird.erl
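
Without a parse transform, nested pattern matching is the usual way to keep nested record access readable. A small sketch using the same foo/a/f records as the shell session above (the module name nested is an assumption):

```erlang
-module(nested).
-export([name/1, rename/2]).

-record(f, {id, name}).
-record(a, {f, m}).
-record(foo, {a, b, c}).

%% Read foo.a.f.name by matching through all three levels at once.
name(#foo{a = #a{f = #f{name = Name}}}) ->
    Name.

%% Updating is more verbose: every level has to be rebuilt explicitly.
rename(#foo{a = #a{f = F} = A} = Foo, NewName) ->
    Foo#foo{a = A#a{f = F#f{name = NewName}}}.
```

Reading is tolerable this way; the update shows why people keep asking for a dotted syntax.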

4. Converting records to proplists and proplists to records

Why convert to a proplist? The goal is to make it easy to look up field values.

This was discussed earlier: http://www.cnblogs.com/me-sa/archive/2012/05/22/erlang-code-snippet-2.html

The record_info extension project: https://github.com/hio/erlang-record_info/blob/master/src/record_info.erl
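
For one known record, the conversion can be sketched with record_info(fields, ...) and the standard lists and proplists modules. This is an illustration under the assumption of a person record with two fields, not the code from the projects linked above.

```erlang
-module(rec_props).
-export([to_proplist/1, from_proplist/1]).

-record(person, {name, id}).

%% Drop the record name (first tuple element) and pair each field
%% name with its value.
to_proplist(#person{} = P) ->
    lists:zip(record_info(fields, person), tl(tuple_to_list(P))).

%% Rebuild the tuple in field order; keys missing from the proplist
%% become 'undefined' (the default of proplists:get_value/2).
from_proplist(Props) ->
    Values = [proplists:get_value(F, Props) || F <- record_info(fields, person)],
    list_to_tuple([person | Values]).
```

Note that record_info/2 is itself resolved at compile time, so this module still has to see the record definition.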

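5. Keys can only be atoms

Someone has indeed raised this.

6. Records usually have to be defined in an .hrl file

What is the cause?

In questions about records, one phrase that keeps coming up is "compile-time dependency": records exist only at compile time and have no corresponding runtime data type. A record is essentially syntactic sugar over a tuple, and most of the record problems listed above stem from the tuple underneath. The well-known exprecs project describes it like this: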

This parse transform can be used to reduce compile-time dependencies in large systems.

In the old days, before records, Erlang programmers often wrote access functions for tuple data. This was tedious and error-prone. The record syntax made this easier, but since records were implemented fully in the pre-processor, a nasty compile-time dependency was introduced.

This module automates the generation of access functions for records. While this method cannot fully replace the utility of pattern matching, it does allow a fair bit of functionality on records without the need for compile-time dependencies.
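
The claim that a record is only syntactic sugar over a tuple can be checked directly. A minimal sketch (the module name rec_tuple is an assumption):

```erlang
-module(rec_tuple).
-export([demo/0]).

-record(person, {name, id}).

%% A record value is a plain tuple whose first element is the record name.
demo() ->
    P = #person{name = "zen", id = 1},
    {is_tuple(P),                 % the value is a tuple
     P =:= {person, "zen", 1},    % and equals the hand-written tuple
     element(1, P),               % element 1 is the record name
     tuple_size(P)}.              % name + two fields
```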

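Records are tuples

In the internal representation there are no records, only tuples. Below is an introduction to Erlang's internal data representation, which I assembled into one long image:

Source document: http://www.erlang-factory.com/upload/presentations/467/Halfword_EUC_2011.pdf (we have recommended this document many times on our Erlang Resources site)

These slides help us build a mental model of Erlang's internal data representation. A brief walkthrough:

BEAM (Björn's/Bogdan's Erlang Abstract Machine) contains a virtual register machine with 1024 virtual registers; program variables may live in a register or on the stack. Garbage collection is per process and generational. BEAM has a constant pool that is not garbage collected. Large binaries live outside the heap and can be shared by multiple processes.
In the VM code, the concept used to represent data is the Eterm. An Eterm is usually one word in size (sizeof(void *)). A process heap is really an array of Eterms, and ETS also stores data in the form of Eterms. Registers are Eterms, and the VM stack is made up of Eterms as well.
The VM allocates Eterms on the process heap to represent complex data structures such as lists and tuples. When a variable refers to complex data, the stack or register holds a pointer into the heap; in other words, an Eterm must be able to hold a pointer.

An Eterm uses some of its bits to tag the type of the data it holds. Erlang uses a hierarchical tagging scheme; at the base, the lowest two bits form the primary tag: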

00 = Continuation pointer (return address on stack) or header word on heap
01 = Cons cell (list)
10 = Boxed (tuple, float, bignum, binary, external pid/port, external/internal ref ...)
11 = Immediate (the rest - secondary tag present)

For Boxed terms, the header is subdivided further:

– 0000 = Tuple
– 0001 = Binary match state (internal type)
– 001x = Bignum (needs more than 28 bits)
– 0100 = Ref
– 0101 = Fun
– 0110 = Float
– 0111 = Export fun (make_fun/3)
– 1000 - 1010 = Binaries
– 1100 - 1110 = External entities (Pids, Ports and Refs)

As you can see, there is no trace of records here, only tuples. For Maps, on the other hand, we can already find a subtag in 17.0-rc2/erts/emulator/beam/erl_term.h:

#define ARITYVAL_SUBTAG          (0x0 << _TAG_PRIMARY_SIZE) /* TUPLE */
#define BIN_MATCHSTATE_SUBTAG     (0x1 << _TAG_PRIMARY_SIZE)
#define POS_BIG_SUBTAG          (0x2 << _TAG_PRIMARY_SIZE) /* BIG: tags 2&3 */
#define NEG_BIG_SUBTAG          (0x3 << _TAG_PRIMARY_SIZE) /* BIG: tags 2&3 */
#define _BIG_SIGN_BIT          (0x1 << _TAG_PRIMARY_SIZE)
#define REF_SUBTAG          (0x4 << _TAG_PRIMARY_SIZE) /* REF */
#define FUN_SUBTAG          (0x5 << _TAG_PRIMARY_SIZE) /* FUN */
#define FLOAT_SUBTAG          (0x6 << _TAG_PRIMARY_SIZE) /* FLOAT */
#define EXPORT_SUBTAG          (0x7 << _TAG_PRIMARY_SIZE) /* FLOAT */
#define _BINARY_XXX_MASK     (0x3 << _TAG_PRIMARY_SIZE)
#define REFC_BINARY_SUBTAG     (0x8 << _TAG_PRIMARY_SIZE) /* BINARY */
#define HEAP_BINARY_SUBTAG     (0x9 << _TAG_PRIMARY_SIZE) /* BINARY */
#define SUB_BINARY_SUBTAG     (0xA << _TAG_PRIMARY_SIZE) /* BINARY */
#define MAP_SUBTAG          (0xB << _TAG_PRIMARY_SIZE) /* MAP */
#define EXTERNAL_PID_SUBTAG     (0xC << _TAG_PRIMARY_SIZE) /* EXTERNAL_PID */
#define EXTERNAL_PORT_SUBTAG     (0xD << _TAG_PRIMARY_SIZE) /* EXTERNAL_PORT */
#define EXTERNAL_REF_SUBTAG     (0xE << _TAG_PRIMARY_SIZE) /* EXTERNAL_REF */
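
To make the two tag levels concrete, here is a small sketch that decodes them from a word given as a non-negative integer. It runs in ordinary Erlang and does not inspect real VM memory. It assumes _TAG_PRIMARY_SIZE is 2 and that the subtag occupies the four bits directly above the primary tag, as the defines above suggest; the module name, function names and returned atoms are made up for illustration.

```erlang
-module(tag_demo).
-export([primary_tag/1, header_subtag/1]).

%% Classify a word by its two lowest bits (the primary tag).
primary_tag(Word) when is_integer(Word), Word >= 0 ->
    case Word band 2#11 of
        2#00 -> header_or_cp;
        2#01 -> cons;
        2#10 -> boxed;
        2#11 -> immediate
    end.

%% Classify a header word by the four bits above the primary tag.
%% Assumption: subtag = (Header bsr 2) band 16#F, mirroring
%% (0xN << _TAG_PRIMARY_SIZE) with _TAG_PRIMARY_SIZE = 2.
header_subtag(Header) when is_integer(Header), Header >= 0 ->
    case (Header bsr 2) band 16#F of
        16#0 -> tuple;
        16#1 -> bin_matchstate;
        16#2 -> pos_big;
        16#3 -> neg_big;
        16#4 -> ref;
        16#5 -> 'fun';
        16#6 -> float;
        16#7 -> export;
        16#8 -> refc_binary;
        16#9 -> heap_binary;
        16#A -> sub_binary;
        16#B -> map;
        16#C -> external_pid;
        16#D -> external_port;
        16#E -> external_ref;
        _    -> unknown
    end.
```

For example, a header built as 16#B bsl 2 is classified as map, matching MAP_SUBTAG above.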

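If you are interested, you can keep reading otp_src_17.0-rc2\erts\emulator\beam\erl_term.h for the code related to the tuple implementation; search for the /* tuple access methods */ section.

On this topic, there is a Stack Overflow question: "Does erlang implement record copy-and-modify in any clever way?"

Note the erts_debug:size/1 and erts_debug:flat_size/1 functions mentioned there; they show how many words a term occupies with and without sharing. Sharing means reusing existing data blocks (by pointing at them) instead of copying them, which improves efficiency. Copying is triggered only when it cannot be avoided, for example when data is sent to another node or stored in ETS. Many of the optimization tips in the Erlang Efficiency Guide start from this observation.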
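
A quick way to see the difference is to build a term that refers to the same sub-term twice. A minimal sketch (module and function names are assumptions; erts_debug is an undocumented debugging module, so treat the numbers as indicative only):

```erlang
-module(share_demo).
-export([sizes/1]).

%% Build a list at run time and put it into a tuple twice.
%% Both tuple elements point at the same list on the heap.
sizes(N) ->
    L = lists:seq(1, N),
    T = {L, L},
    %% size/1 counts the shared list once; flat_size/1 counts it as if
    %% the term had been copied, i.e. once per reference.
    {erts_debug:size(T), erts_debug:flat_size(T)}.
```

The first number should be smaller than the second, and the gap grows with N.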

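So, once the primary tag and subtag are stripped away, what kind of data structure is a tuple? We can look at it from two angles. First, in the Erlang Interface Reference Manual, the erl_mk_tuple function makes it clear that a tuple is really an array of Eterms: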

ETERM *erl_mk_tuple(array, arrsize)
Types:
ETERM **array;
int arrsize;
Creates an Erlang tuple from an array of Erlang terms.
array is an array of Erlang terms.
arrsize is the number of elements in array.

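The other angle is the implementation of tuple_to_list and list_to_tuple in bif.c, which is simply a conversion between an array and a linked list. The code also shows that make_arityval(len) stores the array length alongside the data. For a tuple, getting the size and accessing an element by index are both fast. These are exactly the advantages of records mentioned in EEP 43:

Fast lookup, O(1): the key-to-index mapping is resolved at compile time, so access is very fast for small amounts of data (~50 values)
Little extra memory overhead: only the values and the name, 2 + N words (name + size + N)
Matching in function heads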
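
The compile-time index is visible in the language itself: #person.name is an expression that the compiler replaces with an integer position. A small sketch (the module name rec_index is an assumption):

```erlang
-module(rec_index).
-export([name_index/0, name_of/1]).

-record(person, {name, id}).

%% #person.name is replaced by its tuple position at compile time.
%% Position 1 holds the record name, so the first field is at 2.
name_index() ->
    #person.name.

%% Field access is then a constant-time element/2 on the tuple.
name_of(P) ->
    element(#person.name, P).
```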

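Once compilation is over, the syntactic benefits of records are gone; what remains is fast access to the tuple size and to elements by index. What exprecs calls "reduce compile-time dependencies" amounts to generating code at compile time so that some of those benefits survive: creating a record by record name, accessing data by field index position, and so on. The conversion between records and proplists mentioned above in effect moves the point where the problem is solved from compile time to run time.

By now you may be eager to know: what problems do the Maps added in Erlang R17 solve? What surprises do they bring? Are Maps and records locked in a fight to the death? We will pick that up tomorrow; stay tuned.

PS. The "Names in Funs" feature mentioned in Joe Armstrong's article has been discussed here several times before: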

[Erlang 0050] Writing tail recursion with fun in the Erlang shell
http://www.cnblogs.com/me-sa/archive/2012/03/24/you-win-yourself-zen-this-is-the-50-erlang-article-go-on.html

[Erlang 0056] Writing tail recursion with fun in the Erlang shell, Part II
http://www.cnblogs.com/me-sa/archive/2012/04/28/2474892.html

[Erlang 0063] Joe Armstrong 《A Few Improvements to Erlang》EUC 2012
http://www.cnblogs.com/me-sa/archive/2012/06/06/2538941.html

Related material:

[0] setelement/3 optimization: http://www.erlang.org/doc/efficiency_guide/commoncaveats.html#id62422

[1] http://www.erlang.org/doc/man/erl_eterm.html

[2] http://erlang.org/doc/efficiency_guide/users_guide.html
