[Erlang-0008][OTP] Efficiency Guide -- Tables and Databases (ets, mnesia)

Reposted article. Posted 2012-09-27 12:45. Tags: Erlang/OTP, Erlang

Original: http://www.erlang.org/doc/efficiency_guide/tablesDatabases.html

Corrections are welcome.

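7 Tables and Databases

7.1 Ets, Dets, and Mnesia

Every example using Ets has a corresponding example in Mnesia. In general, all Ets examples also apply to Dets tables.

Select/Match Operations

Select/match operations on Ets and Mnesia tables can become very expensive, because they usually need to scan the complete table. Try to structure your data so that you rarely need select/match. If you really do need a select/match operation, it is still far more efficient than tab2list. The following sections give examples of this, and of ways to avoid select/match altogether. The functions ets:select/2 and mnesia:select/3 should be preferred over ets:match/2, ets:match_object/2, and mnesia:match_object/3.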

NOTE: There are exceptions where the complete table is not scanned, for instance if part of the key is bound when searching an ordered_set table, or if it is a Mnesia table and there is a secondary index on the field that is selected/matched. If the key is fully bound there is, of course, no point in doing a select/match, unless you have a bag table and are only interested in a subset of the elements with that key.

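When creating a record to be used in a select/match operation, you want most of the fields to have the value '_'. The easiest and fastest way to do that is: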

#person{age = 42, _ = '_'}.
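To make the shorthand concrete, here is a minimal sketch. The module name wildcard_pattern, the table name, and the sample rows are illustrative assumptions; the person record layout is the one used later in this article. It checks that the shorthand builds the same term as the fully spelled-out pattern and then uses it with ets:match_object/2.

```erlang
-module(wildcard_pattern).
-export([demo/0]).

%% Record layout taken from the examples later in this article.
-record(person, {idno, name, age, occupation}).

demo() ->
    %% A record is a tuple whose first element is the record name,
    %% so keypos has to point at the idno field explicitly.
    Tab = ets:new(people, [set, {keypos, #person.idno}]),
    true = ets:insert(Tab, [#person{idno = 1, name = "Adam", age = 42,
                                    occupation = "mailman"},
                            #person{idno = 2, name = "Carl", age = 25,
                                    occupation = "mailman"}]),
    %% Shorthand: every field not mentioned gets the value '_'.
    Pattern = #person{age = 42, _ = '_'},
    %% The same pattern written out field by field; this match
    %% succeeds only if both expressions build the same term.
    Pattern = #person{idno = '_', name = '_', age = 42, occupation = '_'},
    Result = ets:match_object(Tab, Pattern),
    true = ets:delete(Tab),
    Result.
```

The shorthand also keeps patterns valid when a field is later added to the record, which the spelled-out form does not.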

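Deleting an Element

The delete operation is considered successful even if the element was not present in the table. Hence, any attempt to check that the element exists in the Ets/Mnesia table before deleting it is unnecessary. Here is an example for Ets tables.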

DO

...
ets:delete(Tab, Key),
...

DO NOT

...
case ets:lookup(Tab, Key) of
    [] ->
        ok;
    [_|_] ->
        ets:delete(Tab, Key)
end,
...
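The claim is easy to check in isolation. In the sketch below (module and table names are made up for illustration), ets:delete/2 returns true both for a key that exists and for one that was never inserted, so the extra lookup buys nothing.

```erlang
-module(delete_demo).
-export([demo/0]).

demo() ->
    Tab = ets:new(scratch, [set]),
    true = ets:insert(Tab, {present, 1}),
    Existing = ets:delete(Tab, present),  % key is in the table
    Missing = ets:delete(Tab, absent),    % key was never inserted
    true = ets:delete(Tab),               % drop the whole table
    {Existing, Missing}.
```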

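Fetching Data

Do not fetch data that you already have! Suppose you have a module that handles the abstract data type Person. You export the interface function print_person/1, which uses the internal functions print_name/1, print_age/1, and print_occupation/1.

NOTE: If print_name/1 and the others had been interface functions, it would be a different matter entirely, because you do not want users of the interface to know about the internal data representation.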

DO

%%% Interface function
print_person(PersonId) ->
    %% Look up the person in the named table person,
    case ets:lookup(person, PersonId) of
        [Person] ->
            %% Pass the record that was already fetched.
            print_name(Person),
            print_age(Person),
            print_occupation(Person);
        [] ->
            io:format("No person with ID = ~p~n", [PersonId])
    end.

%%% Internal functions
print_name(Person) ->
    io:format("Name: ~p~n", [Person#person.name]).

print_age(Person) ->
    io:format("Age: ~p~n", [Person#person.age]).

print_occupation(Person) ->
    io:format("Occupation: ~p~n", [Person#person.occupation]).

DO NOT

%%% Interface function
print_person(PersonId) ->
    %% Look up the person in the named table person,
    case ets:lookup(person, PersonId) of
        [_Person] ->
            %% The fetched record is thrown away and only the ID is passed on.
            print_name(PersonId),
            print_age(PersonId),
            print_occupation(PersonId);
        [] ->
            io:format("No person with ID = ~p~n", [PersonId])
    end.

%%% Internal functions
print_name(PersonId) ->
    %% Redundant lookup: print_person/1 already fetched this record.
    [Person] = ets:lookup(person, PersonId),
    io:format("Name: ~p~n", [Person#person.name]).

print_age(PersonId) ->
    [Person] = ets:lookup(person, PersonId),
    io:format("Age: ~p~n", [Person#person.age]).

print_occupation(PersonId) ->
    [Person] = ets:lookup(person, PersonId),
    io:format("Occupation: ~p~n", [Person#person.occupation]).

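Non-Persistent Data Storage

For non-persistent database storage, prefer Ets tables over Mnesia local_content tables. Even Mnesia dirty_write operations carry a fixed overhead compared to Ets writes: Mnesia must check whether the table is replicated or has indices, which involves at least one Ets lookup for each dirty_write. Thus, Ets writes are always faster than Mnesia writes.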

tab2list

Assume we have an Ets table that uses idno as its key and contains:

[#person{idno = 1, name = "Adam",  age = 31, occupation = "mailman"},
 #person{idno = 2, name = "Bryan", age = 31, occupation = "cashier"},
 #person{idno = 3, name = "Bryan", age = 35, occupation = "banker"},
 #person{idno = 4, name = "Carl",  age = 25, occupation = "mailman"}]

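If we must return all data stored in the Ets table, we can use ets:tab2list/1. Usually, however, we are only interested in a subset of the information, in which case ets:tab2list/1 is too expensive. If we only want one field from each record, for example the age of every person, we should use: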

DO

...
ets:select(Tab,[{ #person{idno='_', 
                          name='_', 
                          age='$1', 
                          occupation = '_'},
                [],
                ['$1']}]),
...

DO NOT

...
TabList = ets:tab2list(Tab),
lists:map(fun(X) -> X#person.age end, TabList),
...

If we are only interested in the age of all persons named Bryan, we should:

DO

...
ets:select(Tab,[{ #person{idno='_', 
                          name="Bryan", 
                          age='$1', 
                          occupation = '_'},
                [],
                ['$1']}]),
...

DO NOT

...
TabList = ets:tab2list(Tab),
lists:foldl(fun(X, Acc) -> case X#person.name of
                                "Bryan" ->
                                    [X#person.age|Acc];
                                 _ ->
                                     Acc
                           end
             end, [], TabList),
...

REALLY DO NOT

...
TabList = ets:tab2list(Tab),
BryanList = lists:filter(fun(X) -> X#person.name == "Bryan" end,
                         TabList),
lists:map(fun(X) -> X#person.age end, BryanList),
...

If we need all information stored in the Ets table about persons named Bryan, we should:

DO

...
ets:select(Tab, [{#person{idno='_', 
                          name="Bryan", 
                          age='_', 
                          occupation = '_'}, [], ['$_']}]),
...

DO NOT

...
TabList = ets:tab2list(Tab),
lists:filter(fun(X) -> X#person.name == "Bryan" end, TabList),
...
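The three DO variants above can be run together against the sample data. This is a sketch under a few assumptions: the module name select_demo and the table name are made up, the patterns use the `_ = '_'` shorthand instead of listing every field, and the table is created as an ordered_set only so that the results come back in a predictable order.

```erlang
-module(select_demo).
-export([demo/0]).

-record(person, {idno, name, age, occupation}).

demo() ->
    %% ordered_set is an assumption made here for stable result order.
    Tab = ets:new(person_tab, [ordered_set, {keypos, #person.idno}]),
    true = ets:insert(Tab,
        [#person{idno = 1, name = "Adam",  age = 31, occupation = "mailman"},
         #person{idno = 2, name = "Bryan", age = 31, occupation = "cashier"},
         #person{idno = 3, name = "Bryan", age = 35, occupation = "banker"},
         #person{idno = 4, name = "Carl",  age = 25, occupation = "mailman"}]),
    %% Age of every person: bind age to '$1' and return only '$1'.
    AllAges = ets:select(Tab, [{#person{age = '$1', _ = '_'}, [], ['$1']}]),
    %% Age of every person named Bryan: the name is a literal in the pattern.
    BryanAges = ets:select(Tab, [{#person{name = "Bryan", age = '$1', _ = '_'},
                                  [], ['$1']}]),
    %% Whole records for Bryan: '$_' returns the complete matched object.
    Bryans = ets:select(Tab, [{#person{name = "Bryan", _ = '_'}, [], ['$_']}]),
    true = ets:delete(Tab),
    {AllAges, BryanAges, Bryans}.
```

In each case only the selected fields are copied out of the table, whereas tab2list copies every record into the calling process first.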

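Ordered_set Tables

If the data in the table must be accessed in a way where the order of the keys matters, the table type ordered_set can be used instead of the more usual set type. An ordered_set is always traversed in Erlang term order with respect to the key field, so the return values from functions such as select, match_object, and foldl are ordered by key. Traversing an ordered_set with the first and next operations also returns the keys in order.

NOTE: An ordered_set only guarantees that objects are processed in key order. Results from functions such as ets:select/2 appear in key order even if the key is not included in the result.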
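A small sketch of both points, with made-up module and table names: the rows are inserted out of order, the match specification returns only the value, and the result still follows the key order. The walk/3 helper is hypothetical and simply chains ets:first/1 and ets:next/2.

```erlang
-module(ordered_demo).
-export([demo/0]).

demo() ->
    Tab = ets:new(ordered, [ordered_set]),
    %% Inserted out of key order on purpose.
    true = ets:insert(Tab, [{3, c}, {1, a}, {2, b}]),
    %% The result contains only the value, never the key.
    Values = ets:select(Tab, [{{'_', '$1'}, [], ['$1']}]),
    %% Walk the keys with first/next.
    Keys = walk(Tab, ets:first(Tab), []),
    true = ets:delete(Tab),
    {Values, Keys}.

%% Collect keys until the end-of-table marker is reached.
walk(_Tab, '$end_of_table', Acc) ->
    lists:reverse(Acc);
walk(Tab, Key, Acc) ->
    walk(Tab, ets:next(Tab, Key), [Key | Acc]).
```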

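7.2 Ets Specific

Utilizing the Keys of the Ets Table

An Ets table is a single-key table (either a hash table or a tree ordered by the key) and should be used as one. In other words, use the key to look things up whenever possible. A lookup by a known key takes constant time in a set table and O(logN) in an ordered_set table. A key lookup is always preferable to a call that has to scan the whole table. In the examples above, the field idno is the key of the table, so every lookup where only the name is known results in a complete scan of the (possibly large) table.

A simple solution would be to use the name field as the key instead of idno, but that causes problems if names are not unique. A more general solution is to create a second table that has name as its key and idno as its data, that is, to index (invert) the table with respect to the name field. The second table must of course be kept consistent with the master table. Mnesia can do this for you, but a home-made index table can be very efficient compared to the overhead involved in using Mnesia.

An index table for the table in the previous examples would have to be a bag (because keys appear more than once) and could have the following contents: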

[#index_entry{name="Adam", idno=1},
#index_entry{name="Bryan", idno=2},
#index_entry{name="Bryan", idno=3},
#index_entry{name="Carl", idno=4}]

Given this index table, a lookup of the age fields of all persons named "Bryan" can be done like this:

...
MatchingIDs = ets:lookup(IndexTable,"Bryan"),
lists:map(fun(#index_entry{idno = ID}) ->
                 [#person{age = Age}] = ets:lookup(PersonTable, ID),
                 Age
          end,
          MatchingIDs),
...

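Note that the code above never uses ets:match/2; it relies on ets:lookup/2 instead. The lists:map/2 call only traverses the idnos that match the name "Bryan", so the number of lookups in the master table is kept to a minimum.

Keeping an index table introduces some overhead when inserting records, so the number of operations saved by the index has to be weighed against the number of operations spent inserting objects. Note, however, that the gain is significant whenever the key can be used to look up elements.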
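The consistency requirement is where a home-made index usually goes wrong, so here is one possible way to wrap both tables behind a few functions. Everything in this sketch is an assumption made for illustration: the module name person_index, the function names, and the choice to pass the two table identifiers around as a tuple.

```erlang
-module(person_index).
-export([new/0, insert/2, delete/2, ages_by_name/2, demo/0]).

-record(person, {idno, name, age, occupation}).
-record(index_entry, {name, idno}).

%% Create the master table (set, keyed on idno) and the index (bag, keyed on name).
new() ->
    People = ets:new(people, [set, {keypos, #person.idno}]),
    Index = ets:new(name_index, [bag, {keypos, #index_entry.name}]),
    {People, Index}.

%% Insert or overwrite a person and keep the index in step.
insert({People, Index} = Tabs, #person{idno = Id, name = Name} = Person) ->
    ok = delete(Tabs, Id),   % drops a stale index entry if the idno is reused
    true = ets:insert(People, Person),
    true = ets:insert(Index, #index_entry{name = Name, idno = Id}),
    ok.

%% The lookup here is not an existence check: the name is needed
%% to find the matching index entry.
delete({People, Index}, Id) ->
    case ets:lookup(People, Id) of
        [#person{name = Name}] ->
            true = ets:delete_object(Index, #index_entry{name = Name, idno = Id}),
            true = ets:delete(People, Id),
            ok;
        [] ->
            ok
    end.

%% Two key lookups per hit, no table scan.
ages_by_name({People, Index}, Name) ->
    [Age || #index_entry{idno = Id} <- ets:lookup(Index, Name),
            #person{age = Age} <- ets:lookup(People, Id)].

demo() ->
    Tabs = {People, Index} = new(),
    lists:foreach(fun(P) -> ok = insert(Tabs, P) end,
        [#person{idno = 1, name = "Adam",  age = 31, occupation = "mailman"},
         #person{idno = 2, name = "Bryan", age = 31, occupation = "cashier"},
         #person{idno = 3, name = "Bryan", age = 35, occupation = "banker"},
         #person{idno = 4, name = "Carl",  age = 25, occupation = "mailman"}]),
    Before = lists:sort(ages_by_name(Tabs, "Bryan")),
    ok = delete(Tabs, 2),
    After = ages_by_name(Tabs, "Bryan"),
    true = ets:delete(People),
    true = ets:delete(Index),
    {Before, After}.
```

Each insert now costs an extra lookup and an extra write, which is the overhead the paragraph above asks you to weigh against the saved table scans. The two writes are not atomic, so concurrent writers would need to be serialized by the caller.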

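7.3 Mnesia Specific

Secondary Index

If you frequently look up records on a field that is not the key of the table, you lose performance by using mnesia:select/match_object, because these functions traverse the whole table. You can create a secondary index instead and use mnesia:index_read for faster access, at the cost of more memory. Example: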

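-record(person, {idno, name, age, occupation}).
...
%% Declare the secondary index on age when the table is created.
{atomic, ok} =
    mnesia:create_table(person, [{index, [#person.age]},
                                 {attributes,
                                  record_info(fields, person)}]),
%% For a table that already exists without this index, add it afterwards
%% instead (adding an index that is already present aborts):
%% {atomic, ok} = mnesia:add_table_index(person, age),
...

PersonsAge42 =
    mnesia:dirty_index_read(person, 42, #person.age),
...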
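For experimenting, the fragment above can be turned into something that runs end to end. The sketch below assumes a single node with no disc schema, so Mnesia starts with a RAM-only schema and the table is kept as ram_copies; the module name and the sample rows are made up.

```erlang
-module(mnesia_index_demo).
-export([demo/0]).

-record(person, {idno, name, age, occupation}).

demo() ->
    %% Assumption: no schema was created on disc, so everything lives in RAM.
    ok = mnesia:start(),
    {atomic, ok} =
        mnesia:create_table(person, [{ram_copies, [node()]},
                                     {index, [#person.age]},
                                     {attributes,
                                      record_info(fields, person)}]),
    ok = mnesia:dirty_write(#person{idno = 1, name = "Adam", age = 42,
                                    occupation = "mailman"}),
    ok = mnesia:dirty_write(#person{idno = 2, name = "Carl", age = 25,
                                    occupation = "mailman"}),
    %% Served from the secondary index instead of a table scan.
    PersonsAge42 = mnesia:dirty_index_read(person, 42, #person.age),
    %% Remove the table so the demo can be run again.
    {atomic, ok} = mnesia:delete_table(person),
    PersonsAge42.
```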

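Transactions

Transactions are a way to guarantee that the distributed Mnesia database remains consistent, even when many different processes update it in parallel. However, if you have real-time requirements, it is recommended to use dirty operations instead of transactions. With dirty operations you lose the consistency guarantee; this is usually solved by letting only one process update the table. Other processes have to send update requests to that process.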

...
% Using transaction

Fun = fun() ->
          [mnesia:read({Table, Key}),
           mnesia:read({Table2, Key2})]
      end, 

{atomic, [Result1, Result2]}  = mnesia:transaction(Fun),
...

% Same thing using dirty operations
...

Result1 = mnesia:dirty_read({Table, Key}),
Result2 = mnesia:dirty_read({Table2, Key2}),
...
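The single-writer arrangement described in this section could look roughly like the sketch below. It is only an illustration: the module name single_writer, the counter table, the message format, and the demo with ten concurrent clients are all assumptions, and Mnesia is again assumed to run on one node with a RAM-only schema. Because only the writer process performs the read-modify-write sequence with dirty operations, no update is lost even though no transaction is used.

```erlang
-module(single_writer).
-export([start/0, update_counter/3, demo/0]).

-record(counter, {key, value = 0}).

%% Start Mnesia, make sure the table exists, and spawn the only writer.
start() ->
    ok = mnesia:start(),
    case mnesia:create_table(counter,
                             [{attributes, record_info(fields, counter)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, counter}} -> ok
    end,
    spawn(fun loop/0).

%% Client side: send the request and wait for the writer's reply.
update_counter(Writer, Key, Incr) ->
    Ref = make_ref(),
    Writer ! {update, self(), Ref, Key, Incr},
    receive
        {Ref, NewValue} -> NewValue
    end.

%% Writer side: requests are handled one at a time, so the dirty
%% read and dirty write below can never interleave with another update.
loop() ->
    receive
        {update, From, Ref, Key, Incr} ->
            Old = case mnesia:dirty_read({counter, Key}) of
                      [#counter{value = V}] -> V;
                      [] -> 0
                  end,
            New = Old + Incr,
            ok = mnesia:dirty_write(#counter{key = Key, value = New}),
            From ! {Ref, New},
            loop();
        stop ->
            ok
    end.

%% Ten clients each send 100 increments through the writer.
demo() ->
    Writer = start(),
    Parent = self(),
    Clients = [spawn(fun() ->
                         lists:foreach(
                           fun(_) -> update_counter(Writer, hits, 1) end,
                           lists:seq(1, 100)),
                         Parent ! {done, self()}
                     end) || _ <- lists:seq(1, 10)],
    lists:foreach(fun(Pid) -> receive {done, Pid} -> ok end end, Clients),
    [#counter{value = Total}] = mnesia:dirty_read({counter, hits}),
    Writer ! stop,
    {atomic, ok} = mnesia:delete_table(counter),
    Total.
```

Readers can still call mnesia:dirty_read/1 directly from any process; only the updates need to go through the writer.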
