[Erlang 0034] Erlang iolist


This started when Mango (芒果) hit an exception while using mochiweb. I found the same problem reported in the mochiweb Google group:

=ERROR REPORT==== 7-Apr-2011::18:58:22 === 
"web request failed"
path: "cfsp/entity"
type: error
what: badarg
trace: [{erlang,iolist_size,
[[...]]},
{mochiweb_request,respond,2},
{rest_server_web,loop,1},
{mochiweb_http,headers,5},
{proc_lib,init_p_do_apply,3}]

 

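The person asking assumed the exception was caused by an overly long document. In his reply, bob first rejected that guess and then focused on the iolist error that the trace points to explicitly. His reply: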

I don't think it has anything to do with the size of your document, your code is somehow returning a value that is not an iolist. Perhaps there is an atom in there, or an integer outside of 0..255 (I guess this is more likely, I don't know xmerl output very well).

1> iolist_size([256]). 
** exception error: bad argument
in function iolist_size/1
called as iolist_size([256])

You probably want UTF-8, so unicode:characters_to_binary(xmerl:export_simple(Res,xmerl_xml)) is my guess at what you really want to be doing for the output.

-bob
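bob's suggestion can be sketched in a few lines. The module and function names below are hypothetical and only serve the illustration; the xmerl call from the thread is replaced by a plain list containing the code point 256, which is enough to trigger the same badarg.

```erlang
-module(iolist_utf8_demo).
-export([is_iodata/1, to_utf8/1]).

%% Returns true when Term is accepted by erlang:iolist_size/1,
%% i.e. when it is an iolist or a binary.
is_iodata(Term) ->
    try iolist_size(Term) of
        _ -> true
    catch
        error:badarg -> false
    end.

%% Encodes (possibly deep) character data as a UTF-8 binary.
%% Note: for invalid input unicode:characters_to_binary/1 returns an
%% {error, ...} or {incomplete, ...} tuple instead of a binary.
to_utf8(Chars) ->
    unicode:characters_to_binary(Chars).
```

The list [256] is not an iolist, but its UTF-8 encoding is a binary and therefore valid iodata that mochiweb can send.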

Full thread: http://groups.google.com/group/mochiweb/browse_thread/thread/f67abc113b338bfe?pli=1

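I passed this hint on to Mango, and it did solve the problem. But the story should not end there. The follow-up question is: what is the difference between an Erlang list and an iolist?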
 

Definition of iolist

The official Erlang documentation says little about iolist, but the definition can be found:
iodata() = iolist() | binary()
iolist() = maybe_improper_list(byte() | binary() | iolist(), binary() | [])
maybe_improper_list() = maybe_improper_list(any(), any())
byte() = 0..255
char() = 0..16#10ffff
maybe_improper_list(T) = maybe_improper_list(T, any())

Or:

IoData = unicode:chardata()
chardata() = charlist() | unicode_binary()
charlist() = [unicode_char() | unicode_binary() | charlist()]
unicode_binary() = binary()

A binary() with characters encoded in the UTF-8 coding standard.

Note the two iolist-related functions below: both also accept a plain binary as their argument.
iolist_size(Item) -> integer() >= 0

Types:Item = iolist() | binary()

iolist_to_binary(IoListOrBinary) -> binary()

Types:IoListOrBinary = iolist() | binary()
 
            
           
Let's try it out in the shell:
 Eshell V5.9  (abort with ^G)
1> iolist_size([]).
0
2> iolist_size([<<"anc">>]).
3
3> iolist_size([12,<<"anc">>]).
4
4> iolist_size([12,<<"anc">>,23]).
5
5> iolist_size([12,<<"anc">>,23,<<"king">>]).
9
6> iolist_size([12,<<"anc">>,23,<<"king">>,[23,34,<<"test">>]]).
15
7> iolist_size(<<"abc">>).
3
8> iolist_size(<<>>).
0
9> iolist_size([1234]).
** exception error: bad argument
in function iolist_size/1
called as iolist_size([1234])
10> iolist_size([<<1:1>>]).
** exception error: bad argument
in function iolist_size/1
called as iolist_size([<<1:1>>])
11> iolist_size( [12,23,"abc",<<abc>>]).
** exception error: bad argument
12> iolist_size( [12,23,<<abc>>]).
** exception error: bad argument
13> iolist_size( [12,23,"abc",<<"abc">>]).
8
14> L=[$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]].
[72,101,[108,<<"lo">>," "],[[["W","o"],<<"rl">>]],<<"d">>]
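The list bound to L in the last shell command is a valid iolist, even though it mixes characters, strings, binaries, nesting, and an explicit cons tail. As a minimal sketch (the module name and function names are made up for illustration), the following shows what it flattens to, and that a binary is also allowed as the tail of an improper list:

```erlang
-module(iolist_hello_demo).
-export([hello/0, improper/0]).

%% The same deep, mixed list as in the shell session above.
hello() ->
    [$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]].

%% An improper list whose tail is a binary is still an iolist.
improper() ->
    [<<"a">> | <<"b">>].
```

Calling iolist_to_binary(iolist_hello_demo:hello()) yields the flat binary for "Hello World", and iolist_size/1 reports its length without building that binary in Erlang code.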

When is an iolist the right tool?

 
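The first thing I found was mryufeng's post 《iolist跟list有什么區別?》 ("What is the difference between an iolist and a list?") http://mryufeng.iteye.com/blog/634867
That post derives the definition of the iolist data structure from the source code and explains what an iolist is for: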
 
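An iolist is used when sending data to a port. Because underlying system calls such as writev support vectored writes, there is no need for flattening operations like iolist_to_binary, which avoids memory copies and greatly improves efficiency. Use it often.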
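To make this concrete, here is a minimal sketch of building a response as iodata and leaving the flattening to the runtime. The module name, function name, and the HTTP-like layout are assumptions chosen for illustration, not code from mryufeng's post or from mochiweb.

```erlang
-module(iolist_send_demo).
-export([response/2]).

%% Builds an HTTP-like response as iodata. Nothing is concatenated:
%% the pieces are only referenced from a new list.
response(Status, Body) ->
    Length = integer_to_list(iolist_size(Body)),
    [<<"HTTP/1.1 ">>, Status, <<"\r\n">>,
     <<"Content-Length: ">>, Length, <<"\r\n\r\n">>,
     Body].
```

The result can be handed directly to functions that accept iodata, for example gen_tcp:send(Socket, iolist_send_demo:response("200 OK", Body)) or file:write_file(Path, Data), with no call to iolist_to_binary/1 in between.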
 
What does that mean? I found the answer at the start of the "Buckets of Sockets" chapter on the Learn You Some Erlang site:
A = [a]
B = [b|A] = [b,a]
C = [c|B] = [c,b,a]
In the case of prepending, as above, whatever is held into A or B or C never needs to be rewritten. The representation of C can be seen as either [c,b,a], [c|B] or [c|[b|[a]]], among others. In the last case, you can see that the shape of A is the same at the end of the list as when it was declared. Similarly for B. Here's how it looks with appending:

A = [a]
B = A ++ [b] = [a] ++ [b] = [a|[b]]
C = B ++ [c] = [a|[b]] ++ [c] = [a|[b|[c]]]
Do you see all that rewriting? When we create B, we have to rewrite A. When we write C, we have to rewrite B (including the [a|...] part it contains). If we were to add D in a similar manner, we would need to rewrite C. Over long strings, this becomes way too inefficient, and it creates a lot of garbage left to be cleaned up by the Erlang VM.

With binaries, things are not exactly as bad:

A = <<"a">>
B = <<A/binary, "b">> = <<"ab">>
C = <<B/binary, "c">> = <<"abc">>
In this case, binaries know their own length and data can be joined in constant time. That's good, much better than lists. They're also more compact. For these reasons, we'll often try to stick to binaries when using text in the future.

There are a few downsides, however. Binaries were meant to handle things in certain ways, and there is still a cost to modifying binaries, splitting them, etc. Moreover, sometimes we'll work with code that uses strings, binaries, and individual characters interchangeably. Constantly converting between types would be a hassle.

In these cases, IO lists are our saviour. IO lists are a weird type of data structure. They are lists of either bytes (integers from 0 to 255), binaries, or other IO lists. This means that functions that accept IO lists can accept items such as [$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]]. When this happens, the Erlang VM will just flatten the list as it needs to do it to obtain the sequence of characters Hello World.

What are the functions that accept such IO Lists? Most of the functions that have to do with outputting data do. Any function from the io module, file module, TCP and UDP sockets will be able to handle them. Some library functions, such as some coming from the unicode module and all of the functions from the re (for regular expressions) module will also handle them, to name a few.

Try the previous Hello World IO List in the shell with io:format("~s~n", [IoList]) just to see. It should work without a problem.

All in all, they're a pretty clever way of building strings to avoid the problems of immutable data structures when it comes to dynamically building content to be output.
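A brief summary of the above:
 -> Prepending to the head of a list is very fast, but appending to the tail means traversing and rewriting the existing list.
 -> With binaries, appending at the tail takes constant time, but there are two problems: (1) modifying and splitting binaries still has a cost; (2) code that mixes strings, binaries, and characters forces constant conversion between types.
 -> An iolist handles this kind of mixed data well: the Erlang VM flattens the list when needed. You can use io:format to check what an iolist built from different kinds of data looks like once it is output.
 -> In short, under single assignment, an iolist is a good way to build string content dynamically for output.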
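The points above can be sketched as an accumulator that grows by nesting rather than by appending. The module and function names here are hypothetical.

```erlang
-module(iolist_build_demo).
-export([lines/1]).

%% Terminates every item with a newline. Each step wraps the previous
%% accumulator in a new three-element list, so earlier output is never
%% copied, unlike Acc ++ Item ++ "\n".
lines(Items) ->
    lists:foldl(fun(Item, Acc) -> [Acc, Item, $\n] end, [], Items).
```

Items may be strings, binaries, or iolists themselves. The nested result is only flattened when it is written out or passed to iolist_to_binary/1.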
 
We can use erlc +to_core M.erl (see: [Erlang 0029] Erlang Inline compilation) to look at the Core Erlang representation of an iolist.
In Core Erlang, List = [1,2,3,4,5,6,7,8,9] is represented as: [1|[2|[3|[4|[5|[6|[7|[8|[9]]]]]]]]]
Now take:
 L=[$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]],
iolist_size(L)
which is translated to:
 do  %% Line 14
call 'erlang':'iolist_size'
([72|[101|[[108|[#{#<108>(8,1,'integer',['unsigned'|['big']]),
#<111>(8,1,'integer',['unsigned'|['big']])}#|[[32]]]]|[[[[[87]|[[111]]]|[#{#<114>(8,1,'integer',['unsigned'|['big']]),
#<108>(8,1,'integer',['unsigned'|['big']])}#]]]|[#{#<100>(8,1,'integer',['unsigned'|['big']])}#]]]]])

 

Related reading

Someone on Stackoverflow raised the same question:

Ports, external or linked-in, accept something called io-lists for sending data to them. An io-list is a binary or a (possibly deep) list of binaries or integers in the range 0..255.
This means that rather than concatenating two lists before sending them to a port, one can just send them as two items in a list. So instead of
"foo" ++ "bar"
one do
["foo", "bar"]
In this example it is of course of miniscule difference. But the iolist in itself allows for convenient programming when creating output data. io_lib:format/2,3 itself returns an io list for example.
The function erlang:list_to_binary/1 accepts io lists, but now we have erlang:iolist_to_binary/1 which convey the intention better. There is also an erlang:iolist_size/1.
Best of all, since files and sockets are implemented as ports, you can send iolists to them. No need to flatten or append.
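A small sketch of the two points in this answer: nesting instead of "++", and io_lib:format returning something that can be embedded directly in a larger iolist. The module and function names are made up for illustration, and the example assumes ASCII-only arguments so that the formatted result stays within bytes 0..255.

```erlang
-module(iolist_format_demo).
-export([label/2]).

%% Combines formatted output with an existing string by nesting.
%% io_lib:format/2 returns a (possibly deep) character list, which is
%% placed in the result as-is rather than flattened first.
label(Id, Name) ->
    [io_lib:format("~w:", [Id]), Name].
```

For example, iolist_to_binary(iolist_format_demo:label(42, "foo")) produces the binary for "42:foo".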

And this one: A Ramble Through Erlang IO Lists http://prog21.dadgum.com/70.html
  The IO List is a handy data type in Erlang, but not one that's often discussed in tutorials. It's any binary. Or any list containing integers between 0 and 255. Or any arbitrarily nested list containing either of those two things. Like this:
[10, 20, "hello", <<"hello",65>>, [<<1,2,3>>, 0, 255]]
The key to IO lists is that you never flatten them. They get passed directly into low-level runtime functions (such as file:write_file), and the flattening happens without eating up any space in your Erlang process. Take advantage of that! Instead of appending values to lists, use nesting instead. For example, here's a function to put a string in quotes:
quote(String) -> [$"] ++ String ++ [$"].
If you're working with IO lists, you can avoid the append operations completely (and the second "++" above results in an entirely new version of String being created). This version uses nesting instead:
quote(String) -> [$", String, $"].
This creates three list elements no matter how long the initial string is. The first version creates length(String) + 2 elements. It's also easy to go backward and un-quote the string: just take the second list element. Once you get used to nesting you can avoid most append operations completely. 
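The nesting version and the "go backward" step can be written out as follows. This is a sketch with a hypothetical module name; unquote/1 relies on the exact three-element shape produced by quote/1.

```erlang
-module(iolist_quote_demo).
-export([quote/1, unquote/1]).

%% Wraps String in double quotes by nesting; String is not copied.
quote(String) ->
    [$", String, $"].

%% Recovers the original term by taking the second list element.
%% Only matches values built by quote/1.
unquote([$", String, $"]) ->
    String.
```

Because the original string is kept intact as the middle element, unquoting is a pattern match rather than a scan for quote characters.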

One thing that nested list trick is handy for is manipulating filenames. Want to add a directory name and ".png" extension to a filename? Just do this:
[Directory, $/, Filename, ".png"]
Unfortunately, filenames in the file module are not true IO lists. You can pass in deep lists, but they get flattened by an Erlang function (file:file_name/1), not the runtime system. That means you can still dodge appending lists in your own code, but things aren't as efficient behind the scenes as they could be. And "deep lists" in this case means only lists, not binaries. Strangely, these deep lists can also contain atoms, which get expanded via atom_to_list.

Ideally filenames would be IO lists, but for compatibility reasons there's still the need to support atoms in filenames. That brings up an interesting idea: why not allow atoms as part of the general IO list specification? It makes sense, as the runtime system has access to the atom table, and there's a simple correspondence between an atom and how it gets encoded in a binary; 'atom' is treated the same as "atom". I find I'm often calling atom_to_list before sending data to external ports, and that would no longer be necessary.
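Since atoms are not part of the iolist specification, a deep list containing atoms has to be converted before it reaches iolist_size/1 or a port. One possible helper is sketched below; the module and function names are hypothetical, and encoding atoms as UTF-8 is an assumption of this sketch.

```erlang
-module(iolist_atom_demo).
-export([atoms_to_iodata/1]).

%% Walks a (possibly deep, possibly improper) list and replaces every
%% atom with its text as a binary. Other elements are left untouched.
atoms_to_iodata(A) when is_atom(A) ->
    atom_to_binary(A, utf8);
atoms_to_iodata([H | T]) ->
    [atoms_to_iodata(H) | atoms_to_iodata(T)];
atoms_to_iodata(Other) ->
    Other.
```

With this, a term such as [dir, $/, img, ".png"] becomes valid iodata, whereas passing it to iolist_size/1 unchanged raises badarg.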
 
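Summary
  Under single assignment, an iolist avoids converting back and forth between strings and binaries, and is a good way to build string content dynamically for output.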

 
 

