[Hadoop Source Code Reading][4] - org.apache.hadoop.io

Reposted article. 2012-06-15 17:32. Tags: Hadoop source code reading / hadoop

1. The main class hierarchy diagram follows

2. The subclasses of Writable and WritableComparable are all broadly similar

3. RawComparator and WritableComparator

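For example, the comparator for the Text type below compares the serialized bytes one at a time, from the first byte to the last. It also works reasonably well for numeric strings, as long as they have the same length (for example, zero-padded), because the ordering is lexicographic.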

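    /** A WritableComparator optimized for Text keys. */
    public static class Comparator extends WritableComparator {
      public int compare(byte[] b1, int s1, int l1,
                         byte[] b2, int s2, int l2) {
        // A serialized Text starts with a variable-length integer holding the
        // byte length; n1 and n2 are the sizes of those length prefixes.
        int n1 = WritableUtils.decodeVIntSize(b1[s1]);
        int n2 = WritableUtils.decodeVIntSize(b2[s2]);
        // Skip the prefixes and compare the raw payload bytes directly.
        return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
      }
    }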
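
To make the behavior of the comparator above concrete, here is a minimal plain-Java sketch that does not depend on Hadoop. The class `RawTextCompareSketch` and its helpers `serialize`, `compareBytes`, and `compareSerialized` are hypothetical stand-ins written for this illustration. The sketch also assumes every string is shorter than 128 bytes, so the length prefix always fits in one byte.

```java
import java.nio.charset.StandardCharsets;

class RawTextCompareSketch {
    // Hypothetical stand-in for Text serialization: a one-byte length prefix
    // (valid only for payloads shorter than 128 bytes) followed by UTF-8 bytes.
    static byte[] serialize(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 127) {
            throw new IllegalArgumentException("sketch supports payloads < 128 bytes");
        }
        byte[] out = new byte[utf8.length + 1];
        out[0] = (byte) utf8.length;
        System.arraycopy(utf8, 0, out, 1, utf8.length);
        return out;
    }

    // Lexicographic comparison of unsigned bytes; ties are broken by length.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1, end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff, b = b2[j] & 0xff;
            if (a != b) return a - b;
        }
        return l1 - l2;
    }

    // Compare two serialized keys without deserializing them.
    static int compareSerialized(String x, String y) {
        byte[] b1 = serialize(x), b2 = serialize(y);
        int n = 1; // prefix size, assumed to be one byte in this sketch
        return compareBytes(b1, n, b1.length - n, b2, n, b2.length - n);
    }

    public static void main(String[] args) {
        System.out.println(compareSerialized("apple", "banana") < 0); // true
        System.out.println(compareSerialized("10", "9") < 0);  // true: '1' < '9' byte-wise
        System.out.println(compareSerialized("09", "10") < 0); // true once zero-padded
    }
}
```

The second call shows the caveat for numeric strings: "10" sorts before "9" unless the values are padded to the same width.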

4. The Text class is widely used and worth a close read

5. InputBuffer and OutputBuffer

6. Hadoop data types and file structures: Sequence, Map, Set, Array, BloomMap Files

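1. Hadoop’s SequenceFile

SequenceFile is an important Hadoop data file type. It provides key-value storage, but unlike traditional key-value stores (such as hash tables or B-trees) it is append-only, so you cannot overwrite a key that already exists. Each key-value record stores not only the key and the value but also their lengths.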
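
The append-only, length-prefixed idea can be sketched in a few lines of plain Java. This is a simplified illustration, not the actual SequenceFile on-disk format: the class `AppendOnlyLogSketch`, its methods, and the `[keyLength][key][valueLength][value]` layout with 4-byte lengths are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class AppendOnlyLogSketch {
    // In-memory stand-in for the file; bytes are only ever added at the end.
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

    // Write a 4-byte length followed by the data itself.
    private void writeLengthPrefixed(byte[] data) {
        bytes.write(ByteBuffer.allocate(4).putInt(data.length).array(), 0, 4);
        bytes.write(data, 0, data.length);
    }

    // Append one record: [keyLength][key][valueLength][value].
    void append(String key, String value) {
        writeLengthPrefixed(key.getBytes(StandardCharsets.UTF_8));
        writeLengthPrefixed(value.getBytes(StandardCharsets.UTF_8));
    }

    // The stored length tells the reader how many bytes to consume next.
    private static String readLengthPrefixed(ByteBuffer in) {
        byte[] data = new byte[in.getInt()];
        in.get(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    // Scan all records sequentially from the start.
    List<String[]> readAll() {
        List<String[]> records = new ArrayList<>();
        ByteBuffer in = ByteBuffer.wrap(bytes.toByteArray());
        while (in.hasRemaining()) {
            String key = readLengthPrefixed(in);
            String value = readLengthPrefixed(in);
            records.add(new String[] {key, value});
        }
        return records;
    }

    public static void main(String[] args) {
        AppendOnlyLogSketch log = new AppendOnlyLogSketch();
        log.append("k1", "first");
        log.append("k1", "second"); // same key: a new record, nothing is overwritten
        for (String[] r : log.readAll()) {
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

Writing the same key twice leaves both records in the log, which is the practical meaning of "append-only" here.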

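SequenceFile has three compression states:

Uncompressed – records are stored without any compression
Record Compressed – the value of each record is compressed (the file header records which compression algorithm is used)
Block-Compressed – once the buffered data reaches a certain size, writing pauses and the buffer is compressed as a whole: all the key lengths, keys, value lengths, and values are each gathered together and compressed as a group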
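
The difference between record and block compression can be illustrated with a small sketch. It uses `java.util.zip.Deflater` as a stand-in for a Hadoop compression codec and handles only the values buffer; the class `CompressionModesSketch` and its methods are hypothetical names, and the sample data is made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionModesSketch {
    // Compress a byte array with Deflater (a stand-in for a Hadoop codec).
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Record compression: each value is compressed on its own.
    static int recordCompressedSize(String[] values) {
        int total = 0;
        for (String v : values) {
            total += deflate(v.getBytes(StandardCharsets.UTF_8)).length;
        }
        return total;
    }

    // Block compression: buffer the values, then compress them in one pass.
    static int blockCompressedSize(String[] values) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.UTF_8);
            all.write(b, 0, b.length);
        }
        return deflate(all.toByteArray()).length;
    }

    public static void main(String[] args) {
        String[] values = new String[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = "status=OK;region=us-east-" + (i % 3);
        }
        System.out.println("record-compressed bytes: " + recordCompressedSize(values));
        System.out.println("block-compressed bytes:  " + blockCompressedSize(values));
    }
}
```

With many short, repetitive values, compressing them together lets the compressor exploit redundancy across records, which per-record compression cannot see.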

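The compression state is recorded in the header data at the start of the file.

Following the header data is a Metadata section: simple attribute/value pairs that record additional information about the file. The Metadata is written when the file is created, so it cannot be changed afterwards either.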

2. MapFile, SetFile, ArrayFile, and BloomMapFile

SequenceFile is the basic Hadoop data file format; MapFile, SetFile, ArrayFile, and BloomMapFile, described next, are all built on top of it.

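MapFile – a key-value lookup structure made up of a data file (/data) and an index file (/index). The data file holds every stored key-value pair, sorted by key. The index file holds a subset of the keys, each pointing to the corresponding position in the data file.
SetFile – built on MapFile; it stores only keys, and the value is a fixed, immutable placeholder.
ArrayFile – also built on MapFile; it behaves like an ordinary array, with numeric keys that act as the record index.
BloomMapFile – extends MapFile with an additional /bloom file that contains a binary filter table (a Bloom filter); the filter is updated each time a write completes.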
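
The way a partial index speeds up lookups in sorted data, as described for MapFile, can be sketched as follows. This is an in-memory illustration under stated assumptions, not Hadoop's implementation: the class `SparseIndexSketch`, its fields, and the `interval` parameter are hypothetical, and arrays stand in for the /data and /index files.

```java
import java.util.ArrayList;
import java.util.List;

class SparseIndexSketch {
    final String[] keys;   // stand-in for the data file: keys in sorted order
    final String[] values; // values aligned with keys
    final List<String> indexKeys = new ArrayList<>();       // stand-in for the index file
    final List<Integer> indexPositions = new ArrayList<>(); // position of each indexed key

    // Index every interval-th key; the data arrays must already be sorted by key.
    SparseIndexSketch(String[] sortedKeys, String[] values, int interval) {
        this.keys = sortedKeys;
        this.values = values;
        for (int i = 0; i < sortedKeys.length; i += interval) {
            indexKeys.add(sortedKeys[i]);
            indexPositions.add(i);
        }
    }

    String get(String key) {
        // Binary search the index for the last indexed key <= the target.
        int lo = 0, hi = indexKeys.size() - 1, start = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexKeys.get(mid).compareTo(key) <= 0) {
                start = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (start < 0) return null; // target sorts before every stored key
        // Scan the data sequentially from the indexed position.
        for (int i = indexPositions.get(start); i < keys.length; i++) {
            int c = keys[i].compareTo(key);
            if (c == 0) return values[i];
            if (c > 0) break; // passed the place where the key would be
        }
        return null;
    }

    public static void main(String[] args) {
        String[] k = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"};
        String[] v = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J"};
        SparseIndexSketch map = new SparseIndexSketch(k, v, 4); // index holds a, e, i
        System.out.println(map.get("f"));  // F
        System.out.println(map.get("zz")); // null
    }
}
```

A larger interval keeps the index smaller at the cost of a longer sequential scan per lookup.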

7. Worth mentioning: binary stream with zero-compressed encoding

    /**
     * Serializes a long to a binary stream with zero-compressed encoding.
     * For -112 <= i <= 127, only one byte is used with the actual value.
     * For other values of i, the first byte value indicates whether the
     * long is positive or negative, and the number of bytes that follow.
     * If the first byte value v is between -113 and -120, the following long
     * is positive, with number of bytes that follow are -(v+112).
     * If the first byte value v is between -121 and -128, the following long
     * is negative, with number of bytes that follow are -(v+120). Bytes are
     * stored in the high-non-zero-byte-first order.
     *
     * @param stream Binary output stream
     * @param i Long to be serialized
     * @throws java.io.IOException
     */
    /*
     * Writes the long i to the DataOutput stream.
     * If -112 <= i <= 127, i is written as a single byte.
     * Otherwise the first byte encodes the sign of i and the number of bytes that follow.
     * If the first byte v satisfies -120 <= v <= -113, i is positive and
     * occupies the next -(v+112) bytes (between 1 and 8 bytes).
     * If the first byte v satisfies -128 <= v <= -121, i is negative and
     * occupies the next -(v+120) bytes (between 1 and 8 bytes).
     * The high-order bytes of i are written first, then the low-order bytes.
     */
    public static void writeVLong(DataOutput stream, long i) throws IOException {
      if (i >= -112 && i <= 127) {
        stream.writeByte((byte)i);
        return;
      }

      int len = -112;
      if (i < 0) {
        i ^= -1L; // take one's complement
        len = -120;
      }

      // Count the non-zero bytes of i; each one moves len one step further down.
      long tmp = i;
      while (tmp != 0) {
        tmp = tmp >> 8;
        len--;
      }

      // First byte: encodes both the sign and the number of bytes that follow.
      stream.writeByte((byte)len);

      // Recover the byte count from the marker value.
      len = (len < -120) ? -(len + 120) : -(len + 112);

      // Write the payload, most significant byte first.
      for (int idx = len; idx != 0; idx--) {
        int shiftbits = (idx - 1) * 8;
        long mask = 0xFFL << shiftbits;
        stream.writeByte((byte)((i & mask) >> shiftbits));
      }
    }
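
To see the encoding in action, the following self-contained sketch applies the same scheme as writeVLong above to a byte array and adds a matching decoder derived from the rules in the comment. The class `VLongSketch` and its `encode` and `decode` methods are hypothetical names for this illustration, not Hadoop APIs.

```java
import java.io.ByteArrayOutputStream;

class VLongSketch {
    // Same scheme as writeVLong, but writing into a byte array.
    static byte[] encode(long i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (i >= -112 && i <= 127) {
            out.write((int) i); // single-byte case
            return out.toByteArray();
        }
        int len = -112;
        if (i < 0) {
            i ^= -1L; // one's complement makes the value non-negative
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--; // one step per non-zero byte
        }
        out.write(len); // marker byte: sign and byte count
        len = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = len; idx != 0; idx--) {
            out.write((int) (i >> ((idx - 1) * 8))); // write() keeps the low 8 bits
        }
        return out.toByteArray();
    }

    // Inverse of encode, following the rules stated in the comment.
    static long decode(byte[] b) {
        byte first = b[0];
        if (first >= -112) return first; // single-byte case
        boolean negative = first < -120;
        int len = negative ? -(first + 120) : -(first + 112);
        long i = 0;
        for (int idx = 1; idx <= len; idx++) {
            i = (i << 8) | (b[idx] & 0xFF); // most significant byte first
        }
        return negative ? ~i : i; // undo the one's complement
    }

    public static void main(String[] args) {
        System.out.println(encode(127).length);            // 1
        System.out.println(encode(128).length);            // 2
        System.out.println(encode(-113).length);           // 2
        System.out.println(encode(1L << 32).length);       // 6
        System.out.println(encode(Long.MIN_VALUE).length); // 9
        System.out.println(decode(encode(-113)));          // -113
    }
}
```

Small values cost a single byte, while the worst case is nine bytes: one marker byte plus eight payload bytes.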