1.WritableComparable
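Look up WritableComparable in the Hadoop API documentation:
WritableComparable extends both the Writable interface and the java.lang.Comparable interface. It is a Writable and also a Comparable, which means its instances can be both serialized and compared.
Its implementing classes include BooleanWritable, BytesWritable, ByteWritable, DoubleWritable, FloatWritable, IntWritable, LongWritable, MD5Hash, NullWritable, Record, RecordTypeInfo, Text, VIntWritable and VLongWritable; all of them implement the WritableComparable interface.
Instances of a WritableComparable implementation can be compared with one another. In Map/Reduce, any class that is used as a key should implement the WritableComparable interface.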
Example:
package cn.roboson.writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

/**
 * 1. Define a custom class that implements WritableComparable.
 * 2. Three methods must be implemented: two from the Writable interface
 *    (serialization) and one from the Comparable interface (comparison).
 * 3. Define the comparison; here counter is used as the sort key.
 * @author roboson
 */
public class MyWritableComparable implements WritableComparable<MyWritableComparable> {

    private int counter;
    private long timestamp;

    // No-argument constructor, needed so that an empty instance can be
    // created and then populated through readFields().
    public MyWritableComparable() {
    }

    public MyWritableComparable(int counter, long timestamp) {
        this.counter = counter;
        this.timestamp = timestamp;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Turn the bytes of the input stream back into structured data.
        // The read order must match the write order in write().
        counter = in.readInt();
        timestamp = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Write the structured data to the output stream.
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    @Override
    public int compareTo(MyWritableComparable other) {
        // Order by counter only; timestamp does not take part in the comparison.
        return Integer.compare(this.counter, other.counter);
    }

    public int getCounter() {
        return counter;
    }

    public void setCounter(int counter) {
        this.counter = counter;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public static void main(String[] args) {
        MyWritableComparable comparable = new MyWritableComparable(3, 4);
        MyWritableComparable otherComparable = new MyWritableComparable(4, 5);
        int value = comparable.compareTo(otherComparable);
        // compareTo() only guarantees the sign of the result, so test
        // value < 0 rather than value == -1.
        if (value < 0) {
            System.out.println("comparable<otherComparable");
        } else if (value == 0) {
            System.out.println("comparable=otherComparable");
        } else {
            System.out.println("comparable>otherComparable");
        }
    }
}
Output:
comparable<otherComparable
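The example above only exercises compareTo(). To see what the write() and readFields() pair does with the same two fields, here is a minimal stand-alone sketch that uses only java.io and needs no Hadoop on the classpath. The class name RoundTripSketch and its two helper methods are made up for illustration; they are not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class: mirrors write()/readFields() with plain java.io.
class RoundTripSketch {

    // Same field order as write(): the int counter first, then the long timestamp.
    static byte[] write(int counter, long timestamp) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(counter);
            out.writeLong(timestamp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Same field order as readFields(); returns {counter, timestamp}.
    static long[] read(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int counter = in.readInt();
            long timestamp = in.readLong();
            return new long[] {counter, timestamp};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = write(3, 4L);
        long[] fields = read(bytes);
        // 4 bytes for the int plus 8 bytes for the long.
        System.out.println(bytes.length + " bytes, counter=" + fields[0]
                + ", timestamp=" + fields[1]);
    }
}
```

The point of the sketch is that the read order has to match the write order exactly; swapping readInt() and readLong() would silently produce wrong field values.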
2.RawComparator
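MapReduce has a key-based sort phase in the middle, so comparing types matters a great deal. Hadoop provides a raw comparison interface, RawComparator, which extends Java's Comparator interface. RawComparator lets implementations compare records directly in the byte stream without first deserializing them into objects, which avoids the overhead of creating new objects.

package org.apache.hadoop.io;

import java.util.Comparator;

public interface RawComparator<T> extends Comparator<T> {

    // The method RawComparator adds: compares two serialized records,
    // given as (buffer, start offset, length) triples.
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);

    // Methods inherited from Comparator
    @Override
    public int compare(T o1, T o2);

    @Override
    public boolean equals(Object obj);
}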
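To make the byte-level comparison concrete, here is a minimal sketch of how two serialized 4-byte integers can be compared in place. It is not Hadoop's implementation; RawIntCompareSketch, readInt and toBytes are names invented for this illustration, and the only assumption is the big-endian layout that DataOutput.writeInt() produces.

```java
// Hypothetical stand-alone sketch: compares serialized ints without creating key objects.
class RawIntCompareSketch {

    // Decodes a big-endian int (the layout written by DataOutput.writeInt)
    // starting at the given offset.
    static int readInt(byte[] bytes, int offset) {
        return ((bytes[offset] & 0xff) << 24)
                | ((bytes[offset + 1] & 0xff) << 16)
                | ((bytes[offset + 2] & 0xff) << 8)
                | (bytes[offset + 3] & 0xff);
    }

    // Same parameter shape as RawComparator.compare(byte[], int, int, byte[], int, int).
    // The lengths are unused here because an int always occupies 4 bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Encodes an int in the same big-endian layout, for the demo below.
    static byte[] toBytes(int value) {
        return new byte[] {
            (byte) (value >>> 24), (byte) (value >>> 16), (byte) (value >>> 8), (byte) value
        };
    }

    public static void main(String[] args) {
        byte[] first = toBytes(163);
        byte[] second = toBytes(165);
        System.out.println(compare(first, 0, 4, second, 0, 4) < 0); // prints "true"
    }
}
```

Decoding into a signed int before comparing matters: a plain unsigned byte-by-byte comparison would order negative numbers after positive ones.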
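Look at the Hadoop API documentation:
Most derived classes do not implement this interface directly. Its concrete subclass is WritableComparator, which in most cases appears as a nested class of a class that implements the Writable interface and provides comparison of the serialized bytes. According to the API documentation, BooleanWritable, BytesWritable, ByteWritable, org.apache.hadoop.io.serializer.DeserializerComparator, DoubleWritable, FloatWritable, IntWritable, JavaSerializationComparator, LongWritable, MD5Hash, NullWritable, RecordComparator, Text and UTF8 all provide a RawComparator implementation, in most cases as a nested class.
WritableComparator, in turn, is the concrete subclass of RawComparator.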
3.WritableComparator
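"Hadoop: The Definitive Guide" is rather vague at this point. It only says that WritableComparator is a general-purpose implementation of RawComparator for WritableComparable classes. That sentence is confusing on first reading, so what does it mean?
First, as section 2 (RawComparator) showed, WritableComparator implements the RawComparator interface. In other words, WritableComparator is an implementation of RawComparator.
Second, it is a general-purpose implementation of RawComparator for WritableComparable classes. Which RawComparators belong to WritableComparable classes? Put differently, which classes extend WritableComparator and implement RawComparator? The list in section 2 (RawComparator) answers that, and the classes that implement WritableComparable are listed in section 1 (WritableComparable). So WritableComparator is a general-purpose implementation for BooleanWritable.Comparator, BytesWritable.Comparator, ByteWritable.Comparator, DoubleWritable.Comparator, FloatWritable.Comparator, IntWritable.Comparator, LongWritable.Comparator, MD5Hash.Comparator, NullWritable.Comparator, RecordComparator, Text.Comparator and UTF8.Comparator.
This leads to the two roles of WritableComparator. First, it provides a default implementation of the raw compare() method: that method deserializes the objects to be compared from the stream and then invokes the object-level compare() method, which relies on the objects' compareTo(). Second, it acts as a factory for RawComparator instances (for Writable implementations that have registered one). For example, to obtain a comparator for IntWritable, we simply call:
RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);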
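The first role, the default raw compare(), can be pictured with the following sketch: deserialize both byte ranges into key objects, then fall back to compareTo(). This is a simplified illustration written with plain java.io, not Hadoop's source. IntKey is a hypothetical stand-in for an IntWritable-like key, and DefaultRawCompareSketch is an invented name.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of a "deserialize, then compare objects" fallback.
class DefaultRawCompareSketch {

    // Stand-in for a WritableComparable key that holds a single int.
    static final class IntKey implements Comparable<IntKey> {
        int value;

        void readFields(DataInputStream in) throws IOException {
            value = in.readInt();
        }

        @Override
        public int compareTo(IntKey other) {
            return Integer.compare(value, other.value);
        }
    }

    // Rebuilds a key object from the byte range [start, start + length).
    static IntKey deserialize(byte[] bytes, int start, int length) {
        IntKey key = new IntKey();
        try {
            key.readFields(new DataInputStream(new ByteArrayInputStream(bytes, start, length)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key;
    }

    // Raw compare expressed through object comparison: correct, but it
    // allocates two objects per call, which a hand-written raw comparator avoids.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return deserialize(b1, s1, l1).compareTo(deserialize(b2, s2, l2));
    }

    public static void main(String[] args) {
        byte[] first = {0, 0, 0, (byte) 163};
        byte[] second = {0, 0, 0, (byte) 165};
        System.out.println(compare(first, 0, 4, second, 0, 4) < 0); // prints "true"
    }
}
```

This is why type-specific comparators are worth having: they give the same ordering while skipping the deserialization step.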
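Next, look at how the WritableComparator class is defined:
The WritableComparator class works like a registry that records the set of all Comparator classes. Its comparators member is a hash table holding the registrations, with key = Class and value = WritableComparator. This is why it can act as a factory for RawComparator instances: its implementation contains a HashMap<Class, WritableComparator>, so given a Class it can return the corresponding WritableComparator.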
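The registry idea can be sketched in a few lines of plain Java. This is only an illustration of the pattern under simplifying assumptions, not Hadoop's code: it stores java.util.Comparator instances rather than WritableComparator instances, and it falls back to natural ordering when nothing is registered. ComparatorRegistrySketch is an invented name; define and get are chosen to echo the roles described above.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a class-to-comparator registry that doubles as a factory.
class ComparatorRegistrySketch {

    // Key = class of the values to compare, value = the comparator registered for it.
    private static final Map<Class<?>, Comparator<?>> COMPARATORS = new ConcurrentHashMap<>();

    // Registers a specialised comparator for a type.
    static <T> void define(Class<T> type, Comparator<T> comparator) {
        COMPARATORS.put(type, comparator);
    }

    // Factory method: returns the registered comparator, or a generic
    // fallback based on compareTo() when none was registered.
    @SuppressWarnings("unchecked")
    static <T extends Comparable<T>> Comparator<T> get(Class<T> type) {
        Comparator<T> registered = (Comparator<T>) COMPARATORS.get(type);
        return registered != null ? registered : Comparator.<T>naturalOrder();
    }

    public static void main(String[] args) {
        define(String.class, String.CASE_INSENSITIVE_ORDER);
        // Registered comparator: "a" sorts before "B" when case is ignored.
        System.out.println(get(String.class).compare("a", "B") < 0); // prints "true"
        // No registration for Integer, so the natural-order fallback is used.
        System.out.println(get(Integer.class).compare(1, 2) < 0);    // prints "true"
    }
}
```

The unchecked cast is safe only as long as define() is the sole way to populate the map, because its signature ties each comparator to the class it is stored under.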
Example:
package cn.roboson.writable;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparator;

/**
 * 1. Obtain the RawComparator instance for IntWritable through WritableComparator.
 * 2. Compare in two ways: on objects and on serialized bytes.
 * @author roboson
 */
public class ComparableFinish {

    public static void main(String[] args) throws IOException {

        // Create two IntWritable instances to compare.
        IntWritable writable1 = new IntWritable(163);
        IntWritable writable2 = new IntWritable(165);

        // Obtain the RawComparator instance for IntWritable.
        RawComparator<IntWritable> intRawComparator = WritableComparator.get(IntWritable.class);

        // Compare the objects directly.
        int value1 = intRawComparator.compare(writable1, writable2);
        printResult(value1);

        // Serialize both objects to obtain their bytes.
        byte[] byte1 = serialize(writable1);
        byte[] byte2 = serialize(writable2);

        // Compare directly on the serialized bytes (an int takes 4 bytes).
        int value2 = intRawComparator.compare(byte1, 0, 4, byte2, 0, 4);
        printResult(value2);
    }

    // Only the sign of a comparison result is guaranteed, so test
    // value < 0 rather than value == -1.
    private static void printResult(int value) {
        if (value < 0) {
            System.out.println("writable1<writable2");
        } else if (value == 0) {
            System.out.println("writable1=writable2");
        } else {
            System.out.println("writable1>writable2");
        }
    }

    public static byte[] serialize(Writable writable) throws IOException {

        // Create a byte-array output stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataout = new DataOutputStream(out);

        // Write the structured data of the writable object to the output stream.
        writable.write(dataout);
        return out.toByteArray();
    }

    public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {

        // Create a byte-array input stream over the data in the byte array.
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream datain = new DataInputStream(in);

        // Deserialize the bytes of the input stream into the writable object.
        writable.readFields(datain);
        return bytes;
    }
}
Output:
writable1<writable2
writable1<writable2
For background on serialization, see my blog post "Hadoop Serialization" at:
http://www.cnblogs.com/robert-blue/p/4157768.html
Reference posts:
http://blog.csdn.net/keda8997110/article/details/8518255
http://www.360doc.com/content/12/0827/09/9318309_232551844.shtml