WritableComparable and Comparators in Hadoop


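1. WritableComparable

According to the Hadoop API documentation:

WritableComparable extends both the Writable and java.lang.Comparable interfaces. It is a Writable and also a Comparable, which means an implementation can be both serialized and compared.

Its implementing classes include BooleanWritable, BytesWritable, ByteWritable, DoubleWritable, FloatWritable, IntWritable, LongWritable, MD5Hash, NullWritable, Record, RecordTypeInfo, Text, VIntWritable, and VLongWritable. All of them implement the WritableComparable interface.

Instances of a WritableComparable implementation can be compared with one another. In MapReduce, any class used as a key should implement the WritableComparable interface.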
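To make the two halves of that contract concrete, here is a minimal sketch that needs no Hadoop dependency. CounterRecord, toBytes, and fromBytes are hypothetical names introduced only for illustration; the class mirrors the write/readFields/compareTo shape of a WritableComparable but uses nothing beyond java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a WritableComparable key; it is not a Hadoop class.
class CounterRecord implements Comparable<CounterRecord> {
    int counter;
    long timestamp;

    CounterRecord() { }

    CounterRecord(int counter, long timestamp) {
        this.counter = counter;
        this.timestamp = timestamp;
    }

    // Serialization half: a 4-byte int followed by an 8-byte long.
    void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Fields must be read back in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Comparison half: records are ordered by counter only.
    @Override
    public int compareTo(CounterRecord other) {
        return Integer.compare(counter, other.counter);
    }

    static byte[] toBytes(CounterRecord record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            record.write(new DataOutputStream(buffer));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static CounterRecord fromBytes(byte[] bytes) {
        CounterRecord record = new CounterRecord();
        try {
            record.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new CounterRecord(3, 4L));
        CounterRecord copy = fromBytes(bytes);
        System.out.println(bytes.length + " bytes, counter=" + copy.counter
                + ", timestamp=" + copy.timestamp);
    }
}
```

The serialized form is 12 bytes (4 for the int, 8 for the long), and the copy read back from those bytes carries the same field values as the original.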

 

Example:

package cn.roboson.writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

/**
 * 1. Define a custom class that implements WritableComparable.
 * 2. Three methods must be implemented: two from the Writable interface
 *    (serialization) and one from the Comparable interface (comparison).
 * 3. Define the comparison; here instances are ordered by counter.
 * @author roboson
 */
public class MyWritableComparable implements WritableComparable<MyWritableComparable> {

    private int counter;
    private long timestamp;

    // A no-argument constructor lets an instance be created before readFields() fills it in
    public MyWritableComparable() {
    }

    public MyWritableComparable(int counter, long timestamp) {
        this.counter = counter;
        this.timestamp = timestamp;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Turn the bytes from the input stream back into structured data
        counter = in.readInt();
        timestamp = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Write the structured data to the output stream
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    @Override
    public int compareTo(MyWritableComparable other) {
        // Order by counter only; timestamp does not take part in the comparison
        return Integer.compare(this.counter, other.counter);
    }

    public int getCounter() {
        return counter;
    }

    public void setCounter(int counter) {
        this.counter = counter;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public static void main(String[] args) {
        MyWritableComparable comparable = new MyWritableComparable(3, 4);
        MyWritableComparable otherComparable = new MyWritableComparable(4, 5);
        int value = comparable.compareTo(otherComparable);
        // compareTo only guarantees the sign of its result, so test with < 0 rather than == -1
        if (value < 0) {
            System.out.println("comparable<otherComparable");
        } else if (value == 0) {
            System.out.println("comparable=otherComparable");
        } else {
            System.out.println("comparable>otherComparable");
        }
    }
}

Output:

comparable<otherComparable

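2. RawComparator

Type comparison matters a great deal in MapReduce, because a key-based sort phase sits between the map and reduce stages. Hadoop provides a raw comparison interface, RawComparator, which extends Java's Comparator interface. RawComparator lets an implementation compare records directly in the byte stream, without first deserializing them into objects, which avoids the overhead of creating new objects.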

package org.apache.hadoop.io;

import java.util.Comparator;

public interface RawComparator<T> extends Comparator<T> {

    // Declared by RawComparator itself: compares two serialized records in place.
    // b1/b2 are the buffers, s1/s2 the start offsets, l1/l2 the lengths in bytes.
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);

    // Inherited from java.util.Comparator (repeated here for reference)
    @Override
    public int compare(T o1, T o2);

    @Override
    public boolean equals(Object obj);
}
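To illustrate why the byte-level compare can skip object creation, the sketch below decodes two big-endian ints straight out of their buffers and compares them. RawIntCompare and readInt are hypothetical names used only for this illustration and are not part of Hadoop; the assumption is that the ints were written in the 4-byte big-endian layout that DataOutput.writeInt produces.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not a Hadoop class: compares two serialized ints in place.
class RawIntCompare {

    // Decodes the 4-byte big-endian int that starts at offset `start`.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
                | ((bytes[start + 1] & 0xff) << 16)
                | ((bytes[start + 2] & 0xff) << 8)
                | (bytes[start + 3] & 0xff);
    }

    // Same parameter layout as the raw compare: buffer, start offset, length.
    // No key objects are allocated; the ints are read directly from the buffers.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        // ByteBuffer writes big-endian by default, matching DataOutput.writeInt
        byte[] first = ByteBuffer.allocate(4).putInt(163).array();
        byte[] second = ByteBuffer.allocate(4).putInt(165).array();
        System.out.println(compare(first, 0, 4, second, 0, 4) < 0);
    }
}
```

Decoding to int before comparing keeps negative values ordered correctly, which a plain byte-by-byte comparison of two's-complement data would not.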

 

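According to the Hadoop API documentation, few classes implement this interface directly. Its general-purpose concrete implementation is WritableComparator, which in most cases is subclassed as a nested class inside a Writable implementation to compare serialized bytes. The implementing classes listed in the API are: BooleanWritable.Comparator, BytesWritable.Comparator, ByteWritable.Comparator, org.apache.hadoop.io.serializer.DeserializerComparator, DoubleWritable.Comparator, FloatWritable.Comparator, IntWritable.Comparator, JavaSerializationComparator, LongWritable.Comparator, LongWritable.DecreasingComparator, MD5Hash.Comparator, NullWritable.Comparator, RecordComparator, Text.Comparator, and UTF8.Comparator. Most of these are nested classes of the Writable type they compare.

WritableComparator itself is the concrete class they build on.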

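3. WritableComparator

Hadoop: The Definitive Guide is rather vague at this point. It says only that WritableComparator is a general-purpose implementation of RawComparator for WritableComparable classes. That is confusing at first, so what does it mean?

First, as section 2 showed, WritableComparator implements the RawComparator interface. In other words, WritableComparator is an implementation of RawComparator.

Second, it is a general-purpose implementation for WritableComparable classes. Which comparators does that cover? They are the classes that extend WritableComparator and therefore implement RawComparator, listed in section 2; the classes that implement WritableComparable are listed in section 1. In other words, WritableComparator is the common base for BooleanWritable.Comparator, BytesWritable.Comparator, ByteWritable.Comparator, DoubleWritable.Comparator, FloatWritable.Comparator, IntWritable.Comparator, LongWritable.Comparator, MD5Hash.Comparator, NullWritable.Comparator, RecordComparator, Text.Comparator, and UTF8.Comparator.

This leads to the two main functions of WritableComparator. First, it provides a default implementation of the raw compare() method, which deserializes the objects to be compared from the stream and then calls the object compare() method. Second, it acts as a factory for RawComparator instances (for the Writable implementations that have been registered). For example, to obtain a comparator for IntWritable, call: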

RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);
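The first of those two roles, a raw compare() that deserializes both operands and then compares the resulting objects, can be sketched roughly as follows. This is not Hadoop's code: DeserializingCompare and IntKey are hypothetical names, and the sketch uses only java.io so that it runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical sketch of a "deserialize, then compare" raw comparison; not Hadoop code.
class DeserializingCompare {

    // Hypothetical key type holding a single int and ordered by that value.
    static final class IntKey implements Comparable<IntKey> {
        int value;

        void readFields(DataInputStream in) throws IOException {
            value = in.readInt();
        }

        @Override
        public int compareTo(IntKey other) {
            return Integer.compare(value, other.value);
        }
    }

    // Rebuilds a key object from the slice bytes[start, start + length).
    static IntKey read(byte[] bytes, int start, int length) {
        IntKey key = new IntKey();
        try {
            key.readFields(new DataInputStream(new ByteArrayInputStream(bytes, start, length)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key;
    }

    // Raw-style signature, but every call allocates two key objects before comparing them.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return read(b1, s1, l1).compareTo(read(b2, s2, l2));
    }

    public static void main(String[] args) {
        byte[] first = ByteBuffer.allocate(4).putInt(163).array();
        byte[] second = ByteBuffer.allocate(4).putInt(165).array();
        System.out.println(compare(first, 0, 4, second, 0, 4) < 0);
    }
}
```

This generic path works for any key that can read itself from a stream, at the cost of allocating objects on every comparison. That cost is what type-specific comparators avoid by comparing the bytes directly.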

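Now consider how the WritableComparator class itself is defined.

WritableComparator works like a registry that records the set of all Comparator classes. Its comparators member is a hash table holding registrations of the form key = Class, value = WritableComparator. This is why it can act as a factory for RawComparator instances: its implementation contains a HashMap<Class, WritableComparator>, so given a Class it can return the corresponding WritableComparator.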
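The registry idea can be sketched in a few lines. ComparatorRegistry is a hypothetical class written for illustration; its define and get methods only imitate the registry-plus-factory pattern described above, and falling back to natural ordering for unregistered types is an assumption of this sketch rather than a statement about Hadoop's behavior.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry keyed by Class; a stand-in for the idea, not Hadoop's implementation.
class ComparatorRegistry {

    private static final Map<Class<?>, Comparator<?>> COMPARATORS = new ConcurrentHashMap<>();

    // Registers a specialized comparator for the given type.
    static <T> void define(Class<T> type, Comparator<? super T> comparator) {
        COMPARATORS.put(type, comparator);
    }

    // Factory method: returns the registered comparator, or natural ordering if none exists.
    @SuppressWarnings("unchecked")
    static <T extends Comparable<T>> Comparator<T> get(Class<T> type) {
        Comparator<T> registered = (Comparator<T>) COMPARATORS.get(type);
        return registered != null ? registered : Comparator.<T>naturalOrder();
    }

    public static void main(String[] args) {
        define(Integer.class, Comparator.<Integer>reverseOrder());
        // Integer has a registered (reversed) comparator, so 1 sorts after 2
        System.out.println(get(Integer.class).compare(1, 2) > 0);
        // String has no registration, so the natural-order fallback is used
        System.out.println(get(String.class).compare("a", "b") < 0);
    }
}
```

Keying the map by Class means callers never need to know which comparator class handles a type; they ask the factory and get back whatever was registered.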

 

Example:

package cn.roboson.writable;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparator;

/**
 * 1. Obtain the RawComparator instance for IntWritable through WritableComparator.
 * 2. Compare in two ways: as objects and as serialized bytes.
 * @author roboson
 */
public class ComparableFinish {

    public static void main(String[] args) throws IOException {

        // Create two IntWritable instances to compare
        IntWritable writable1 = new IntWritable(163);
        IntWritable writable2 = new IntWritable(165);

        // Obtain the RawComparator instance for IntWritable
        RawComparator<IntWritable> intRawComparator = WritableComparator.get(IntWritable.class);

        // Compare the objects directly
        int value1 = intRawComparator.compare(writable1, writable2);

        // A comparator only guarantees the sign of its result, so test with < 0 rather than == -1
        if (value1 < 0) {
            System.out.println("writable1<writable2");
        } else if (value1 == 0) {
            System.out.println("writable1=writable2");
        } else {
            System.out.println("writable1>writable2");
        }

        // Serialize both objects to obtain their bytes
        byte[] byte1 = serialize(writable1);
        byte[] byte2 = serialize(writable2);

        // Compare the serialized bytes directly (an IntWritable serializes to 4 bytes)
        int value2 = intRawComparator.compare(byte1, 0, byte1.length, byte2, 0, byte2.length);
        if (value2 < 0) {
            System.out.println("writable1<writable2");
        } else if (value2 == 0) {
            System.out.println("writable1=writable2");
        } else {
            System.out.println("writable1>writable2");
        }
    }

    public static byte[] serialize(Writable writable) throws IOException {

        // Create an in-memory output byte stream
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataout = new DataOutputStream(out);

        // Write the structured data of the writable object to the output stream
        writable.write(dataout);
        return out.toByteArray();
    }

    public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {

        // Wrap the byte array in an input byte stream
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream datain = new DataInputStream(in);

        // Deserialize the bytes from the input stream into the writable object
        writable.readFields(datain);
        return bytes;
    }
}

Output:

writable1<writable2
writable1<writable2

For more on serialization, see my blog post "Hadoop Serialization" at the following address:

http://www.cnblogs.com/robert-blue/p/4157768.html

Reference posts:

http://blog.csdn.net/keda8997110/article/details/8518255

http://www.360doc.com/content/12/0827/09/9318309_232551844.shtml

