Thrift Serialization and Deserialization

Reposted article. 2017-03-19 22:11. Open Source

An Analysis of How Thrift Implements Serialization and Deserialization

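How does Thrift implement serialization and deserialization? In an IDL file, changing a field's numeric ID breaks backward compatibility when the data is deserialized. So does adding a new field anywhere other than the end of the file when the fields rely on default IDs. Only by understanding how Thrift serializes data can we see why this happens and write code we can trust.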

The definition of field versioning in Thrift can be found in the paper at http://thrift.apache.org/static/files/thrift-20070401.pdf:

 
    Versioning in Thrift is implemented via field identifiers. The field header for every member of a struct in Thrift is encoded with a unique field identifier. The combination of this field identifier and its type specifier is used to uniquely identify the field. The Thrift definition language supports automatic assignment of field identifiers, but it is good programming practice to always explicitly specify field identifiers.

In other words, every field in Thrift carries a version identifier, which is determined by the field's numeric ID together with its type.

A simple Thrift file:

 
    struct Test {
        1 : required i32 key;
        2 : required string value;
    }

Run:

 
    thrift -gen java Test.thrift

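This converts the Thrift file into Java source files. The full generated source is not listed here, only the code related to serialization and deserialization.

Serialization is simply the write method, shown below: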

 
    // http://www.aiprograming.com/b/pengpeng/24
    public void write(org.apache.thrift.protocol.TProtocol oprot, Test struct) throws org.apache.thrift.TException {
        struct.validate();

        oprot.writeStructBegin(STRUCT_DESC);
        oprot.writeFieldBegin(KEY_FIELD_DESC);
        oprot.writeI32(struct.key);
        oprot.writeFieldEnd();
        if (struct.value != null) {
            oprot.writeFieldBegin(VALUE_FIELD_DESC);
            oprot.writeString(struct.value);
            oprot.writeFieldEnd();
        }
        oprot.writeFieldStop();
        oprot.writeStructEnd();
    }

struct.validate() checks that the required fields defined in the Thrift file have values. If a required field has no value, it throws a TProtocolException.

 
    public void validate() throws org.apache.thrift.TException {
        // check for required fields
        // alas, we cannot check 'key' because it's a primitive and you chose the non-beans generator.
        if (value == null) {
            throw new org.apache.thrift.protocol.TProtocolException("Required field 'value' was not present! Struct: " + toString());
        }
    }

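oprot.writeStructBegin(STRUCT_DESC);, where STRUCT_DESC = new org.apache.thrift.protocol.TStruct("Test");, writes the marker that begins a struct. This article uses TBinaryProtocol, the binary protocol, as the example. The TBinaryProtocol implementation of writeStructBegin is as follows: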

 
    public void writeStructBegin(TStruct struct) {
    }

In other words, it does nothing. Next comes oprot.writeFieldBegin(KEY_FIELD_DESC);, where

KEY_FIELD_DESC = new org.apache.thrift.protocol.TField("key", org.apache.thrift.protocol.TType.I32, (short)1);

The corresponding TBinaryProtocol implementation is as follows:

 
    public void writeFieldBegin(TField field) throws TException {
        this.writeByte(field.type);
        this.writeI16(field.id);
    }

The code above shows that serialization writes the field's type and the field's numeric ID. org.apache.thrift.protocol.TType also shows which data types the Thrift IDL supports:

 
    public final class TType {
        public static final byte STOP = 0;
        public static final byte VOID = 1;
        public static final byte BOOL = 2;
        public static final byte BYTE = 3;
        public static final byte DOUBLE = 4;
        public static final byte I16 = 6;
        public static final byte I32 = 8;
        public static final byte I64 = 10;
        public static final byte STRING = 11;
        public static final byte STRUCT = 12;
        public static final byte MAP = 13;
        public static final byte SET = 14;
        public static final byte LIST = 15;
        public static final byte ENUM = 16;

        public TType() {
        }
    }
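As a rough illustration of what writeFieldBegin puts on the wire, the sketch below builds the three-byte header of a field by hand. It uses only the JDK, not the Thrift library. FieldHeaderSketch and fieldHeader are hypothetical names, and the sketch assumes that writeI16 emits the id as two big-endian bytes, which this article does not show.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class FieldHeaderSketch {
    // Type codes copied from the TType listing above.
    static final byte TYPE_I32 = 8;
    static final byte TYPE_STRING = 11;

    // Lays out a field header in the order writeFieldBegin uses:
    // one type byte, then the 16-bit field id.
    // Assumption: the id is big-endian, which is ByteBuffer's default order.
    static byte[] fieldHeader(byte type, short id) {
        return ByteBuffer.allocate(3).put(type).putShort(id).array();
    }

    public static void main(String[] args) {
        // Header for "1 : required i32 key"
        System.out.println(Arrays.toString(fieldHeader(TYPE_I32, (short) 1)));
        // Header for "2 : required string value"
        System.out.println(Arrays.toString(fieldHeader(TYPE_STRING, (short) 2)));
    }
}
```

Under these assumptions the header for key is the bytes 8, 0, 1 and the header for value is 11, 0, 2. The field name never appears.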

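In TType, STOP is the value written to the serialized output after all fields have been serialized, to indicate that every field is done. Next comes oprot.writeI32(struct.key);, which writes the int value being serialized. The corresponding TBinaryProtocol implementation is as follows: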

 
    public void writeI32(int i32) throws TException {
        this.i32out[0] = (byte)(255 & i32 >> 24);
        this.i32out[1] = (byte)(255 & i32 >> 16);
        this.i32out[2] = (byte)(255 & i32 >> 8);
        this.i32out[3] = (byte)(255 & i32);
        this.trans_.write(this.i32out, 0, 4);
    }

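Roughly speaking, writeI32 converts the int into a byte array, most significant byte first, and writes it to the underlying transport (trans_). Next comes oprot.writeFieldEnd();, whose TBinaryProtocol implementation is as follows: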

public void writeFieldEnd() {
}
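The shifts and masks in writeI32 amount to big-endian encoding. The sketch below, which uses only the JDK, repeats them in a standalone method and compares the result with java.nio.ByteBuffer, whose default byte order is also big-endian. BigEndianI32Sketch, encodeI32, and decodeI32 are hypothetical names, and decodeI32 is only an assumed mirror of what a matching read would have to do.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class BigEndianI32Sketch {
    // Same shifts and masks as writeI32, but returning the bytes
    // instead of writing them to a transport.
    static byte[] encodeI32(int i32) {
        byte[] out = new byte[4];
        out[0] = (byte) (0xff & (i32 >> 24));
        out[1] = (byte) (0xff & (i32 >> 16));
        out[2] = (byte) (0xff & (i32 >> 8));
        out[3] = (byte) (0xff & i32);
        return out;
    }

    // Assumed inverse: reassembles the int from four big-endian bytes.
    static int decodeI32(byte[] b) {
        return ((b[0] & 0xff) << 24)
             | ((b[1] & 0xff) << 16)
             | ((b[2] & 0xff) << 8)
             |  (b[3] & 0xff);
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        byte[] manual = encodeI32(value);
        byte[] viaBuffer = ByteBuffer.allocate(4).putInt(value).array();
        System.out.println(Arrays.toString(manual));
        System.out.println(Arrays.equals(manual, viaBuffer));
        System.out.println(decodeI32(manual) == value);
    }
}
```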

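The next piece of code serializes the value field defined in Test.thrift. The process is much the same as above, with one difference: when serializing a string, the length of the string is written to the output first, followed by the string's contents.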

 
    if (struct.value != null) {
        oprot.writeFieldBegin(VALUE_FIELD_DESC);
        oprot.writeString(struct.value);
        oprot.writeFieldEnd();
    }
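To make the length-prefixed layout concrete, here is a minimal sketch of how a string value could be laid out. The article only states that the length comes first. The 4-byte big-endian length and the UTF-8 encoding are assumptions of this sketch, and StringEncodingSketch and encodeString are hypothetical names that are not part of Thrift.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StringEncodingSketch {
    // Writes the byte length first, then the string's bytes.
    // Assumptions: 4-byte big-endian length, UTF-8 contents.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + utf8.length)
                .putInt(utf8.length)
                .put(utf8)
                .array();
    }

    public static void main(String[] args) {
        // Four length bytes followed by the bytes of 'a' and 'b'.
        System.out.println(Arrays.toString(encodeString("ab")));
    }
}
```

Because the length is known before the contents are read, a reader can also skip over a string it does not need without decoding it.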

Finally, a single zero byte is written to the serialized output to mark the end of serialization, as shown below:

 
    public void writeFieldStop() throws TException {
        this.writeByte((byte)0);
    }

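The serialization process above shows that, apart from the values themselves, the serialized output contains only each field's type and numeric ID, not the field's name. As a result, the output Thrift produces is considerably smaller than that of text formats such as JSON or XML.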
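The steps of write can be put together in one hand-rolled sketch that lays out a whole Test struct and compares its size with an equivalent JSON text. It uses only the JDK. TestStructWireSketch and serialize are hypothetical names, and the layout assumes big-endian integers and a 4-byte UTF-8 length prefix for strings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TestStructWireSketch {
    // Type codes copied from the TType listing.
    static final byte STOP = 0, I32 = 8, STRING = 11;

    // Follows the order of the generated write(): field 1, field 2, stop byte.
    // writeStructBegin, writeFieldEnd and writeStructEnd add no bytes.
    static byte[] serialize(int key, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(3 + 4 + 3 + 4 + utf8.length + 1);
        buf.put(I32).putShort((short) 1).putInt(key);                      // key
        buf.put(STRING).putShort((short) 2).putInt(utf8.length).put(utf8); // value
        buf.put(STOP);                                                     // writeFieldStop
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] binary = serialize(1, "ab");
        String json = "{\"key\":1,\"value\":\"ab\"}";
        System.out.println(Arrays.toString(binary));
        System.out.println("binary bytes: " + binary.length);
        System.out.println("json bytes:   " + json.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

For key = 1 and value = "ab" this layout takes 17 bytes, while the JSON text takes 22. The gap widens as field names get longer, since the binary form never stores them.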

With the serialization process in mind, deserialization is easy to follow. The deserialization code is shown below:

 
    public void read(org.apache.thrift.protocol.TProtocol iprot, Test struct) throws org.apache.thrift.TException {
        org.apache.thrift.protocol.TField schemeField;
        iprot.readStructBegin();
        while (true)
        {
            schemeField = iprot.readFieldBegin();
            if (schemeField.type == org.apache.thrift.protocol.TType.STOP) {
                break;
            }
            switch (schemeField.id) {
                case 1: // KEY
                    if (schemeField.type == org.apache.thrift.protocol.TType.I32) {
                        struct.key = iprot.readI32();
                        struct.setKeyIsSet(true);
                    } else {
                        org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
                    }
                    break;
                case 2: // VALUE
                    if (schemeField.type == org.apache.thrift.protocol.TType.STRING) {
                        struct.value = iprot.readString();
                        struct.setValueIsSet(true);
                    } else {
                        org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
                    }
                    break;
                default:
                    org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
            }
            iprot.readFieldEnd();
        }
        iprot.readStructEnd();

        // check for required fields of primitive type, which can't be checked in the validate method
        if (!struct.isSetKey()) {
            throw new org.apache.thrift.protocol.TProtocolException("Required field 'key' was not found in serialized data! Struct: " + toString());
        }
        struct.validate();
    }

The core of deserialization is the while loop. schemeField is an instance of a class made up of the field's type and the field's numeric id, as shown below:

 
    public class TField {
        public final String name;
        public final byte type;
        public final short id;

        public TField() {
            this("", (byte)0, (short)0);
        }

        public TField(String n, byte t, short i) {
            this.name = n;
            this.type = t;
            this.id = i;
        }

        public String toString() {
            return "<TField name:\'" + this.name + "\' type:" + this.type + " field-id:" + this.id + ">";
        }

        public boolean equals(TField otherField) {
            return this.type == otherField.type && this.id == otherField.id;
        }
    }

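iprot.readFieldBegin(); constructs a TField object from the serialized data. The TBinaryProtocol implementation is shown below. As the source shows, it reads the field's type first and then the field's numeric ID: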

 
    public TField readFieldBegin() throws TException {
        byte type = this.readByte();
        short id = type == 0 ? 0 : this.readI16();
        return new TField("", type, id);
    }

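Once the TField object has been constructed, the field's value has to be read. The switch statement makes this easy to follow. Two conditions must hold before a value is read:

1. The field's numeric ID matches.

2. The field's type matches.

When both conditions hold, the read method matching the field's type is called. If the numeric ID matches but the type differs, the field is not assigned and its bytes are skipped. The code that runs in that case is: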

 
    org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
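The matching and skipping rules can be tried out without the Thrift library. The sketch below is a hand-rolled reader that mirrors the generated read loop for the Test struct. SkipSketch, read, and skip are hypothetical names. The skip method here is a simplified stand-in for TProtocolUtil.skip that covers only I32 and STRING, and the byte layout (big-endian integers, 4-byte string length) is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SkipSketch {
    static final byte STOP = 0, I32 = 8, STRING = 11;

    // Simplified stand-in for TProtocolUtil.skip: advances past one value.
    static void skip(ByteBuffer in, byte type) {
        switch (type) {
            case I32:
                in.position(in.position() + 4);
                break;
            case STRING:
                int len = in.getInt();
                in.position(in.position() + len);
                break;
            default:
                throw new IllegalArgumentException("type not covered by this sketch: " + type);
        }
    }

    // Returns {key, value}; an entry stays null if the field was never assigned.
    static Object[] read(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        Integer key = null;
        String value = null;
        while (true) {
            byte type = in.get();
            if (type == STOP) {
                break;
            }
            short id = in.getShort();
            if (id == 1 && type == I32) {            // id and type both match
                key = in.getInt();
            } else if (id == 2 && type == STRING) {  // id and type both match
                byte[] utf8 = new byte[in.getInt()];
                in.get(utf8);
                value = new String(utf8, StandardCharsets.UTF_8);
            } else {                                 // unknown id or wrong type
                skip(in, type);
            }
        }
        return new Object[] {key, value};
    }

    public static void main(String[] args) {
        byte[] data = {
            8, 0, 1, 0, 0, 0, 7,          // field 1, I32, 7
            11, 0, 3, 0, 0, 0, 1, 120,    // field 3, STRING "x" (unknown to this reader)
            11, 0, 2, 0, 0, 0, 2, 97, 98, // field 2, STRING "ab"
            0                             // STOP
        };
        System.out.println(Arrays.toString(read(data)));
    }
}
```

The unknown field 3 is passed over and the reader still recovers key and value. If field 1 arrived with type STRING instead of I32, it would be skipped in the same way and key would stay unset.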

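Finally, once deserialization is complete, a check is needed to confirm that the required values were supplied, which is done by calling: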

 
    struct.validate();

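The deserialization process shows that Thrift deserialization does not use Java reflection and does not allocate much extra memory, so compared with JSON/XML it tends to be faster and use less memory. The deserialization process also shows that

Thrift's backward compatibility depends on certain conditions:

1. A field's numeric ID must not change.

2. A field's type must not change.

As long as these two conditions hold, optional fields can be added or removed without breaking backward compatibility. Required fields are the exception: as the read code above shows, a missing required field makes deserialization throw a TProtocolException.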
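The first condition is easy to break by accident when fields rely on default IDs. The sketch below models that case under one loudly stated assumption: that the Thrift compiler numbers fields without explicit IDs in declaration order as -1, -2, -3, and so on. ImplicitIdSketch and assignImplicitIds are hypothetical names, and the code is a model of the numbering, not the compiler's implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ImplicitIdSketch {
    // Assumption: implicit ids are handed out in declaration order,
    // starting at -1 and decreasing by one per field.
    static Map<String, Short> assignImplicitIds(List<String> fieldNames) {
        Map<String, Short> ids = new LinkedHashMap<>();
        short next = -1;
        for (String name : fieldNames) {
            ids.put(name, next);
            next--;
        }
        return ids;
    }

    public static void main(String[] args) {
        // Original struct: key, value
        System.out.println(assignImplicitIds(Arrays.asList("key", "value")));
        // New field inserted in the middle rather than at the end
        System.out.println(assignImplicitIds(Arrays.asList("key", "extra", "value")));
    }
}
```

In this model, inserting extra before value moves value from id -2 to id -3. An older reader would then find extra where it expects value and would skip or misread it. Appending new fields at the end, or better, giving every field an explicit ID, keeps existing IDs stable.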
