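An Analysis of How Thrift Implements Serialization and Deserialization
How does Thrift implement serialization and deserialization? In an IDL file, changing a field's numeric identifier, or (when relying on default identifiers) adding a new field anywhere other than the end of the file, leaves the deserialized data no longer backward compatible. Only by understanding how Thrift serializes data can we understand why this happens and write code we can trust.
The definition of Thrift's field versioning can be found in the whitepaper at http://thrift.apache.org/static/files/thrift-20070401.pdf: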
"Versioning in Thrift is implemented via field identifiers. The field header for every member of a struct in Thrift is encoded with a unique field identifier. The combination of this field identifier and its type specifier is used to uniquely identify the field. The Thrift definition language supports automatic assignment of field identifiers, but it is good programming practice to always explicitly specify field identifiers."
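In other words, every field in Thrift carries its own version marker, and that marker is determined by the field's numeric identifier together with its type.
A simple Thrift file: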
    struct Test {
        1: required i32 key;
        2: required string value;
    }
Run
    thrift -gen java Test.thrift
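This turns the Thrift file into Java source files. The full generated source is not listed here, only the code related to serialization and deserialization.
Serialization is simply the write method, shown below: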
    public void write(org.apache.thrift.protocol.TProtocol oprot, Test struct) throws org.apache.thrift.TException {
        struct.validate();
        oprot.writeStructBegin(STRUCT_DESC);
        // field 1: key
        oprot.writeFieldBegin(KEY_FIELD_DESC);
        oprot.writeI32(struct.key);
        oprot.writeFieldEnd();
        // field 2: value
        if (struct.value != null) {
            oprot.writeFieldBegin(VALUE_FIELD_DESC);
            oprot.writeString(struct.value);
            oprot.writeFieldEnd();
        }
        oprot.writeFieldStop();
        oprot.writeStructEnd();
    }
struct.validate() checks that the fields declared required in the Thrift file actually have values. If one is missing, it throws a TProtocolException:
    public void validate() throws org.apache.thrift.TException {
        // check for required fields
        // alas, we cannot check 'key' because it's a primitive and you chose the non-beans generator.
        if (value == null) {
            throw new org.apache.thrift.protocol.TProtocolException(
                "Required field 'value' was not present! Struct: " + toString());
        }
    }
Next is oprot.writeStructBegin(STRUCT_DESC), where STRUCT_DESC = new org.apache.thrift.protocol.TStruct("Test"). This call marks the start of the struct. Taking the binary protocol, TBinaryProtocol, as the example, its writeStructBegin is implemented as follows:
    public void writeStructBegin(TStruct struct) {
    }
That is, it does nothing. Next comes oprot.writeFieldBegin(KEY_FIELD_DESC), where
KEY_FIELD_DESC = new org.apache.thrift.protocol.TField("key", org.apache.thrift.protocol.TType.I32, (short)1);
The corresponding TBinaryProtocol implementation is:
    public void writeFieldBegin(TField field) throws TException {
        this.writeByte(field.type);
        this.writeI16(field.id);
    }
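The field header that writeFieldBegin produces can be reproduced with nothing but the JDK. The sketch below is an illustration, not Thrift code: the class and method names are hypothetical, and it assumes the 16-bit id is written high byte first, which is how TBinaryProtocol writes an i16.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class FieldHeaderSketch {
    // Type code for i32; same value as org.apache.thrift.protocol.TType.I32.
    static final byte TYPE_I32 = 8;

    // Hypothetical helper: one type byte followed by a two-byte field id.
    static byte[] fieldHeader(byte type, short id) {
        // ByteBuffer is big-endian by default, so the id goes out high byte first.
        return ByteBuffer.allocate(3).put(type).putShort(id).array();
    }

    public static void main(String[] args) {
        // Header for "1 : required i32 key"
        System.out.println(Arrays.toString(fieldHeader(TYPE_I32, (short) 1))); // [8, 0, 1]
    }
}
```

Three bytes are all a field costs before its value, and none of them depend on the field's name.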
The code above shows that serialization writes the field's type and its numeric identifier. org.apache.thrift.protocol.TType also tells us which data types the Thrift IDL supports:
    public final class TType {
        public static final byte STOP = 0;
        public static final byte VOID = 1;
        public static final byte BOOL = 2;
        public static final byte BYTE = 3;
        public static final byte DOUBLE = 4;
        public static final byte I16 = 6;
        public static final byte I32 = 8;
        public static final byte I64 = 10;
        public static final byte STRING = 11;
        public static final byte STRUCT = 12;
        public static final byte MAP = 13;
        public static final byte SET = 14;
        public static final byte LIST = 15;
        public static final byte ENUM = 16;

        public TType() {
        }
    }
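STOP is written to the output after all fields have been serialized, marking the end of the struct's fields. Next is oprot.writeI32(struct.key), which writes the int value being serialized. The TBinaryProtocol implementation is: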
    public void writeI32(int i32) throws TException {
        this.i32out[0] = (byte)(255 & i32 >> 24);
        this.i32out[1] = (byte)(255 & i32 >> 16);
        this.i32out[2] = (byte)(255 & i32 >> 8);
        this.i32out[3] = (byte)(255 & i32);
        this.trans_.write(this.i32out, 0, 4);
    }
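The shift-and-mask sequence in writeI32 is ordinary big-endian encoding: the most significant byte goes out first. A small JDK-only sketch (the class and method names are hypothetical) repeats the same steps and compares the result with ByteBuffer, whose default byte order is also big-endian:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class I32Sketch {
    // Same steps as TBinaryProtocol.writeI32, but returning the bytes instead of writing them.
    static byte[] encodeI32(int value) {
        byte[] out = new byte[4];
        out[0] = (byte) (0xff & (value >> 24)); // most significant byte first
        out[1] = (byte) (0xff & (value >> 16));
        out[2] = (byte) (0xff & (value >> 8));
        out[3] = (byte) (0xff & value);
        return out;
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        byte[] manual = encodeI32(value);
        byte[] viaBuffer = ByteBuffer.allocate(4).putInt(value).array();
        System.out.println(Arrays.equals(manual, viaBuffer)); // true
    }
}
```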
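In short, the int is converted to a byte array and written to the underlying transport. Next is oprot.writeFieldEnd(), whose TBinaryProtocol implementation is:
    public void writeFieldEnd() { }
The next piece of code serializes the value field defined in Test.thrift. The process is much the same as above, with one difference: when a string is serialized, its length is written first, followed by its content.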
    if (struct.value != null) {
        oprot.writeFieldBegin(VALUE_FIELD_DESC);
        oprot.writeString(struct.value);
        oprot.writeFieldEnd();
    }
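The length-then-content layout for strings can be sketched as follows. This is not the Thrift implementation: the names are hypothetical, and it assumes the string is encoded as UTF-8 with the length counted in bytes and written as a 4-byte big-endian integer, which is what TBinaryProtocol does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StringSketch {
    // Length prefix (4 bytes, big-endian byte count) followed by the UTF-8 bytes.
    static byte[] encodeString(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + utf8.length).putInt(utf8.length).put(utf8).array();
    }

    // Reverse of encodeString: read the length, then exactly that many bytes.
    static String decodeString(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        byte[] utf8 = new byte[in.getInt()];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = encodeString("ab");
        System.out.println(Arrays.toString(encoded)); // [0, 0, 0, 2, 97, 98]
        System.out.println(decodeString(encoded));    // ab
    }
}
```

Because the length comes first, a reader can also step over a string it does not care about without decoding it.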
Finally, a single zero byte is written to the output to mark the end of serialization:
    public void writeFieldStop() throws TException {
        this.writeByte((byte)0);
    }
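As the serialization process shows, the output contains only each field's type and numeric identifier, never its name. Compared with text formats such as JSON or XML, which repeat field names in every record, the data Thrift produces is therefore considerably smaller.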
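Putting the steps together, the bytes for a Test struct can be written out by hand. The sketch below uses only the JDK and hypothetical names; it assumes the TBinaryProtocol layout walked through above (nothing for struct begin/end, a 3-byte header per field, a STOP byte at the end) and the sample values key = 1 and value = "ab".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class TestStructBytes {
    static final byte STOP = 0, I32 = 8, STRING = 11; // values from TType

    // Hand-rolled equivalent of write() for Test, under the assumptions stated above.
    static byte[] encodeTest(int key, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(3 + 4 + 3 + 4 + utf8.length + 1);
        out.put(I32).putShort((short) 1).putInt(key);                      // field 1: key
        out.put(STRING).putShort((short) 2).putInt(utf8.length).put(utf8); // field 2: value
        out.put(STOP);                                                     // end of fields
        return out.array();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: 08 00 01 00 00 00 01 0b 00 02 00 00 00 02 61 62 00
        System.out.println(toHex(encodeTest(1, "ab")));
    }
}
```

The whole struct takes 17 bytes, and the strings "key" and "value" appear nowhere in them.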
With the serialization process understood, deserialization is easy to follow. The deserialization code is:
    public void read(org.apache.thrift.protocol.TProtocol iprot, Test struct) throws org.apache.thrift.TException {
        org.apache.thrift.protocol.TField schemeField;
        iprot.readStructBegin();
        while (true) {
            schemeField = iprot.readFieldBegin();
            if (schemeField.type == org.apache.thrift.protocol.TType.STOP) {
                break;
            }
            switch (schemeField.id) {
                case 1: // KEY
                    if (schemeField.type == org.apache.thrift.protocol.TType.I32) {
                        struct.key = iprot.readI32();
                        struct.setKeyIsSet(true);
                    } else {
                        org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
                    }
                    break;
                case 2: // VALUE
                    if (schemeField.type == org.apache.thrift.protocol.TType.STRING) {
                        struct.value = iprot.readString();
                        struct.setValueIsSet(true);
                    } else {
                        org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
                    }
                    break;
                default:
                    org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
            }
            iprot.readFieldEnd();
        }
        iprot.readStructEnd();
        // check for required fields of primitive type, which can't be checked in the validate method
        if (!struct.isSetKey()) {
            throw new org.apache.thrift.protocol.TProtocolException(
                "Required field 'key' was not found in serialized data! Struct: " + toString());
        }
        struct.validate();
    }
The core of deserialization is the while loop. schemeField is an instance of TField, a class built around the field's type and its numeric identifier id:
    public class TField {
        public final String name;
        public final byte type;
        public final short id;

        public TField() {
            this("", (byte)0, (short)0);
        }

        public TField(String n, byte t, short i) {
            this.name = n;
            this.type = t;
            this.id = i;
        }

        public String toString() {
            return "<TField name:'" + this.name + "' type:" + this.type + " field-id:" + this.id + ">";
        }

        // Note: equality compares only type and id; the name is ignored.
        public boolean equals(TField otherField) {
            return this.type == otherField.type && this.id == otherField.id;
        }
    }
iprot.readFieldBegin() builds a TField object from the serialized data. The TBinaryProtocol implementation below shows that it reads the field's type first and then its numeric identifier:
    public TField readFieldBegin() throws TException {
        byte type = this.readByte();
        short id = type == 0 ? 0 : this.readI16();
        return new TField("", type, id);
    }
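The same header parsing can be tried against raw bytes without Thrift on the classpath. This is a minimal sketch with hypothetical names, using a ByteBuffer in place of the transport; as in readFieldBegin, a STOP byte (0) is not followed by an id.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ReadHeaderSketch {
    // Returns {type, id}. When the type byte is STOP (0), no id is read.
    static int[] readFieldHeader(ByteBuffer in) {
        byte type = in.get();
        short id = (type == 0) ? 0 : in.getShort();
        return new int[] {type, id};
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.wrap(new byte[] {8, 0, 1, 0}); // i32 header for id 1, then STOP
        System.out.println(Arrays.toString(readFieldHeader(in))); // [8, 1]
        System.out.println(Arrays.toString(readFieldHeader(in))); // [0, 0]
    }
}
```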
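Once the TField object has been constructed, the field's value has to be read. The switch statement makes the rule easy to see: a value is read only when two conditions hold
1. the field's numeric identifier matches
2. the field's type matches
When both hold, the read method for that type is called. If the identifier matches but the type does not, the field is not assigned and its bytes are skipped instead, by running: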
    org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
Finally, once deserialization is complete, the required fields are checked to make sure they were actually supplied, by calling:
    struct.validate();
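The deserialization process shows that Thrift's generated code uses no Java reflection and allocates little extra memory, which is why it tends to be faster and lighter on memory than typical JSON/XML deserialization. It also shows that
Thrift's backward compatibility holds only under certain conditions
1. a field's numeric identifier must not change
2. a field's type must not change
As long as both are satisfied, fields can be added or removed and compatibility is preserved, with the caveat visible in the read code above: a reader still rejects data that lacks a field it declares as required.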
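The two rules can be tried out with a toy reader that follows the same loop as the generated read method. Everything in this sketch is hypothetical and JDK-only: TestRecord stands in for the generated Test class, skip handles just the three types used here (the real TProtocolUtil.skip covers every type), and newerPayload imitates a newer writer that has added an i64 field with id 3.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class CompatSketch {
    static final byte STOP = 0, I32 = 8, I64 = 10, STRING = 11; // values from TType

    // What the "old" reader knows about: fields 1 and 2 only.
    static final class TestRecord {
        Integer key;   // null means the field was never seen
        String value;
    }

    // Steps over a value of the given type without interpreting it.
    static void skip(ByteBuffer in, byte type) {
        switch (type) {
            case I32: in.getInt(); break;
            case I64: in.getLong(); break;
            case STRING:
                int length = in.getInt();
                in.position(in.position() + length);
                break;
            default: throw new IllegalArgumentException("unhandled type " + type);
        }
    }

    static TestRecord read(ByteBuffer in) {
        TestRecord record = new TestRecord();
        while (true) {
            byte type = in.get();
            if (type == STOP) break;
            short id = in.getShort();
            if (id == 1 && type == I32) {
                record.key = in.getInt();
            } else if (id == 2 && type == STRING) {
                byte[] utf8 = new byte[in.getInt()];
                in.get(utf8);
                record.value = new String(utf8, StandardCharsets.UTF_8);
            } else {
                skip(in, type); // unknown id, or a known id carrying an unexpected type
            }
        }
        return record;
    }

    // Bytes from an imagined newer schema: field 3 (i64) sits between key and value.
    static byte[] newerPayload() {
        byte[] utf8 = "ab".getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(7 + 11 + 7 + utf8.length + 1);
        out.put(I32).putShort((short) 1).putInt(42);
        out.put(I64).putShort((short) 3).putLong(7L);
        out.put(STRING).putShort((short) 2).putInt(utf8.length).put(utf8);
        out.put(STOP);
        return out.array();
    }

    public static void main(String[] args) {
        TestRecord record = read(ByteBuffer.wrap(newerPayload()));
        System.out.println(record.key + " " + record.value); // 42 ab
    }
}
```

The old reader recovers key and value and steps over field 3 without knowing what it is. Had the writer instead moved key to a different id or changed its type, the first branch would never match and key would stay unset, which is the case the generated code reports as a missing required field.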