RPC Overview - PB, Thrift, Avro

Reposted article, 2013-05-16 17:25. Category: RPC

Apache Avro vs. Thrift, http://www.tbdata.org/archives/1307

Thrift vs. Protocol Buffers, http://stuartsierra.com/2008/07/10/thrift-vs-protocol-buffers

Thrift vs Protocol Buffers vs Avro - Biased Comparison – SlideShare

Schema evolution in Avro, Protocol Buffers and Thrift

Protocol Buffers

http://code.google.com/p/protobuf/

https://developers.google.com/protocol-buffers/docs/overview

The RPC Problem

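Viewed narrowly, this is a problem of serialization and deserialization.

The harder problem is RPC itself: interaction and invocation between modules across platforms and across languages (transparent interaction between multiple programming languages).

This is a long-standing problem (COM, CORBA, ...), and it involves three sub-problems:
1. Serialization: how to convert class objects or other data into a common format for transmission, such as binary, text, or XML
2. Data types: the differences between the data types of different languages
3. Method invocation: the differences between how different languages call methods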

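The straightforward approach: the sender implements a serialization module that converts objects into binary data, and the receiver implements a deserialization module that parses the binary data and restores the objects.
For the data type problem, types have to be mapped between languages, for example C++ objects, Java objects, C structs...
For RPC, the differences in method invocation have to be resolved as well, for example when the call is made from Java and the server is written in C++.
On top of that, every distinct RPC call needs its own sender-side and receiver-side code...
For the user, this is quite complicated...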
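
The hand-written approach described above can be sketched in a few lines. This is only an illustration: the `User` type and the byte layout (4-byte id, 2-byte name length, then the name) are assumptions made up for this example, not part of any real protocol.

```python
import struct
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical message type used only for this illustration.
    user_id: int
    name: str


def serialize(user: User) -> bytes:
    """Sender side: turn the object into bytes using an assumed layout."""
    name_bytes = user.name.encode("utf-8")
    # 4-byte big-endian id, 2-byte big-endian name length, then the name bytes.
    return struct.pack(">IH", user.user_id, len(name_bytes)) + name_bytes


def deserialize(data: bytes) -> User:
    """Receiver side: parse the bytes and rebuild the object."""
    user_id, name_len = struct.unpack_from(">IH", data, 0)
    offset = struct.calcsize(">IH")
    name = data[offset:offset + name_len].decode("utf-8")
    return User(user_id, name)


restored = deserialize(serialize(User(7, "alice")))
```

Every new message type needs another pair of functions like these, in every language involved, which is the maintenance burden the text points at.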

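CORBA's IDL partly solves this problem: first abstract the parts that all languages have in common, then define an abstract interface description language.
Users only need to describe, in IDL, the data types to transmit and the interfaces to call, and the CORBA engine handles the conversion to each language.
CORBA, however, was so large and complex that it largely stayed at the academic stage...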
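
The idea of describing a type once and letting an engine do the conversion can be sketched as follows. This is not CORBA IDL: `USER_SCHEMA`, the type names `"i32"` and `"string"`, and the byte layout are all assumptions invented for this sketch, standing in for what a real IDL compiler would generate.

```python
import struct

# Hypothetical, language-neutral description of one message type.
USER_SCHEMA = [("user_id", "i32"), ("name", "string")]


def encode(schema, record: dict) -> bytes:
    """Generic encoder driven by the schema instead of hand-written per type."""
    out = bytearray()
    for field, kind in schema:
        value = record[field]
        if kind == "i32":
            out += struct.pack(">i", value)
        elif kind == "string":
            raw = value.encode("utf-8")
            out += struct.pack(">H", len(raw)) + raw
        else:
            raise ValueError(f"unsupported type: {kind}")
    return bytes(out)


def decode(schema, data: bytes) -> dict:
    """Generic decoder: walks the same schema in the same field order."""
    record, offset = {}, 0
    for field, kind in schema:
        if kind == "i32":
            (record[field],) = struct.unpack_from(">i", data, offset)
            offset += 4
        elif kind == "string":
            (length,) = struct.unpack_from(">H", data, offset)
            offset += 2
            record[field] = data[offset:offset + length].decode("utf-8")
            offset += length
        else:
            raise ValueError(f"unsupported type: {kind}")
    return record


original = {"user_id": 42, "name": "bob"}
round_tripped = decode(USER_SCHEMA, encode(USER_SCHEMA, original))
```

Adding a new message type now means writing a new schema, not new serialization code.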

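Then a new line of thinking emerged: web services.
Instead of providing many different serialization and deserialization modules, provide one common, machine-readable text language, XML, together with the SOAP protocol...
It was all the rage for a while, and it did solve the problem from another angle.
Later, programming against RESTful services over HTTP followed a similar idea from a different angle, reducing the set of operation types to a minimum...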

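Of course, once the big data era arrived, people found that text-protocol solutions based on XML, and even JSON, had serious transmission efficiency problems.
So companies such as Google and Facebook went back to binary RPC schemes, which gave rise to PB (Google) and Thrift (Facebook), along with Avro (from the Apache Hadoop project). In essence and in theory, these also trace back to CORBA.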
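
A rough feel for the size difference between a text encoding and a fixed binary layout, using only the Python standard library. The record and its layout are assumptions chosen for illustration; real formats such as PB, Thrift, and Avro use their own encodings.

```python
import json
import struct

# Hypothetical record used only for the comparison.
record = {"user_id": 12345, "score": 98.5, "active": True}

# Text encoding: field names and punctuation travel with every message.
text_payload = json.dumps(record).encode("utf-8")

# Binary encoding: 4-byte unsigned int, 8-byte double, 1-byte bool (13 bytes).
# Field names are not sent; both sides must agree on the layout beforehand.
binary_payload = struct.pack(
    ">Id?", record["user_id"], record["score"], record["active"]
)

decoded = struct.unpack(">Id?", binary_payload)
print(len(text_payload), len(binary_payload))
```

The binary payload is smaller because the schema lives outside the message, which is also why these formats need an agreed description of the data.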

The problems with the earlier approaches are listed below:

•SOAP

    XML, XML and more XML. Do we really need to parse so much XML?

•CORBA

    Amazing idea, horrible execution

    Overdesigned and heavyweight

•DCOM, COM+

    Embraced mainly in Windows client software

•HTTP/JSON/XML/Whatever

    Okay, proven – hurray!

    But lacks a protocol description.

    You have to maintain both client and server code.

    You still have to write your own wrapper to the protocol.

    XML has high parsing overhead.

    (relatively) expensive to process; large due to repeated tags

Thrift vs Protocol Buffers vs Avro

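First, the three share common ground: they all address the problems of the earlier approaches listed above.

Interface Description (IDL): use an IDL and support code generation
Performance: high efficiency
Versioning: good support for different versions and schema evolution
Binary Format: binary is used as the transmission format

For the binary encoding of the three, and how each copes with schema evolution, see the blog post below: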

Schema evolution in Avro, Protocol Buffers and Thrift
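
As a rough illustration of why numbered fields help with versioning: Thrift and Protocol Buffers both identify fields by a numeric tag, so a reader can skip tags it does not recognize. The sketch below uses a made-up tag/length/value layout (1-byte tag, 2-byte length), not the real wire format of either system, and the function names are hypothetical. Avro takes a different route and resolves differences between the writer's and reader's schemas instead of tagging fields.

```python
import struct


def encode_fields(fields: dict) -> bytes:
    """Encode {tag: bytes_value} as repeated (tag, length, value) triples."""
    out = bytearray()
    for tag, value in fields.items():
        out += struct.pack(">BH", tag, len(value)) + value
    return bytes(out)


def decode_fields(data: bytes, known_tags: set) -> dict:
    """Decode triples, skipping any tag this reader does not know about."""
    result, offset = {}, 0
    while offset < len(data):
        tag, length = struct.unpack_from(">BH", data, offset)
        offset += 3
        if tag in known_tags:
            result[tag] = data[offset:offset + length]
        # Known or not, the length prefix tells us where the next field starts.
        offset += length
    return result


# A newer writer adds field 3; an older reader only knows fields 1 and 2.
new_message = encode_fields({1: b"alice", 2: b"\x00\x2a", 3: b"extra"})
old_view = decode_fields(new_message, known_tags={1, 2})
new_view = decode_fields(new_message, known_tags={1, 2, 3})
```

The older reader still gets fields 1 and 2 intact, which is the behavior "schema evolution" refers to in this context.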

Thrift vs Protocol Buffers

Overall comparison

Overall, I think Thrift wins on features and Protocol Buffers win on documentation. Implementation-wise, they’re quite similar.
The major difference is that Thrift provides a full client/server RPC implementation, whereas Protocol Buffers only generate stubs to use in your own RPC system.

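A fairly classic assessment: the two are very similar, with Thrift winning on features and PB winning on documentation...
Features arguably count for more than documentation, so Thrift has more users...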
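
The distinction drawn in the quoted comparison, a full client/server RPC implementation versus stubs that plug into your own RPC system, can be pictured with a small sketch. Everything here is hypothetical: `CalculatorStub`, the `Channel` type, and `in_process_channel` are invented names, and JSON is used for the payload only to keep the example short. It does not reproduce code generated by either tool.

```python
import json
from typing import Callable

# (method name, request bytes) -> response bytes; supplied by the application.
Channel = Callable[[str, bytes], bytes]


class CalculatorStub:
    """Stand-in for a generated stub: it only encodes requests and decodes replies."""

    def __init__(self, channel: Channel):
        self._channel = channel

    def add(self, a: int, b: int) -> int:
        request = json.dumps({"a": a, "b": b}).encode("utf-8")
        response = self._channel("add", request)
        return json.loads(response)["result"]


def in_process_channel(method: str, request: bytes) -> bytes:
    """Hypothetical transport: a local function call instead of a real network."""
    args = json.loads(request)
    if method == "add":
        return json.dumps({"result": args["a"] + args["b"]}).encode("utf-8")
    raise ValueError(f"unknown method: {method}")


total = CalculatorStub(in_process_channel).add(2, 3)
```

With a stub-only toolkit, the application has to provide something like `in_process_channel` (sockets, HTTP, and so on); a full RPC framework ships that layer as well.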