- Introduction
- Overview
- Preliminaries
- The Protocol
- Protocol Primitive Types
- Notes on reading the request format grammars
- Common Request and Response Structure
- The APIs
- Constants
- Some Common Philosophical Questions
Introduction
This document covers the protocol implemented in Kafka 0.8 and later versions. It is meant to be a readable guide to the available requests, their binary format, and how to use them to implement a client. It assumes you understand the basic design principles and terminology described here.
The protocol used in 0.7 and earlier was similar to the current one, but we chose to make a one-time break in compatibility in order to remove some cruft and improve things.
Terms used throughout this document include metadata, broker, message, offset, topic, (consumer) group, and partition.
Overview
The Kafka protocol is fairly simple; there are only six core client request APIs:
- Metadata – Describes the currently available brokers, their host and port, and gives information about which broker hosts which partitions.
- Send – Sends messages to a broker.
- Fetch – Fetches messages from a broker; this covers fetching data, fetching cluster metadata, and fetching offset information about a topic.
- Offsets – Gets information about the available offsets for a given topic partition.
- Offset Commit – Commits a set of offsets for a consumer group.
- Offset Fetch – Fetches a set of offsets for a consumer group.
Each of these will be described in detail below.
Preliminaries
Network
Kafka uses a binary protocol over TCP. The protocol defines all APIs as request/response message pairs. All messages are size-delimited and are made up of the primitive types described below.
The client initiates a socket connection and then writes a sequence of request messages and reads back the corresponding response messages. No handshake is required on connection or disconnection. TCP is happier if you maintain persistent connections used for many requests to amortize the cost of the TCP handshake, but beyond this penalty connecting is fairly cheap.
The client will likely need to maintain connections to multiple brokers, since data is partitioned and the client needs to talk to the server that has its data. However it should not generally be necessary for a single client instance to maintain multiple connections to a single broker (i.e. connection pooling).
The server guarantees that on a single TCP connection, requests will be processed in the order they are sent, and responses will be returned in that order as well. The broker's request processing allows only a single in-flight request per connection in order to guarantee this ordering. Note that clients can (and ideally should) use non-blocking IO to implement request pipelining and achieve higher throughput: clients can send requests even while awaiting responses to previous requests, since the outstanding requests will be buffered in the underlying OS socket buffer. Unless otherwise noted, all requests are initiated by the client and result in a corresponding response message from the server.
The server has a configurable maximum limit on request size; any request that exceeds this limit will result in the socket being disconnected.
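To make the framing concrete, here is a minimal Python sketch of writing one size-delimited request and reading back the size-delimited response. The helper names and the broker address are hypothetical; nothing here is tied to a particular client library:

```python
import socket
import struct

def send_framed(sock, payload: bytes) -> None:
    """Write a request as a 4-byte big-endian size followed by the payload."""
    sock.sendall(struct.pack(">i", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, raising if the connection closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed mid-message")
        buf += chunk
    return buf

def recv_framed(sock) -> bytes:
    """Read the 4-byte size N, then read the N bytes of the response."""
    (size,) = struct.unpack(">i", recv_exact(sock, 4))
    return recv_exact(sock, size)

# Usage (hypothetical broker address):
# sock = socket.create_connection(("kafka-broker.example.com", 9092))
# send_framed(sock, request_bytes)
# response_bytes = recv_framed(sock)
```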
Partitioning and Bootstrapping
Kafka is a partitioned system, so not all servers have the complete data set. Instead, topics are split into a pre-defined number of partitions, P, and each partition is replicated with some replication factor, N. Topic partitions themselves are just ordered "commit logs" numbered 0, 1, ….
All systems of this nature face the question of how a particular piece of data is assigned to a particular partition. Kafka clients directly control this assignment; the brokers themselves enforce no particular semantics for which messages should be published to which partition. Rather, to publish messages the client directly addresses messages to a particular partition, and when fetching messages, fetches from a particular partition. If two clients want to use the same partitioning scheme they must use the same method to compute the mapping of key to partition.
These requests to publish or fetch data must be sent to the broker that is currently acting as the leader for a given partition. This condition is enforced by the broker, so a request for a particular partition sent to the wrong broker will result in a NotLeaderForPartition error code (described below).
How then does the client find out which topics exist, which partitions they have, and which brokers currently host those partitions, so that it can direct its requests to the right hosts? This information is dynamic, so it cannot be configured in the client as a static mapping file. Instead, all Kafka brokers can answer a metadata request that describes the current state of the cluster.
In other words, the client needs to find one broker somehow, and that broker will tell the client about the other brokers that exist and the partitions they host. This first broker may itself be down, so a client implementation should take a list of two or three "bootstrap" URLs. The user can then choose to use a load balancer or just statically configure two or three of their Kafka hosts in the clients.
The client does not need to keep polling to see if the cluster has changed; it can fetch metadata once when it is instantiated, cache it, and use it until it receives an error indicating that the metadata is out of date. This error can come in two forms:
1) a socket error indicating that the client cannot communicate with a particular broker, or
2) an error code in a response indicating that this broker no longer hosts the data that was requested.
A client, then, proceeds as follows (a sketch is given after this list):
- Cycle through a list of "bootstrap" Kafka URLs until we find one we can connect to, and fetch the cluster metadata from it.
- Process fetch and produce requests, directing them to the appropriate broker based on the topic/partitions they address.
- If we get an appropriate error, refresh the metadata and try again.
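A rough Python sketch of this bootstrap-and-refresh behaviour is shown below. All names here are hypothetical, and the fetch_metadata callable (which would send a metadata request to one broker and return a partition-to-leader map) is assumed to be implemented elsewhere in the client:

```python
import random

class MetadataCache:
    """Caches the leader broker for each (topic, partition), refreshed on demand."""

    def __init__(self, bootstrap_brokers):
        # e.g. [("host1.example.com", 9092), ("host2.example.com", 9092)]
        self.bootstrap_brokers = list(bootstrap_brokers)
        self.leaders = {}  # (topic, partition) -> (host, port)

    def refresh(self, fetch_metadata):
        """Cycle through the bootstrap brokers until one answers a metadata request."""
        brokers = self.bootstrap_brokers[:]
        random.shuffle(brokers)
        for broker in brokers:
            try:
                # assumed helper: returns {(topic, partition): (host, port)}
                self.leaders = fetch_metadata(broker)
                return
            except OSError:
                continue  # this bootstrap broker is down; try the next one
        raise RuntimeError("no bootstrap broker reachable")

    def leader_for(self, topic, partition):
        """Look up the cached leader; on a routing error the caller calls refresh() and retries."""
        return self.leaders[(topic, partition)]
```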
Partitioning Strategies
As mentioned above, the assignment of messages to partitions is something the producing client controls. That said, how should this functionality be exposed to the end user?
Partitioning really serves two purposes in Kafka:
- It balances data and request load over brokers.
- It serves as a way to divvy up processing among consumer processes while allowing local state and preserving order within a partition. We call this semantic partitioning.
For a given use case you may care about only one of these or both.
To accomplish simple load balancing, a simple approach is for the client to just round-robin requests over all brokers. An alternative, in an environment where there are many more producers than brokers, is to have each client choose a single partition at random and publish to that. This latter strategy results in far fewer TCP connections.
Semantic partitioning means using some key in the message to assign messages to partitions. For example, if you were processing a click-stream you might want to partition the stream by user id, so that all data for a particular user goes to a single consumer. To accomplish this, the client can take a key associated with the message and use some hash of this key to choose the partition to which the message is delivered.
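For instance, a client might derive the partition from a hash of the key along these lines. CRC32 is used purely for illustration; any deterministic hash works, as long as every producer that must agree on the scheme uses the same one:

```python
import zlib

def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Pick a partition from the message key; the same key always maps to the same partition."""
    return zlib.crc32(key) % num_partitions

# All clicks for one user land in the same partition:
# partition_for_key(b"user-42", 8)
```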
Batching
Our APIs encourage batching small things together for efficiency; we have found this to be a very significant performance win. Both the API for sending messages and the API for fetching messages work with batches of messages rather than single messages. A clever client can take advantage of this and support an "asynchronous" mode in which it batches together messages sent individually and sends them in larger clumps. We go even further and allow batching across multiple topics and partitions, so a produce request may contain data for many partitions and a fetch request may pull data from many partitions all at once.
A client implementor can of course choose to ignore this and send everything one message at a time.
Versioning and Compatibility
The protocol is designed to enable incremental evolution in a backward-compatible fashion. Versioning is done on a per-API basis, each version consisting of a request and response pair. Each request contains an API key that identifies the API being invoked and a version number that indicates the format of the request and the expected format of the response.
The intention is that clients will implement a particular version of the protocol and indicate that version in their requests. Our goal is to allow APIs to evolve in an environment where downtime is not allowed and where clients and servers cannot all be changed at once.
The server will reject requests with a version it does not support, and will always respond to the client in the protocol format it expects based on the version the client included in its request. The intended upgrade path is that new features are first rolled out on the server (with older clients not making use of them), and then as newer clients are deployed these new features are gradually taken advantage of.
Currently all versions are baselined at 0; as these APIs evolve we will indicate the format of each version separately.
The Protocol
Protocol Primitive Types
The protocol is built out of the following primitive types.
Fixed Width Primitives
int8, int16, int32, int64 - Signed integers with the given precision (in bits) stored in big endian order.
Variable Length Primitives
bytes, string - These types consist of a signed integer giving a length N followed by N bytes of content. A length of -1 indicates null. string uses an int16 for its size, and bytes uses an int32.
Arrays
This is a notation for handling repeated structures. These will always be encoded as an int32 size containing the length N followed by N repetitions of the structure which can itself be made up of other primitive types. In the BNF grammars below we will show an array of a structure foo as [foo].
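As an illustration, these primitives could be serialized in Python roughly as follows (big-endian packing via the struct module; the helper names are our own, not part of any official client):

```python
import struct

def encode_string(s) -> bytes:
    """string: int16 length N (or -1 for null) followed by N bytes of content."""
    if s is None:
        return struct.pack(">h", -1)
    data = s.encode("utf-8")
    return struct.pack(">h", len(data)) + data

def encode_bytes(b) -> bytes:
    """bytes: int32 length N (or -1 for null) followed by N bytes of content."""
    if b is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(b)) + b

def encode_array(items, encode_item) -> bytes:
    """[foo]: int32 count N followed by N encoded elements."""
    return struct.pack(">i", len(items)) + b"".join(encode_item(x) for x in items)
```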
Optional entries
Certain fields may be present under certain conditions (say if there is no error). These are represented as |foo|.
Notes on reading the request format grammars
The BNFs below give an exact context free grammar for the request and response binary format. For each API I will give the request and response together followed by all the sub-definitions. The BNF is intentionally not compact in order to give human-readable names (for example I define a production for ErrorCode even though it is just an int16, in order to give it a symbolic name). As always in a BNF a sequence of productions indicates concatenation, so the MetadataRequest given below would be a sequence of bytes containing first a VersionId, then a ClientId, and then an array of TopicNames (each of which has its own definition). Productions are always given in camel case and primitive types in lower case. When there are multiple possible productions these are separated with '|' and may be enclosed in parentheses for grouping. The top-level definition is always given first and subsequent sub-parts are indented.
Common Request and Response Structure
All requests and responses originate from the following grammar, which will be incrementally described through the rest of this document:
```
RequestOrResponse => Size (RequestMessage | ResponseMessage)
  Size => int32
```
| Field | Description |
| --- | --- |
| MessageSize | The MessageSize field gives the size of the subsequent request or response message in bytes. The client can read requests by first reading this 4 byte size as an integer N, and then reading and parsing the subsequent N bytes of the request. |
Requests
Requests all have the following format:
```
RequestMessage => ApiKey ApiVersion CorrelationId ClientId RequestMessage
  ApiKey => int16
  ApiVersion => int16
  CorrelationId => int32
  ClientId => string
  RequestMessage => MetadataRequest | ProduceRequest | FetchRequest | OffsetRequest | OffsetCommitRequest | OffsetFetchRequest
```
| Field | Description |
| --- | --- |
| ApiKey | This is a numeric id for the API being invoked (i.e. is it a metadata request, a produce request, a fetch request, etc). |
| ApiVersion | This is a numeric version number for this api. We version each API and this version number allows the server to properly interpret the request as the protocol evolves. Responses will always be in the format corresponding to the request version. Currently the supported version for all APIs is 0. |
| CorrelationId | This is a user-supplied integer. It will be passed back in the response by the server, unmodified. It is useful for matching request and response between the client and server. |
| ClientId | This is a user supplied identifier for the client application. The user can use any identifier they like and it will be used when logging errors, monitoring aggregates, etc. For example, one might want to monitor not just the requests per second overall, but the number coming from each client application (each of which could reside on multiple servers). This id acts as a logical grouping across all requests from a particular client. |
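Putting the header fields together, a client might encode the common request header like this. This is a sketch that reuses the hypothetical encode_string helper from the primitives example above; ApiKey values are listed in the Constants section below:

```python
import struct

def encode_request_header(api_key: int, api_version: int,
                          correlation_id: int, client_id: str) -> bytes:
    """ApiKey (int16), ApiVersion (int16), CorrelationId (int32), ClientId (string)."""
    return (struct.pack(">hhi", api_key, api_version, correlation_id)
            + encode_string(client_id))  # encode_string from the primitives sketch above
```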
The various request and response messages will be described below.
Responses
```
Response => CorrelationId ResponseMessage
  CorrelationId => int32
  ResponseMessage => MetadataResponse | ProduceResponse | FetchResponse | OffsetResponse | OffsetCommitResponse | OffsetFetchResponse
```
| Field | Description |
| --- | --- |
| CorrelationId | The server passes back whatever integer the client supplied as the correlation in the request. |
The response will always match the paired request (e.g. we will send a MetadataResponse in return to a MetadataRequest).
Message sets
One structure common to both the produce and fetch requests is the message set format. A message in kafka is a key-value pair with a small amount of associated metadata. A message set is just a sequence of messages with offset and size information. This format happens to be used both for the on-disk storage on the broker and the on-the-wire format.
A message set is also the unit of compression in Kafka, and we allow messages to recursively contain compressed message sets to allow batch compression.
N.B., MessageSets are not preceded by an int32 like other array elements in the protocol.
```
MessageSet => [Offset MessageSize Message]
  Offset => int64
  MessageSize => int32
```
Message format
```
Message => Crc MagicByte Attributes Key Value
  Crc => int32
  MagicByte => int8
  Attributes => int8
  Key => bytes
  Value => bytes
```
| Field | Description |
| --- | --- |
| Offset | This is the offset used in kafka as the log sequence number. When the producer is sending messages it doesn't actually know the offset and can fill in any value here it likes. |
| Crc | The CRC is the CRC32 of the remainder of the message bytes. This is used to check the integrity of the message on the broker and consumer. |
| MagicByte | This is a version id used to allow backwards compatible evolution of the message binary format. |
| Attributes | This byte holds metadata attributes about the message. The lowest 2 bits contain the compression codec used for the message. The other bits should be set to 0. |
| Key | The key is an optional message key that was used for partition assignment. The key can be null. |
| Value | The value is the actual message contents as an opaque byte array. Kafka supports recursive messages in which case this may itself contain a message set. The message can be null. |
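Here is a sketch of encoding a single message and a message set under the format above, reusing the hypothetical encode_bytes helper from the primitives example. Note that the CRC covers everything after the Crc field itself:

```python
import struct
import zlib

def encode_message(key, value, attributes=0, magic=0) -> bytes:
    """Message => Crc MagicByte Attributes Key Value."""
    body = struct.pack(">bb", magic, attributes) + encode_bytes(key) + encode_bytes(value)
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", crc) + body

def encode_message_set(encoded_messages) -> bytes:
    """MessageSet => [Offset MessageSize Message], with no leading element count.
    A producer may put any value in the Offset field; 0 is used here."""
    out = b""
    for msg in encoded_messages:
        out += struct.pack(">qi", 0, len(msg)) + msg
    return out
```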
Compression
Kafka supports compressing messages for additional efficiency, however this is more complex than just compressing a raw message. Because individual messages may not have sufficient redundancy to enable good compression ratios, compressed messages must be sent in special batches (although you may use a batch of one if you truly wish to compress a message on its own). The messages to be sent are wrapped (uncompressed) in a MessageSet structure, which is then compressed and stored in the Value field of a single "Message" with the appropriate compression codec set. The receiving system parses the actual MessageSet from the decompressed value.
Kafka currently supports two compression codecs with the following codec numbers:
| Compression | Codec |
| --- | --- |
| None | 0 |
| GZIP | 1 |
| Snappy | 2 |
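For example, a batch could be GZIP-compressed roughly as follows. This sketch builds on the hypothetical encode_message and encode_message_set helpers from the message-format example above; codec 1 is GZIP per the table:

```python
import gzip

GZIP_CODEC = 1  # value for the low 2 bits of the Attributes byte

def gzip_wrap(encoded_messages) -> bytes:
    """Compress an uncompressed MessageSet and wrap it in one outer Message
    whose Attributes byte marks the GZIP codec."""
    inner = encode_message_set(encoded_messages)  # helpers from the message-format sketch
    outer = encode_message(key=None, value=gzip.compress(inner), attributes=GZIP_CODEC)
    return encode_message_set([outer])
```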
The APIs
This section gives details on each of the individual APIs, their usage, their binary format, and the meaning of their fields.
Metadata API
This API answers the following questions:
- What topics exist?
- How many partitions does each topic have?
- Which broker is currently the leader for each partition?
- What is the host and port for each of these brokers?
This is the only request that can be addressed to any broker in the cluster.
Since there may be many topics the client can give an optional list of topic names in order to only return metadata for a subset of topics.
The metadata returned is at the partition level, but grouped together by topic for convenience and to avoid redundancy. For each partition the metadata contains the information for the leader as well as for all the replicas and the list of replicas that are currently in-sync.
Metadata Request
```
MetadataRequest => [TopicName]
  TopicName => string
```
| Field | Description |
| --- | --- |
| TopicName | The topics to produce metadata for. If empty the request will yield metadata for all topics. |
Metadata Response
The response contains metadata for each partition, with partitions grouped together by topic. This metadata refers to brokers by their broker id. The brokers each have a host and port.
```
MetadataResponse => [Broker][TopicMetadata]
  Broker => NodeId Host Port
  NodeId => int32
  Host => string
  Port => int32
  TopicMetadata => TopicErrorCode TopicName [PartitionMetadata]
  TopicErrorCode => int16
  PartitionMetadata => PartitionErrorCode PartitionId Leader Replicas Isr
  PartitionErrorCode => int16
  PartitionId => int32
  Leader => int32
  Replicas => [int32]
  Isr => [int32]
```
| Field | Description |
| --- | --- |
| Leader | The node id for the kafka broker currently acting as leader for this partition. If no leader exists because we are in the middle of a leader election this id will be -1. |
| Replicas | The set of alive nodes that currently act as followers of the leader for this partition. |
| Isr | The subset of the replicas that are "caught up" to the leader. |
| Broker | The node id, hostname, and port information for a kafka broker. |
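As a concrete end-to-end example, a version-0 MetadataRequest could be assembled like this. It is a sketch that reuses the hypothetical header and primitive encoders from earlier; the leading size prefix makes it ready to write to the socket:

```python
import struct

def encode_metadata_request(correlation_id: int, client_id: str, topics) -> bytes:
    """MetadataRequest => [TopicName]; an empty topic list asks for all topics."""
    header = encode_request_header(api_key=3, api_version=0,
                                   correlation_id=correlation_id, client_id=client_id)
    body = encode_array(topics, encode_string)
    payload = header + body
    return struct.pack(">i", len(payload)) + payload  # size-prefixed request

# e.g. encode_metadata_request(1, "my-client", ["clicks", "impressions"])
```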
Produce API
The produce API is used to send message sets to the server. For efficiency it allows sending message sets intended for many topic partitions in a single request.
The produce API uses the generic message set format, but since no offset has been assigned to the messages at the time of the send the producer is free to fill in that field in any way it likes.
Produce Request
```
ProduceRequest => RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]]
  RequiredAcks => int16
  Timeout => int32
  Partition => int32
  MessageSetSize => int32
```
| Field | Description |
| --- | --- |
| RequiredAcks | This field indicates how many acknowledgements the servers should receive before responding to the request. If it is 0 the server will not send any response (this is the only case where the server will not reply to a request). If it is 1, the server will wait until the data is written to the local log before sending a response. If it is -1 the server will block until the message is committed by all in sync replicas before sending a response. For any number > 1 the server will block waiting for this number of acknowledgements to occur (but the server will never wait for more acknowledgements than there are in-sync replicas). |
| Timeout | This provides a maximum time in milliseconds the server can await the receipt of the number of acknowledgements in RequiredAcks. The timeout is not an exact limit on the request time for a few reasons: (1) it does not include network latency, (2) the timer begins at the beginning of the processing of this request so if many requests are queued due to server overload that wait time will not be included, (3) we will not terminate a local write so if the local write time exceeds this timeout it will not be respected. To get a hard timeout of this type the client should use the socket timeout. |
| TopicName | The topic that data is being published to. |
| Partition | The partition that data is being published to. |
| MessageSetSize | The size, in bytes, of the message set that follows. |
| MessageSet | A set of messages in the standard format described above. |
Produce Response
```
ProduceResponse => [TopicName [Partition ErrorCode Offset]]
  TopicName => string
  Partition => int32
  ErrorCode => int16
  Offset => int64
```
| Field | Description |
| --- | --- |
| Topic | The topic this response entry corresponds to. |
| Partition | The partition this response entry corresponds to. |
| ErrorCode | The error from this partition, if any. Errors are given on a per-partition basis because a given partition may be unavailable or maintained on a different host, while others may have successfully accepted the produce request. |
| Offset | The offset assigned to the first message in the message set appended to this partition. |
Fetch API
The fetch API is used to fetch a chunk of one or more logs for some topic-partitions. Logically one specifies the topics, partitions, and starting offset at which to begin the fetch and gets back a chunk of messages.
Fetch requests follow a long poll model so they can be made to block for a period of time if sufficient data is not immediately available.
As an optimization the server is allowed to return a partial message at the end of the message set. Clients should handle this case.
One thing to note is that the fetch API requires specifying the partition to consume from. The question is: how should a consumer know what partitions to consume from? In particular, how can you balance the partitions over a set of consumers acting as a group so that each consumer gets a subset of partitions? We have done this assignment dynamically using zookeeper for the scala and java client. The downside of this approach is that it requires a fairly fat client and a zookeeper connection. We haven't yet created a Kafka API to allow this functionality to be moved to the server side and accessed more conveniently. A simple consumer client can be implemented by simply requiring that the partitions be specified in config, though this will not allow dynamic reassignment of partitions should that consumer fail. We hope to address this gap in the next major release.
Fetch Request
```
FetchRequest => ReplicaId MaxWaitTime MinBytes [TopicName [Partition FetchOffset MaxBytes]]
  ReplicaId => int32
  MaxWaitTime => int32
  MinBytes => int32
  TopicName => string
  Partition => int32
  FetchOffset => int64
  MaxBytes => int32
```
| Field | Description |
| --- | --- |
| ReplicaId | The replica id indicates the node id of the replica initiating this request. Normal client consumers should always specify this as -1 as they have no node id. Other brokers set this to be their own node id. The value -2 is accepted to allow a non-broker to issue fetch requests as if it were a replica broker for debugging purposes. |
| MaxWaitTime | The max wait time is the maximum amount of time in milliseconds to block waiting if insufficient data is available at the time the request is issued. |
| MinBytes | This is the minimum number of bytes of messages that must be available to give a response. If the client sets this to 0 the server will always respond immediately, however if there is no new data since their last request they will just get back empty message sets. If this is set to 1, the server will respond as soon as at least one partition has at least 1 byte of data or the specified timeout occurs. By setting higher values in combination with the timeout the consumer can tune for throughput and trade a little additional latency for reading only large chunks of data (e.g. setting MaxWaitTime to 100 ms and setting MinBytes to 64k would allow the server to wait up to 100ms to try to accumulate 64k of data before responding). |
| TopicName | The name of the topic. |
| Partition | The id of the partition the fetch is for. |
| FetchOffset | The offset to begin this fetch from. |
| MaxBytes | The maximum bytes to include in the message set for this partition. This helps bound the size of the response. |
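Here is a sketch of encoding the body of a single-partition FetchRequest, using the 100 ms / 64k long-poll settings from the MinBytes example above. The request header and the encode_string helper from the earlier sketches are assumed:

```python
import struct

def encode_fetch_body(topic: str, partition: int, offset: int,
                      max_wait_ms: int = 100, min_bytes: int = 64 * 1024,
                      max_bytes: int = 1024 * 1024) -> bytes:
    """FetchRequest body for one topic and one partition; ReplicaId is -1 for normal consumers."""
    partition_block = struct.pack(">iqi", partition, offset, max_bytes)
    topic_block = encode_string(topic) + struct.pack(">i", 1) + partition_block
    return (struct.pack(">iii", -1, max_wait_ms, min_bytes)  # ReplicaId, MaxWaitTime, MinBytes
            + struct.pack(">i", 1) + topic_block)            # one topic in the array
```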
Fetch Response
```
FetchResponse => [TopicName [Partition ErrorCode HighwaterMarkOffset MessageSetSize MessageSet]]
  TopicName => string
  Partition => int32
  ErrorCode => int16
  HighwaterMarkOffset => int64
  MessageSetSize => int32
```
| Field | Description |
| --- | --- |
| TopicName | The name of the topic this response entry is for. |
| Partition | The id of the partition this response is for. |
| HighwaterMarkOffset | The offset at the end of the log for this partition. This can be used by the client to determine how many messages behind the end of the log they are. |
| MessageSetSize | The size in bytes of the message set for this partition. |
| MessageSet | The message data fetched from this partition, in the format described above. |
Offset API
This API describes the valid offset range available for a set of topic-partitions. As with the produce and fetch APIs requests must be directed to the broker that is currently the leader for the partitions in question. This can be determined using the metadata API.
The response contains the starting offset of each segment for the requested partition as well as the "log end offset" i.e. the offset of the next message that would be appended to the given partition.
We agree that this API is slightly funky.
Offset Request
```
OffsetRequest => ReplicaId [TopicName [Partition Time MaxNumberOfOffsets]]
  ReplicaId => int32
  TopicName => string
  Partition => int32
  Time => int64
  MaxNumberOfOffsets => int32
```
| Field | Description |
| --- | --- |
| Time | Used to ask for all messages before a certain time (ms). There are two special values. Specify -1 to receive the latest offsets and -2 to receive the earliest available offset. Note that because offsets are pulled in descending order, asking for the earliest offset will always return you a single element. |
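A sketch of the corresponding request body for a single partition, showing the two special Time values (the encode_string helper from the primitives example is assumed):

```python
import struct

LATEST_OFFSETS = -1    # special Time value: the latest offsets
EARLIEST_OFFSET = -2   # special Time value: the earliest available offset

def encode_offset_body(topic: str, partition: int, time: int,
                       max_offsets: int = 1) -> bytes:
    """OffsetRequest body for one partition; ReplicaId is -1 for clients."""
    partition_block = struct.pack(">iqi", partition, time, max_offsets)
    return (struct.pack(">i", -1)                           # ReplicaId
            + struct.pack(">i", 1) + encode_string(topic)   # one topic
            + struct.pack(">i", 1) + partition_block)       # one partition
```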
Offset Response
```
OffsetResponse => [TopicName [PartitionOffsets]]
  PartitionOffsets => Partition ErrorCode [Offset]
  Partition => int32
  ErrorCode => int16
  Offset => int64
```
Offset Commit/Fetch API
These APIs allow for centralized management of offsets. Read more under Offset Management. As per the comments on KAFKA-993, these API calls are not fully functional in releases until Kafka 0.8.1.1; full support will be available in the 0.8.2 release.
Consumer Metadata Request
The offsets for a given consumer group are maintained by a specific broker called the offset coordinator. i.e., a consumer needs to issue its offset commit and fetch requests to this specific broker. It can discover the current offset coordinator by issuing a consumer metadata request.
```
ConsumerMetadataRequest => ConsumerGroup
  ConsumerGroup => string
```
Consumer Metadata Response
On a successful (ErrorCode == 0) response, the coordinator fields provide the id/host/port details of the offset coordinator.
```
ConsumerMetadataResponse => ErrorCode |CoordinatorId CoordinatorHost CoordinatorPort|
  ErrorCode => int16
  CoordinatorId => int32
  CoordinatorHost => string
  CoordinatorPort => int32
```
Offset Commit Request
```
OffsetCommitRequest => ConsumerGroup [TopicName [Partition Offset TimeStamp Metadata]]
  ConsumerGroup => string
  TopicName => string
  Partition => int32
  Offset => int64
  TimeStamp => int64
  Metadata => string
```
(If the time stamp field is set to -1, then the broker sets the time stamp to the receive time before committing the offset.)
Offset Commit Response
```
OffsetCommitResponse => [TopicName [Partition ErrorCode]]
  TopicName => string
  Partition => int32
  ErrorCode => int16
```
Offset Fetch Request
```
OffsetFetchRequest => ConsumerGroup [TopicName [Partition]]
  ConsumerGroup => string
  TopicName => string
  Partition => int32
```
Offset Fetch Response
```
OffsetFetchResponse => [TopicName [Partition Offset Metadata ErrorCode]]
  TopicName => string
  Partition => int32
  Offset => int64
  Metadata => string
  ErrorCode => int16
```
Note that if there is no offset associated with a topic-partition under that consumer group the broker does not set an error code (since it is not really an error), but returns empty metadata and sets the offset field to -1.
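In client code, that convention could be handled roughly like this (a hypothetical helper, not part of the protocol itself):

```python
def committed_offset(offset: int, error_code: int):
    """Interpret one OffsetFetchResponse entry: offset -1 with no error means
    'nothing committed yet' rather than a failure."""
    if error_code != 0:
        raise RuntimeError("offset fetch failed with error code %d" % error_code)
    return None if offset == -1 else offset
```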
Constants
Api Keys
The following are the numeric codes that the ApiKey in the request can take for each of the above request types.
| API name | ApiKey Value |
| --- | --- |
| ProduceRequest | 0 |
| FetchRequest | 1 |
| OffsetRequest | 2 |
| MetadataRequest | 3 |
| Non-user facing control APIs | 4-7 |
| OffsetCommitRequest | 8 |
| OffsetFetchRequest | 9 |
| ConsumerMetadataRequest | 10 |
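For reference, the table above might be captured in client code as a simple enum (the constant names are our own):

```python
from enum import IntEnum

class ApiKey(IntEnum):
    """Numeric ApiKey values from the table above; 4-7 are non-user-facing control APIs."""
    PRODUCE = 0
    FETCH = 1
    OFFSETS = 2
    METADATA = 3
    OFFSET_COMMIT = 8
    OFFSET_FETCH = 9
    CONSUMER_METADATA = 10
```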
Error Codes
We use numeric codes to indicate what problem occurred on the server. These can be translated by the client into exceptions, or into whatever error handling mechanism is appropriate in the client language. Here is a table of the error codes currently in use:
| Error | Code | Description |
| --- | --- | --- |
| NoError | 0 | No error--it worked! |
| Unknown | -1 | An unexpected server error. |
| OffsetOutOfRange | 1 | The requested offset is outside the range of offsets maintained by the server for the given topic/partition. |
| InvalidMessage | 2 | This indicates that a message's contents do not match its CRC. |
| UnknownTopicOrPartition | 3 | This request is for a topic or partition that does not exist on this broker. |
| InvalidMessageSize | 4 | The message has a negative size. |
| LeaderNotAvailable | 5 | This error is thrown if we are in the middle of a leadership election and there is currently no leader for this partition and hence it is unavailable for writes. |
| NotLeaderForPartition | 6 | This error is thrown if the client attempts to send messages to a replica that is not the leader for some partition. It indicates that the client's metadata is out of date. |
| RequestTimedOut | 7 | This error is thrown if the request exceeds the user-specified time limit in the request. |
| BrokerNotAvailable | 8 | This is not a client facing error and is used only internally by intra-cluster broker communication. |
| Unused | 9 | Unused. |
| MessageSizeTooLarge | 10 | The server has a configurable maximum message size to avoid unbounded memory allocation. This error is thrown if the client attempts to produce a message larger than this maximum. |
| StaleControllerEpochCode | 11 | Internal error code for broker-to-broker communication. |
| OffsetMetadataTooLargeCode | 12 | Returned if you specify a string larger than the configured maximum for offset metadata. |
| OffsetsLoadInProgressCode | 14 | The broker returns this error code for an offset fetch request if it is still loading offsets (after a leader change for that offsets topic partition). |
| ConsumerCoordinatorNotAvailableCode | 15 | The broker returns this error code for consumer metadata requests or offset commit requests if the offsets topic has not yet been created. |
| NotCoordinatorForConsumerCode | 16 | The broker returns this error code if it receives an offset fetch or commit request for a consumer group that it is not a coordinator for. |
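As one example of how a client might use these codes, the sketch below marks the errors that typically mean the cached metadata is stale and should be refreshed before retrying. This classification is our own reading of the table, not something mandated by the protocol:

```python
STALE_METADATA_ERRORS = {
    3,   # UnknownTopicOrPartition: this broker no longer hosts the requested data
    5,   # LeaderNotAvailable: a leader election is in progress
    6,   # NotLeaderForPartition: the cached leader is wrong
}

def should_refresh_metadata(error_code: int) -> bool:
    """True if the client should re-issue a metadata request before retrying."""
    return error_code in STALE_METADATA_ERRORS
```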
Some Common Philosophical Questions
Some people have asked why we don't use HTTP. There are a number of reasons, the best being that client implementors can make use of some of the more advanced TCP features--the ability to multiplex requests, the ability to simultaneously poll many connections, etc. We have also found HTTP libraries in many languages to be surprisingly shabby.
Others have asked if maybe we shouldn't support many different protocols. Prior experience with this was that it makes it very hard to add and test new features if they have to be ported across many protocol implementations. Our feeling is that most users don't really see multiple protocols as a feature, they just want a good reliable client in the language of their choice.
Another question is why we don't adopt XMPP, STOMP, AMQP or an existing protocol. The answer to this varies by protocol, but in general the problem is that the protocol does determine large parts of the implementation and we couldn't do what we are doing if we didn't have control over the protocol. Our belief is that it is possible to do better than existing messaging systems have in providing a truly distributed messaging system, and to do this we need to build something that works differently.
A final question is why we don't use a system like Protocol Buffers or Thrift to define our request messages. These packages excel at helping you manage lots and lots of serialized messages. However we have only a few messages. Support across languages is somewhat spotty (depending on the package). Finally, the mapping between the binary log format and the wire protocol is something we manage somewhat carefully and this would not be possible with these systems. And we prefer the style of versioning APIs explicitly and checking this, rather than inferring new values as nulls, as it allows more nuanced control of compatibility.
