etcd vs. ZooKeeper, Consul, and other KV components


A distributed configuration center based on etcd

Source: etcd docs, “etcd versus other key-value stores”: https://etcd.io/docs/v3.4.0/learning/why/

The name “etcd” originated from two ideas, the unix “/etc” folder and “d”istributed systems. The “/etc” folder is a place to store configuration data for a single system whereas etcd stores configuration information for large scale distributed systems. Hence, a “d”istributed “/etc” is “etcd”.

etcd is designed as a general substrate for large scale distributed systems. These are systems that will never tolerate split-brain operation and are willing to sacrifice availability to achieve this end. etcd stores metadata in a consistent and fault-tolerant way. An etcd cluster is meant to provide key-value storage with best of class stability, reliability, scalability and performance.

Distributed systems use etcd as a consistent key-value store for configuration management, service discovery, and coordinating distributed work. Many organizations use etcd to implement production systems such as container schedulers, service discovery services, and distributed data storage. Common distributed patterns using etcd include leader election, distributed locks, and monitoring machine liveness.
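As a concrete illustration of the configuration-management pattern, here is a minimal sketch using the official Go client. It assumes a single-node etcd reachable at 127.0.0.1:2379, uses the etcd v3.4 import path (go.etcd.io/etcd/clientv3), and the key name is invented for the example.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3" // etcd v3.4 import path; v3.5+ uses go.etcd.io/etcd/client/v3
)

func main() {
	// Connect to an (assumed) local single-node etcd cluster.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Store a piece of configuration; the write is replicated through
	// raft before the call returns.
	if _, err := cli.Put(ctx, "config/service-a/max-conns", "1024"); err != nil {
		log.Fatal(err)
	}

	// Read it back with a linearizable (default) read.
	resp, err := cli.Get(ctx, "config/service-a/max-conns")
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}
```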

Use cases

  • Container Linux by CoreOS: Applications running on Container Linux get automatic, zero-downtime Linux kernel updates. Container Linux uses locksmith to coordinate updates. Locksmith implements a distributed semaphore over etcd to ensure only a subset of a cluster is rebooting at any given time.
  • Kubernetes stores configuration data into etcd for service discovery and cluster management; etcd’s consistency is crucial for correctly scheduling and operating services. The Kubernetes API server persists cluster state into etcd. It uses etcd’s watch API to monitor the cluster and roll out critical configuration changes (a minimal watch sketch follows this list).
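To illustrate the watch-driven pattern in the Kubernetes example above, the sketch below streams every change under a key prefix. The prefix config/ is invented for the example, and the client setup matches the earlier sketch.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"go.etcd.io/etcd/clientv3"
)

// watchConfig blocks forever, printing every change under the given
// prefix. A watch can also start from a historical revision with
// clientv3.WithRev(rev), which is how a restarting controller avoids
// missing events.
func watchConfig(cli *clientv3.Client, prefix string) {
	rch := cli.Watch(context.Background(), prefix, clientv3.WithPrefix())
	for wresp := range rch {
		for _, ev := range wresp.Events {
			// ev.Type is PUT or DELETE.
			fmt.Printf("%s %q -> %q (mod revision %d)\n",
				ev.Type, ev.Kv.Key, ev.Kv.Value, ev.Kv.ModRevision)
		}
	}
}

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	watchConfig(cli, "config/")
}
```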

Comparison chart

Perhaps etcd already seems like a good fit, but as with all technological decisions, proceed with caution. Please note this documentation is written by the etcd team. Although the ideal is a disinterested comparison of technology and features, the authors’ expertise and biases obviously favor etcd. Use only as directed.

The table below is a handy quick reference for spotting the differences among etcd and its most popular alternatives at a glance. Further commentary and details for each column are in the sections following the table.

|                                   | etcd | ZooKeeper | Consul | NewSQL (Cloud Spanner, CockroachDB, TiDB) |
|-----------------------------------|------|-----------|--------|-------------------------------------------|
| Concurrency Primitives            | Lock RPCs, Election RPCs, command line locks, command line elections, recipes in go | External curator recipes in Java | Native lock API | Rare, if any |
| Linearizable Reads                | Yes | No | Yes | Sometimes |
| Multi-version Concurrency Control | Yes | No | No | Sometimes |
| Transactions                      | Field compares, Read, Write | Version checks, Write | Field compare, Lock, Read, Write | SQL-style |
| Change Notification               | Historical and current key intervals | Current keys and directories | Current keys and prefixes | Triggers (sometimes) |
| User permissions                  | Role based | ACLs | ACLs | Varies (per-table GRANT, per-database roles) |
| HTTP/JSON API                     | Yes | No | Yes | Rarely |
| Membership Reconfiguration        | Yes | >3.5.0 | Yes | Yes |
| Maximum reliable database size    | Several gigabytes | Hundreds of megabytes (sometimes several gigabytes) | Hundreds of MBs | Terabytes+ |
| Minimum read linearization latency | Network RTT | No read linearization | RTT + fsync | Clock barriers (atomic, NTP) |

ZooKeeper

ZooKeeper solves the same problem as etcd: distributed system coordination and metadata storage. However, etcd has the luxury of hindsight taken from engineering and operational experience with ZooKeeper’s design and implementation. The lessons learned from ZooKeeper certainly informed etcd’s design, helping it support large scale systems like Kubernetes. The improvements etcd made over ZooKeeper include:

  • Dynamic cluster membership reconfiguration
  • Stable read/write under high load
  • A multi-version concurrency control data model
  • Reliable key monitoring which never silently drops events
  • Lease primitives decoupling connections from sessions (see the lease sketch after this list)
  • APIs for safe distributed shared locks
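The sketch below illustrates the lease primitive from the list above: it registers a liveness key bound to a 10-second lease and keeps the lease alive in the background. If the process dies, the key disappears when the lease expires. The key name liveness/node-1 is invented for the example.

```go
package main

import (
	"context"
	"log"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	ctx := context.Background()

	// Grant a lease with a 10-second TTL.
	lease, err := cli.Grant(ctx, 10)
	if err != nil {
		log.Fatal(err)
	}

	// Attach a key to the lease: the key is deleted automatically when
	// the lease expires, so watchers can detect this process dying.
	if _, err := cli.Put(ctx, "liveness/node-1", "alive", clientv3.WithLease(lease.ID)); err != nil {
		log.Fatal(err)
	}

	// Refresh the lease in the background. Unlike a ZooKeeper session,
	// the lease is decoupled from any single connection: it survives
	// client reconnects as long as it is refreshed before the TTL lapses.
	ch, err := cli.KeepAlive(ctx, lease.ID)
	if err != nil {
		log.Fatal(err)
	}
	for range ch { // consume keep-alive acknowledgements
	}
}
```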

Furthermore, etcd supports a wide range of languages and frameworks out of the box. Whereas ZooKeeper has its own custom Jute RPC protocol, which is totally unique to ZooKeeper and limits its supported language bindings, etcd’s client protocol is built from gRPC, a popular RPC framework with language bindings for Go, C++, Java, and more. Likewise, gRPC can be serialized into JSON over HTTP, so even general command line utilities like curl can talk to it. Since systems can select from a variety of choices, they are built on etcd with native tooling rather than around etcd with a single fixed set of technologies.
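For example, because the gRPC gateway accepts JSON over HTTP, any plain HTTP client can write a key, no etcd client library required. The sketch below does so from Go’s standard library alone; it assumes a local etcd on port 2379, the /v3 gateway path documented for etcd v3.4+, and the gateway’s convention of base64-encoded keys and values.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// The gRPC gateway expects base64-encoded keys and values.
	body, err := json.Marshal(map[string]string{
		"key":   base64.StdEncoding.EncodeToString([]byte("foo")),
		"value": base64.StdEncoding.EncodeToString([]byte("bar")),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Equivalent to: curl -X POST http://127.0.0.1:2379/v3/kv/put -d '{...}'
	resp, err := http.Post("http://127.0.0.1:2379/v3/kv/put",
		"application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```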

When considering features, support, and stability, new applications planning to use ZooKeeper for a consistent key value store would do well to choose etcd instead.

Consul

Consul is an end-to-end service discovery framework. It provides built-in health checking, failure detection, and DNS services. In addition, Consul exposes a key value store with RESTful HTTP APIs. As it stands in Consul 1.0, the storage system does not scale as well as other systems like etcd or ZooKeeper in key-value operations; systems requiring millions of keys will suffer from high latencies and memory pressure. The key value API is missing, most notably, multi-version keys, conditional transactions, and reliable streaming watches.

etcd and Consul solve different problems. If looking for a distributed consistent key value store, etcd is a better choice over Consul. If looking for end-to-end cluster service discovery, etcd will not have enough features; choose Kubernetes, Consul, or SmartStack.

NewSQL (Cloud Spanner, CockroachDB, TiDB)

Both etcd and NewSQL databases (e.g., Cockroach, TiDB, Google Spanner) provide strong data consistency guarantees with high availability. However, the significantly different system design parameters lead to significantly different client APIs and performance characteristics.

NewSQL databases are meant to horizontally scale across data centers. These systems typically partition data across multiple consistent replication groups (shards), potentially distant, storing data sets on the order of terabytes and above. This sort of scaling makes them poor candidates for distributed coordination as they have long latencies from waiting on clocks and expect updates with mostly localized dependency graphs. The data is organized into tables, including SQL-style query facilities with richer semantics than etcd, but at the cost of additional complexity for processing, planning, and optimizing queries.

In short, choose etcd for storing metadata or coordinating distributed applications. If storing more than a few GB of data or if full SQL queries are needed, choose a NewSQL database.

Using etcd for metadata

etcd replicates all data within a single consistent replication group. For storing up to a few GB of data with consistent ordering, this is the most efficient approach. Each modification of cluster state, which may change multiple keys, is assigned a globally unique ID, called a revision in etcd, from a monotonically increasing counter for reasoning over ordering. Since there’s only a single replication group, the modification request only needs to go through the raft protocol to commit. By limiting consensus to one replication group, etcd gets distributed consistency with a simple protocol while achieving low latency and high throughput.
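A short sketch of the revision counter described above: every mutation bumps the cluster-wide revision, and the MVCC store lets a client read a key as of an older revision. The key name is invented, and the client setup matches the earlier sketches.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	ctx := context.Background()

	// Each write returns the cluster-wide revision it committed at.
	r1, err := cli.Put(ctx, "app/config", "v1")
	if err != nil {
		log.Fatal(err)
	}
	r2, err := cli.Put(ctx, "app/config", "v2")
	if err != nil {
		log.Fatal(err)
	}
	// Revisions are totally ordered across the whole keyspace.
	fmt.Println(r1.Header.Revision < r2.Header.Revision) // true

	// MVCC: read the key as of the older revision.
	old, err := cli.Get(ctx, "app/config", clientv3.WithRev(r1.Header.Revision))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", old.Kvs[0].Value) // "v1"
}
```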

The replication behind etcd cannot horizontally scale because it lacks data sharding. In contrast, NewSQL databases usually shard data across multiple consistent replication groups, storing data sets on the order of terabytes and above. However, to assign each modification a globally unique and increasing ID, each request must go through an additional coordination protocol among replication groups. This extra coordination step may potentially conflict on the global ID, forcing ordered requests to retry. The result is a more complicated approach with typically worse performance than etcd for strict ordering.

If an application reasons primarily about metadata or metadata ordering, such as to coordinate processes, choose etcd. If the application needs a large data store spanning multiple data centers and does not heavily depend on strong global ordering properties, choose a NewSQL database.

Using etcd for distributed coordination

etcd has distributed coordination primitives such as event watches, leases, elections, and distributed shared locks out of the box. These primitives are both maintained and supported by the etcd developers; leaving these primitives to external libraries shirks the responsibility of developing foundational distributed software, essentially leaving the system incomplete. NewSQL databases usually expect these distributed coordination primitives to be authored by third parties. Likewise, ZooKeeper famously has a separate and independent library of coordination recipes. Consul, which provides a native locking API, goes so far as to apologize that it’s “not a bulletproof method” (after one client releases a lock, other clients may not be able to acquire it immediately, owing to Consul’s lock-delay setting).
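A minimal sketch of two of those built-in primitives, using the clientv3/concurrency package that ships with etcd. The prefixes /locks/job and /election/scheduler and the candidate name node-1 are invented for the example.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"go.etcd.io/etcd/clientv3"
	"go.etcd.io/etcd/clientv3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A session is a lease kept alive in the background; locks and
	// leadership are released automatically if this process dies.
	sess, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()
	ctx := context.Background()

	// Distributed mutex.
	mu := concurrency.NewMutex(sess, "/locks/job")
	if err := mu.Lock(ctx); err != nil {
		log.Fatal(err)
	}
	fmt.Println("holding lock")
	if err := mu.Unlock(ctx); err != nil {
		log.Fatal(err)
	}

	// Leader election: Campaign blocks until this candidate wins.
	e := concurrency.NewElection(sess, "/election/scheduler")
	if err := e.Campaign(ctx, "node-1"); err != nil {
		log.Fatal(err)
	}
	fmt.Println("elected leader")
	_ = e.Resign(ctx)
}
```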

In theory, it’s possible to build these primitives atop any storage system providing strong consistency. However, the algorithms tend to be subtle; it is easy to develop a locking algorithm that appears to work, only to suddenly break due to thundering herd and timing skew. Furthermore, other primitives supported by etcd, such as transactional memory, depend on etcd’s MVCC data model; simple strong consistency is not enough.
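For instance, etcd’s conditional transactions can compare against MVCC metadata. The sketch below atomically claims a key only if it has never been written (its Version is 0), which is one building block for the locking patterns above; the key name app/leader is illustrative.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Atomically create the key only if it does not exist yet:
	// Version("app/leader") == 0 means the key has never been written.
	resp, err := cli.Txn(context.Background()).
		If(clientv3.Compare(clientv3.Version("app/leader"), "=", 0)).
		Then(clientv3.OpPut("app/leader", "node-1")).
		Else(clientv3.OpGet("app/leader")).
		Commit()
	if err != nil {
		log.Fatal(err)
	}
	if resp.Succeeded {
		fmt.Println("claimed leadership")
	} else {
		// The Else branch ran, so the key exists and was read.
		kv := resp.Responses[0].GetResponseRange().Kvs[0]
		fmt.Println("already claimed by:", string(kv.Value))
	}
}
```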

For distributed coordination, choosing etcd can help prevent operational headaches and save engineering effort.

 

Chinese translation of this article: https://mp.weixin.qq.com/s/86LN9l1hdviquFT8gwy0oA
