ResourceManager High Availability
The ResourceManager (RM) is responsible for tracking the resources in a cluster and scheduling applications (e.g., MapReduce jobs). Prior to Hadoop 2.4, the ResourceManager was the single point of failure in a YARN cluster. The High Availability feature adds redundancy in the form of an Active/Standby ResourceManager pair to remove this otherwise single point of failure.
ResourceManager HA is realized through an Active/Standby architecture - at any point of time, one of the RMs is Active, and one or more RMs are in Standby mode waiting to take over should anything happen to the Active. The trigger to transition-to-active comes from either the admin (through CLI) or through the integrated failover-controller when automatic-failover is enabled.
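The Active/Standby pair described above is configured in yarn-site.xml. A minimal sketch follows; the hostnames, the cluster id, and the ZooKeeper quorum are placeholders, and note that in recent releases the quorum property is hadoop.zk.address (older releases used yarn.resourcemanager.zk-address):

```xml
<!-- Minimal RM HA sketch for yarn-site.xml; hostnames/ids are placeholders -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
<property>
  <name>hadoop.zk.address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

Clients and nodes read the same rm-ids list to discover every RM in the pair.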
Manual transitions and failover
When automatic failover is not enabled, admins have to manually transition one of the RMs to Active. To failover from one RM to the other, they are expected to first transition the Active-RM to Standby and transition a Standby-RM to Active. All this can be done using the “yarn rmadmin” CLI.
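The manual transition described above can be sketched with the `yarn rmadmin` subcommands, where rm1 and rm2 are the RM ids configured in yarn-site.xml:

```sh
# Check which RM is currently active (prints "active" or "standby")
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# Fail over manually: demote the current Active, then promote a Standby
yarn rmadmin -transitionToStandby rm1
yarn rmadmin -transitionToActive rm2
```

When automatic failover is enabled, these manual transitions are refused unless forced, since the embedded elector owns the active state.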
Automatic failover
The RMs have an option to embed the Zookeeper-based ActiveStandbyElector to decide which RM should be the Active. When the Active goes down or becomes unresponsive, another RM is automatically elected to be the Active, which then takes over. Note that there is no need to run a separate ZKFC daemon as is the case for HDFS, because the ActiveStandbyElector embedded in the RMs acts as both a failure detector and a leader elector.
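Automatic failover via the embedded elector is controlled by the following properties; both default to true once HA is enabled, so this fragment is shown only for explicitness:

```xml
<!-- Both properties default to true when yarn.resourcemanager.ha.enabled is true -->
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
  <value>true</value>
</property>
```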
Client, ApplicationMaster and NodeManager on RM failover
When there are multiple RMs, the configuration (yarn-site.xml) used by clients and nodes is expected to list all the RMs. Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs in a round-robin fashion until they hit the Active RM. If the Active goes down, they resume the round-robin polling until they hit the “new” Active. This default retry logic is implemented as org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider. You can override the logic by implementing org.apache.hadoop.yarn.client.RMFailoverProxyProvider and setting the value of yarn.client.failover-proxy-provider to the class name.
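Overriding the default retry logic is then a matter of pointing yarn.client.failover-proxy-provider at the custom class; com.example.MyFailoverProxyProvider below is a hypothetical name standing in for an implementation of org.apache.hadoop.yarn.client.RMFailoverProxyProvider:

```xml
<!-- com.example.MyFailoverProxyProvider is a placeholder for your own class -->
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>com.example.MyFailoverProxyProvider</value>
</property>
```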
Recovering previous active-RM's state
With ResourceManager Restart enabled, the RM being promoted to an active state loads the RM internal state and continues to operate from where the previous active left off, as much as possible depending on the RM restart feature. A new attempt is spawned for each managed application previously submitted to the RM. Applications can checkpoint periodically to avoid losing any work. The state-store must be visible from both the Active and Standby RMs. Currently, there are two RMStateStore implementations for persistence - FileSystemRMStateStore and ZKRMStateStore. The ZKRMStateStore implicitly allows write access to only a single RM at any point in time, and hence is the recommended store to use in an HA cluster. When using the ZKRMStateStore, there is no need for a separate fencing mechanism to address a potential split-brain situation where multiple RMs can potentially assume the Active role. When using the ZKRMStateStore, it is advisable to NOT set the "zookeeper.DigestAuthenticationProvider.superDigest" property on the Zookeeper cluster, to ensure that the zookeeper admin does not have access to YARN application/user credential information.
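Recovery with the recommended ZKRMStateStore can be sketched with the following yarn-site.xml fragment; the store class name is the one shipped with Hadoop:

```xml
<!-- Enable RM state recovery and persist state to ZooKeeper -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
```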
Original document: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
ResourceManager Restart:https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html