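Solr 4.8.0 Source Code Analysis (20): SolrCloud Recovery Strategy (Part 1)
Preface:
When using SolrCloud, we often see a replica shard in the Recovering state. This means the data in SolrCloud is inconsistent and a Recovery is needed. During this time, updates indexed into SolrCloud are not written to the index files (each shard writes the updates it receives to its own ulog). Recovery is covered in three articles; this first one introduces the causes of Recovery and its overall flow.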
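1. Causes of Recovery
Recovery generally happens in the following three situations:
- When SolrCloud starts up. This is mainly because an unexpected shutdown during indexing left some shards inconsistent with the leader, so a shard that has just started synchronizes its data from the leader.
- When an error occurs during leader election. This usually happens when the leader goes down and a replica is being elected as the new leader.
- When SolrCloud processes an update and, for some reason, the leader fails to forward the update to a replica. The replica is then forced into recovery to synchronize its data.
The first two cases are left aside for now; this article covers the third. The general principle is as follows (the original post illustrates it with a figure):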
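As described earlier in <Solr 4.8.0 Source Code Analysis (15): SolrCloud Indexing In Depth (2)>, no matter which shard an update request is sent to, SolrCloud ultimately distributes it from the Leader to the Replicas. After receiving the update request, the Leader first adds the document to its own index files and writes the update to its ulog, then forwards the update to all Replicas at the same time. This flow is the add step of the indexing chain described in that article.
Once the add step of the indexing chain has completed, SolrCloud calls finish() to collect the response from each Replica and check whether the Replica's update succeeded. If any Replica failed, the Leader sends a RequestRecovery command to that Replica, forcing the shard to recover. The relevant method, doFinish(), is shown below: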
    private void doFinish() {
      // TODO: if not a forward and replication req is not specified, we could
      // send in a background thread

      cmdDistrib.finish();
      List<Error> errors = cmdDistrib.getErrors();
      // TODO - we may need to tell about more than one error...

      // if its a forward, any fail is a problem -
      // otherwise we assume things are fine if we got it locally
      // until we start allowing min replication param
      if (errors.size() > 0) {
        // if one node is a RetryNode, this was a forward request
        if (errors.get(0).req.node instanceof RetryNode) {
          rsp.setException(errors.get(0).e);
        } else {
          if (log.isWarnEnabled()) {
            for (Error error : errors) {
              log.warn("Error sending update", error.e);
            }
          }
        }
        // else
        // for now we don't error - we assume if it was added locally, we
        // succeeded
      }


      // if it is not a forward request, for each fail, try to tell them to
      // recover - the doc was already added locally, so it should have been
      // legit

      for (final SolrCmdDistributor.Error error : errors) {
        if (error.req.node instanceof RetryNode) {
          // we don't try to force a leader to recover
          // when we cannot forward to it
          continue;
        }
        // TODO: we should force their state to recovering ??
        // TODO: do retries??
        // TODO: what if its is already recovering? Right now recoveries queue up -
        // should they?
        final String recoveryUrl = error.req.node.getBaseUrl();

        Thread thread = new Thread() {
          {
            setDaemon(true);
          }
          @Override
          public void run() {
            log.info("try and ask " + recoveryUrl + " to recover");
            HttpSolrServer server = new HttpSolrServer(recoveryUrl);
            try {
              server.setSoTimeout(60000);
              server.setConnectionTimeout(15000);

              RequestRecovery recoverRequestCmd = new RequestRecovery();
              recoverRequestCmd.setAction(CoreAdminAction.REQUESTRECOVERY);
              recoverRequestCmd.setCoreName(error.req.node.getCoreName());
              try {
                server.request(recoverRequestCmd);
              } catch (Throwable t) {
                SolrException.log(log, recoveryUrl
                    + ": Could not tell a replica to recover", t);
              }
            } finally {
              server.shutdown();
            }
          }
        };
        ExecutorService executor = req.getCore().getCoreDescriptor().getCoreContainer().getUpdateShardHandler().getUpdateExecutor();
        executor.execute(thread);

      }
    }
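The selection rule buried in doFinish() can be isolated in a few lines. The following is a minimal sketch, not Solr code: `FailedSend` and `RecoveryTargets` are hypothetical names introduced here, and the boolean `retryNode` stands in for the `instanceof RetryNode` check. It only shows which failed targets would be asked to recover.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one failed forward: the target's base URL and
// whether the target was a RetryNode (a forward to a leader).
final class FailedSend {
    final String baseUrl;
    final boolean retryNode;

    FailedSend(String baseUrl, boolean retryNode) {
        this.baseUrl = baseUrl;
        this.retryNode = retryNode;
    }
}

final class RecoveryTargets {
    // Returns the base URLs that should receive a recovery request:
    // every failed target except RetryNodes, because a leader is never
    // forced to recover when a forward to it fails.
    static List<String> select(List<FailedSend> errors) {
        List<String> targets = new ArrayList<String>();
        for (FailedSend error : errors) {
            if (error.retryNode) {
                continue;
            }
            targets.add(error.baseUrl);
        }
        return targets;
    }

    public static void main(String[] args) {
        // Example URLs are made up for illustration.
        List<FailedSend> errors = Arrays.asList(
            new FailedSend("http://host1:8983/solr", false),
            new FailedSend("http://host2:8983/solr", true));
        System.out.println(select(errors));
    }
}
```

In the real method each selected target is contacted on its own daemon thread, so a slow or dead replica does not block the update response.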
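2. Overall Recovery Flow
After a Replica receives the RequestRecovery command from the Leader, it starts a RecoveryStrategy thread and performs Recovery. The overall flow is shown in the figure below:
- In the step that checks for a RequestRecovery request, the figure lists only some of the possible request commands (not all of them); those belong to the normal indexing chain.
- If the command received is RequestRecovery, the shard starts a RecoveryStrategy thread to perform Recovery.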
    // if true, we are recovering after startup and shouldn't have (or be receiving) additional updates (except for local tlog recovery)
    boolean recoveringAfterStartup = recoveryStrat == null;

    recoveryStrat = new RecoveryStrategy(cc, cd, this);
    recoveryStrat.setRecoveringAfterStartup(recoveringAfterStartup);
    recoveryStrat.start();
    recoveryRunning = true;
- The shard sets its own state to recovering. Note that if the shard finds it has become the leader, the Recovery process exits, because Recovery synchronizes data from the leader.
    zkController.publish(core.getCoreDescriptor(), ZkStateReader.RECOVERING);
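- Next, firstTime is checked. When a shard restarts, it checks whether a replication was in progress and was cut short by the shutdown. firstTime controls whether the PeerSync strategy is tried first; if it is false, PeerSync is skipped and recovery goes straight to Replicate.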
    if (recoveringAfterStartup) {
      // if we're recovering after startup (i.e. we have been down), then we need to know what the last versions were
      // when we went down. We may have received updates since then.
      recentVersions = startingVersions;
      try {
        if ((ulog.getStartingOperation() & UpdateLog.FLAG_GAP) != 0) {
          // last operation at the time of startup had the GAP flag set...
          // this means we were previously doing a full index replication
          // that probably didn't complete and buffering updates in the
          // meantime.
          log.info("Looks like a previous replication recovery did not complete - skipping peer sync. core="
              + coreName);
          firstTime = false; // skip peersync
        }
      } catch (Exception e) {
        SolrException.log(log, "Error trying to get ulog starting operation. core="
            + coreName, e);
        firstTime = false; // skip peersync
      }
    }
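- Finally, recovery proceeds with either the PeerSync strategy or the Replicate strategy. The difference between the two was mentioned briefly in <Solr In Action Notes (4): SolrCloud Distributed Indexing Basics>; the details are covered in the next two articles.
- Peer sync: if the outage was short and the recovering node missed only a small number of update requests, it can fetch them from the leader's update log. The threshold is 100 update requests; beyond 100, a full index snapshot recovery from the leader is performed.
- Replication: if the node has been offline too long to sync from the leader, it uses Solr's HTTP-based replication to recover from an index snapshot.
- At the end, the shard's state is set to active. The code also checks successfulRecovery; if it is false, Recovery is attempted again, possibly several times.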
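The decision points in the steps above can be condensed into a small model. This is a simplified sketch under stated assumptions, not the RecoveryStrategy implementation: `RecoveryFlowSketch`, `choose`, and the `missedUpdates` parameter are hypothetical, the flag value `0x10` is assumed for illustration, and the limit of 100 is taken from the description above. In Solr itself PeerSync is attempted and recovery falls back to replication when it fails, rather than the count being known up front.

```java
final class RecoveryFlowSketch {
    enum Strategy { PEER_SYNC, REPLICATE }

    // Assumed bit value for illustration; the text only names UpdateLog.FLAG_GAP.
    static final int FLAG_GAP = 0x10;
    // Threshold quoted in the text: more than 100 missed updates means replication.
    static final int PEER_SYNC_LIMIT = 100;

    // Mirrors the startup check: when recovering after startup, a GAP flag on
    // the last logged operation means an earlier full replication did not
    // finish, so PeerSync must be skipped (firstTime becomes false).
    static boolean firstTime(boolean recoveringAfterStartup, int startingOperation) {
        if (recoveringAfterStartup && (startingOperation & FLAG_GAP) != 0) {
            return false;
        }
        return true;
    }

    // PeerSync is chosen only on the first attempt and only when few updates
    // were missed; every other case falls through to full replication.
    static Strategy choose(boolean firstTime, int missedUpdates) {
        if (firstTime && missedUpdates <= PEER_SYNC_LIMIT) {
            return Strategy.PEER_SYNC;
        }
        return Strategy.REPLICATE;
    }

    public static void main(String[] args) {
        boolean first = firstTime(true, 0x01);
        System.out.println(choose(first, 42));
        System.out.println(choose(first, 5000));
        System.out.println(choose(firstTime(true, 0x01 | FLAG_GAP), 42));
    }
}
```

Keeping the two checks as separate functions reflects the order in the flow: the GAP check runs once before recovery starts, while the strategy choice is revisited on every retry.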
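Summary:
This article covered the causes of Recovery and the overall Recovery process. As an overview it stays fairly simple; its main point is that there are two different Recovery strategies, which the next two articles describe in detail.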