Resolving a Permanent RIT During HBase Region Merges


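What do you do when merging Regions leaves one permanently in transition (RIT)? I ran into exactly this in production: during a batch of Region merges, a Region got stuck in the MERGING_NEW state indefinitely. This does not affect the cluster's normal ability to serve requests. However, if any node in the cluster restarts, the Regions on that RegionServer may not be rebalanced afterwards, because HBase does not run Region load balancing while any Region is in transition. Even running the balancer command manually has no effect.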
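The balancing constraint can be illustrated with a minimal Java sketch, assuming a deliberately simplified model: the class, the `State` enum and `shouldRunBalancer` are hypothetical names rather than HBase APIs, and the printed message is not HBase's actual log line. It only shows the rule described above: a single Region left in transition is enough to make every balancer run return without moving anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the Master's "no balancing during RIT" guard.
 * Not the HBase API; names are illustrative only.
 */
public class BalancerGuardSketch {

    /** Illustrative subset of region states (names borrowed from HBase). */
    enum State { OPENING, MERGING_NEW }

    /** Balancing is allowed only when no region at all is in transition. */
    static boolean shouldRunBalancer(Map<String, State> regionsInTransition) {
        if (!regionsInTransition.isEmpty()) {
            System.out.println("balancer skipped: "
                    + regionsInTransition.size() + " region(s) in transition");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, State> rit = new LinkedHashMap<>();
        System.out.println(shouldRunBalancer(rit)); // true: nothing in transition

        // One merged region stuck in MERGING_NEW blocks balancing for as long as it stays there.
        rit.put("merged-region", State.MERGING_NEW);
        System.out.println(shouldRunBalancer(rit)); // false
    }
}
```

Because the entry never leaves the map on its own, the guard keeps returning false, which is why a stuck MERGING_NEW turns into a cluster-wide balancing outage rather than a one-off delay.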

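If this RIT is left unresolved and other HBase nodes are later restarted one after another, Regions across the whole cluster will become severely unbalanced. That is a serious problem with a large impact on cluster performance. Searching HBase JIRA showed that this permanent MERGING_NEW RIT is caused by the bug HBASE-17682, which is fixed by applying that patch. The root cause is that the state-handling logic in the HBase source has no case for MERGING_NEW, so such a Region falls straight through to the else branch, which only logs a warning. The original source is as follows: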

```java
for (RegionState state : regionsInTransition.values()) {
  HRegionInfo hri = state.getRegion();
  if (assignedRegions.contains(hri)) {
    // Region is open on this region server, but in transition.
    // This region must be moving away from this server, or splitting/merging.
    // SSH will handle it, either skip assigning, or re-assign.
    LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
  } else if (sn.equals(state.getServerName())) {
    // Region is in transition on this region server, and this
    // region is not open on this server. So the region must be
    // moving to this server from another one (i.e. opening or
    // pending open on this server, was open on another one.
    // Offline state is also kind of pending open if the region is in
    // transition. The region could be in failed_close state too if we have
    // tried several times to open it while this region server is not reachable)
    if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
      LOG.info("Found region in " + state +
        " to be reassigned by ServerCrashProcedure for " + sn);
      rits.add(hri);
    } else if (state.isSplittingNew()) {
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else {
      // A MERGING_NEW region ends up here: only a warning is logged and the
      // region is never queued for cleanup, so its RIT entry remains.
      LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
    }
  }
}
```

The code after the fix:

```java
for (RegionState state : regionsInTransition.values()) {
  HRegionInfo hri = state.getRegion();
  if (assignedRegions.contains(hri)) {
    // Region is open on this region server, but in transition.
    // This region must be moving away from this server, or splitting/merging.
    // SSH will handle it, either skip assigning, or re-assign.
    LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
  } else if (sn.equals(state.getServerName())) {
    // Region is in transition on this region server, and this
    // region is not open on this server. So the region must be
    // moving to this server from another one (i.e. opening or
    // pending open on this server, was open on another one.
    // Offline state is also kind of pending open if the region is in
    // transition. The region could be in failed_close state too if we have
    // tried several times to open it while this region server is not reachable)
    if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
      LOG.info("Found region in " + state +
        " to be reassigned by ServerCrashProcedure for " + sn);
      rits.add(hri);
    } else if (state.isSplittingNew() || state.isMergingNew()) {
      // HBASE-17682: a MERGING_NEW region must be cleaned up exactly like a
      // SPLITTING_NEW one instead of falling through to the warning below.
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else {
      LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
    }
  }
}
```
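The effect of the patch can be checked without a cluster. Below is a minimal, self-contained Java model of the inner `if/else` chain above, for the case where the Region is in transition on the server being processed. The `State` and `Action` enums and both methods are hypothetical; only the branch structure mirrors the HBase code. Before the patch MERGING_NEW lands in the warning-only branch; after it, MERGING_NEW is queued for cleanup just like SPLITTING_NEW.

```java
/**
 * Hypothetical model of the RIT dispatch before and after HBASE-17682.
 * Only the branch structure mirrors HBase; the enums and methods are illustrative.
 */
public class RitDispatchSketch {

    enum State { PENDING_OPEN, OPENING, FAILED_CLOSE, OFFLINE, SPLITTING_NEW, MERGING_NEW }

    enum Action { REASSIGN, CLEAN_IF_NO_META_ENTRY, WARN_ONLY }

    /** Pre-patch: only SPLITTING_NEW is cleaned up; MERGING_NEW falls through. */
    static Action beforePatch(State s) {
        switch (s) {
            case PENDING_OPEN: case OPENING: // isPendingOpenOrOpening()
            case FAILED_CLOSE: case OFFLINE:
                return Action.REASSIGN;
            case SPLITTING_NEW:
                return Action.CLEAN_IF_NO_META_ENTRY;
            default:
                return Action.WARN_ONLY; // "THIS SHOULD NOT HAPPEN"; the RIT entry stays
        }
    }

    /** Patched: MERGING_NEW is handled the same way as SPLITTING_NEW. */
    static Action afterPatch(State s) {
        switch (s) {
            case PENDING_OPEN: case OPENING:
            case FAILED_CLOSE: case OFFLINE:
                return Action.REASSIGN;
            case SPLITTING_NEW: case MERGING_NEW:
                return Action.CLEAN_IF_NO_META_ENTRY;
            default:
                return Action.WARN_ONLY; // any other state
        }
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + ": before=" + beforePatch(s) + ", after=" + afterPatch(s));
        }
    }
}
```

Running `main` prints one line per state; the only row whose action changes is MERGING_NEW, which is exactly the scope of the patch.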

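There is a catch, however: the JIRA issue only says that the bug must be fixed by applying the patch. In a real production environment, you cannot stop the cluster for a long time and interrupt application reads and writes just to deal with this RIT. So is there a temporary workaround that clears the current permanent MERGING_NEW RIT first, leaving the HBase version upgrade for later?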

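There is. Analyzing the merge flow shows that when HBase merges Regions, it first creates the new Region in the initial MERGING_NEW state. The overall Region merge flow is as follows:

[Figure: HBase Region merge flowchart]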

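As the merge flow shows, MERGING_NEW is an initial state held in the active Master's memory, and the backup Master's memory does not contain this MERGING_NEW state for the new Region. A Master failover (switching the active and backup Masters) therefore clears this permanent RIT temporarily. Since HBase is a highly available cluster and clients read and write data directly through the RegionServers, a Master failover is essentially transparent to applications. A permanent MERGING_NEW RIT can therefore be handled temporarily with a Master failover, after which the bug can be fixed properly by applying the patch and upgrading HBase.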
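The reasoning behind the failover workaround can be illustrated with a toy model. This is a minimal Java sketch under an assumption taken from the analysis above: the stuck MERGING_NEW entry exists only in the active Master's memory and has no persisted record, so a newly active Master that rebuilds its view of Regions in transition from persisted state never sees it. `rebuildRit` and the other names are hypothetical and do not correspond to HBase classes; a real Master's startup recovery is considerably more involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (hypothetical names) of why a Master failover drops an in-memory-only
 * MERGING_NEW entry. Not how HBase is implemented internally.
 */
public class MasterFailoverSketch {

    enum State { OPEN, MERGING_NEW }

    /** Rebuild an in-memory RIT view purely from persisted region records. */
    static Map<String, State> rebuildRit(Map<String, State> persisted) {
        Map<String, State> rit = new LinkedHashMap<>();
        for (Map.Entry<String, State> e : persisted.entrySet()) {
            if (e.getValue() != State.OPEN) {
                rit.put(e.getKey(), e.getValue());
            }
        }
        return rit;
    }

    public static void main(String[] args) {
        // Persisted records: the merged region has no entry here
        // (assumption, following the article's analysis).
        Map<String, State> persisted = new LinkedHashMap<>();
        persisted.put("unrelatedRegion", State.OPEN);

        // Active Master: its memory still holds the stuck MERGING_NEW entry.
        Map<String, State> activeRit = new LinkedHashMap<>();
        activeRit.put("merged", State.MERGING_NEW);
        System.out.println("active master RIT: " + activeRit); // {merged=MERGING_NEW}

        // Failover: the backup Master never held that entry and rebuilds from persisted state.
        Map<String, State> newRit = rebuildRit(persisted);
        System.out.println("new master RIT: " + newRit); // {}
    }
}
```

The model also makes the limits of the workaround visible: nothing in it changes the code path that created the stuck entry, so the same merge scenario can reproduce the RIT until the HBASE-17682 patch is applied.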

 

