Resolving a Permanent RIT During HBase Region Merges

This article is a repost. Posted 2019-03-30 15:47 under HBase.

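What should you do when a permanent RIT (Region-In-Transition) appears while merging Regions? I ran into exactly this in a production environment: while merging Regions in batches, one Region got stuck permanently in the MERGING_NEW state. This does not affect the cluster's current ability to serve requests. However, if a node is restarted later, the Regions on that RegionServer may not get rebalanced, because HBase does not run Region load balancing while any Region is in RIT state. Even running the balancer command manually has no effect.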
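To make the balancer behaviour concrete, here is a minimal, self-contained sketch of such a guard. It is a toy model, not HBase code: the class `BalancerGuardSketch`, its map of regions in transition and its `balance()` method are hypothetical names. As far as I know, HBase 1.x's `HMaster.balance()` performs a comparable check and returns false without moving any Region while regions are in transition, which matches what the manual `balancer` command shows.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of a balancer that refuses to run while any region is in
 * transition. All names are hypothetical; this is not the HBase API.
 */
class BalancerGuardSketch {

    /** Region name -> RIT state name, standing in for the Master's RIT map. */
    private final Map<String, String> regionsInTransition = new HashMap<>();

    void markInTransition(String region, String state) {
        regionsInTransition.put(region, state);
    }

    void clearTransition(String region) {
        regionsInTransition.remove(region);
    }

    /** Returns true if a balancing round ran, false if it was skipped. */
    boolean balance() {
        if (!regionsInTransition.isEmpty()) {
            System.out.println("Skipping balancer: " + regionsInTransition.size()
                + " region(s) in transition: " + regionsInTransition);
            return false;
        }
        // A real balancer would compute and execute region moves here.
        return true;
    }

    public static void main(String[] args) {
        BalancerGuardSketch master = new BalancerGuardSketch();
        System.out.println("balance() with no RIT: " + master.balance());
        // A permanently stuck MERGING_NEW entry blocks every later round.
        master.markInTransition("mergedRegion", "MERGING_NEW");
        System.out.println("balance() with a stuck MERGING_NEW: " + master.balance());
    }
}
```

In this model the first `balance()` call returns true and the second returns false; the only way back to true is to remove the stuck entry, which is what the workaround described later in this article achieves on a real cluster.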

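If this RIT is not resolved and more HBase nodes are restarted one after another, the Region distribution across the cluster becomes severely unbalanced. This is serious and can badly hurt cluster performance. Searching HBase JIRA showed that this permanent MERGING_NEW RIT is caused by the bug HBASE-17682, which is fixed by applying that patch. In short, when the source code sorts the in-transition Regions of a server, it has no branch for the MERGING_NEW state, so such a Region falls straight through to the final else branch, which only logs a warning. The original source code is as follows: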

for (RegionState state : regionsInTransition.values()) {
  HRegionInfo hri = state.getRegion();
  if (assignedRegions.contains(hri)) {
    // Region is open on this region server, but in transition.
    // This region must be moving away from this server, or splitting/merging.
    // SSH will handle it, either skip assigning, or re-assign.
    LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
  } else if (sn.equals(state.getServerName())) {
    // Region is in transition on this region server, and this
    // region is not open on this server. So the region must be
    // moving to this server from another one (i.e. opening or
    // pending open on this server, was open on another one.
    // Offline state is also kind of pending open if the region is in
    // transition. The region could be in failed_close state too if we have
    // tried several times to open it while this region server is not reachable)
    if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
      LOG.info("Found region in " + state +
        " to be reassigned by ServerCrashProcedure for " + sn);
      rits.add(hri);
    } else if (state.isSplittingNew()) {
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else {
      // A MERGING_NEW region falls through to here: it is only logged and
      // never added to regionsToCleanIfNoMetaEntry (HBASE-17682).
      LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
    }
  }
}

The code after the fix:

for (RegionState state : regionsInTransition.values()) {
  HRegionInfo hri = state.getRegion();
  if (assignedRegions.contains(hri)) {
    // Region is open on this region server, but in transition.
    // This region must be moving away from this server, or splitting/merging.
    // SSH will handle it, either skip assigning, or re-assign.
    LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
  } else if (sn.equals(state.getServerName())) {
    // Region is in transition on this region server, and this
    // region is not open on this server. So the region must be
    // moving to this server from another one (i.e. opening or
    // pending open on this server, was open on another one.
    // Offline state is also kind of pending open if the region is in
    // transition. The region could be in failed_close state too if we have
    // tried several times to open it while this region server is not reachable)
    if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
      LOG.info("Found region in " + state +
        " to be reassigned by ServerCrashProcedure for " + sn);
      rits.add(hri);
    } else if (state.isSplittingNew()) {
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else if (isOneOfStates(state, State.SPLITTING_NEW, State.MERGING_NEW)) {
      // Added by the fix. SPLITTING_NEW is already caught by the branch above,
      // so in effect this branch makes MERGING_NEW regions eligible for cleanup.
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else {
      LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
    }
  }
}
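The effect of the extra branch can be reproduced in a tiny standalone model. The sketch below is not HBase code: the `State` and `Action` enums and both `classify...` methods are hypothetical, and they model only the inner if/else chain for a Region that is in transition on the affected server and not open there (the `assignedRegions` and server-name checks are left out). It shows that before the fix a MERGING_NEW Region lands in the "THIS SHOULD NOT HAPPEN" branch, while after the fix it is queued for cleanup exactly like SPLITTING_NEW.

```java
/**
 * Toy model of how in-transition regions are sorted before and after the
 * HBASE-17682 fix. The enums are hypothetical, cut-down stand-ins for
 * HBase's RegionState.State and the lists the real code fills.
 */
class Hbase17682Sketch {

    enum State { OFFLINE, PENDING_OPEN, OPENING, FAILED_CLOSE, SPLITTING_NEW, MERGING_NEW }

    enum Action { REASSIGN, CLEAN_IF_NO_META_ENTRY, UNEXPECTED }

    // Original branch order: MERGING_NEW has no branch of its own.
    static Action classifyBeforeFix(State s) {
        if (s == State.PENDING_OPEN || s == State.OPENING
                || s == State.FAILED_CLOSE || s == State.OFFLINE) {
            return Action.REASSIGN;
        } else if (s == State.SPLITTING_NEW) {
            return Action.CLEAN_IF_NO_META_ENTRY;
        } else {
            return Action.UNEXPECTED; // the "THIS SHOULD NOT HAPPEN" warning branch
        }
    }

    // After the fix: MERGING_NEW is handled the same way as SPLITTING_NEW.
    static Action classifyAfterFix(State s) {
        if (s == State.PENDING_OPEN || s == State.OPENING
                || s == State.FAILED_CLOSE || s == State.OFFLINE) {
            return Action.REASSIGN;
        } else if (s == State.SPLITTING_NEW || s == State.MERGING_NEW) {
            return Action.CLEAN_IF_NO_META_ENTRY;
        } else {
            return Action.UNEXPECTED;
        }
    }

    public static void main(String[] args) {
        for (State s : new State[] {State.SPLITTING_NEW, State.MERGING_NEW}) {
            System.out.println(s + ": before=" + classifyBeforeFix(s)
                + ", after=" + classifyAfterFix(s));
        }
    }
}
```

Running `main` prints `SPLITTING_NEW: before=CLEAN_IF_NO_META_ENTRY, after=CLEAN_IF_NO_META_ENTRY` followed by `MERGING_NEW: before=UNEXPECTED, after=CLEAN_IF_NO_META_ENTRY`, which is the one behavioural difference the patch introduces in this method.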

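However, there is a catch: the JIRA issue only says that the bug must be fixed by applying the patch. In a real production environment, you cannot stop the cluster for a long time and disrupt applications' reads and writes just to deal with this RIT. So is there a temporary workaround that clears the current permanent MERGING_NEW RIT first, leaving the HBase version upgrade for later?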

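There is. After analyzing the merge process, I found that when HBase merges Regions, it first creates the new Region in an initial MERGING_NEW state. The overall Region merge flow is as follows:

[Figure: HBase Region merge flow]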

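As the flow chart shows, MERGING_NEW is an initial state that exists only in the active Master's memory; the Master in backup (standby) state does not hold this MERGING_NEW state for the new Region. The permanent RIT can therefore be cleared temporarily by failing over from the active Master to the backup Master. Because HBase is a highly available cluster, and client reads and writes are served by the RegionServers rather than by the Master, this failover is essentially transparent to user applications. So for a permanent RIT in the MERGING_NEW state, an active/backup Master failover works as a temporary workaround; afterwards, the bug can be fixed properly by applying the patch and upgrading HBase.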
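The reasoning behind the workaround can be illustrated with a toy model. Everything in the sketch below is an assumption for illustration: `MasterFailoverSketch`, `Master` and `becomeActive` are hypothetical names, and the real Master's startup, which rebuilds region state from what is persisted in the cluster, is reduced to refilling an in-memory RIT map from a map of persisted region states. The only point it demonstrates is the one made above: an entry that exists solely in the old active Master's memory, such as the stuck MERGING_NEW state, does not carry over to the new active Master.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the Master failover workaround. All names are hypothetical
 * and the Master's real startup logic is heavily simplified.
 */
class MasterFailoverSketch {

    /** Simplified stand-in for a Master process and its in-memory RIT map. */
    static class Master {
        final String name;
        final Map<String, String> regionsInTransition = new HashMap<>();

        Master(String name) {
            this.name = name;
        }

        /** Assumption: on becoming active, RIT is rebuilt only from persisted state. */
        void becomeActive(Map<String, String> persistedStates) {
            regionsInTransition.clear();
            for (Map.Entry<String, String> e : persistedStates.entrySet()) {
                if (!"OPEN".equals(e.getValue())) {
                    regionsInTransition.put(e.getKey(), e.getValue());
                }
            }
        }
    }

    public static void main(String[] args) {
        // Persisted states of the regions the Masters can see; the merged region is absent.
        Map<String, String> persisted = new HashMap<>();
        persisted.put("regionX", "OPEN");
        persisted.put("regionY", "OPEN");

        Master active = new Master("master-1");
        active.becomeActive(persisted);
        // The merge adds an in-memory-only MERGING_NEW entry that then gets stuck.
        active.regionsInTransition.put("mergedRegion", "MERGING_NEW");
        System.out.println(active.name + " RIT: " + active.regionsInTransition);

        // Failover: the backup Master takes over and rebuilds from persisted state.
        Master backup = new Master("master-2");
        backup.becomeActive(persisted);
        System.out.println(backup.name + " RIT: " + backup.regionsInTransition);
    }
}
```

In this model master-1 reports `{mergedRegion=MERGING_NEW}` while master-2 reports an empty RIT map. Consistent with the article, this only removes the symptom; the lasting fix is still the HBASE-17682 patch.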
