A topic has multiple queues, spread across different brokers. When a producer sends a message, it has to pick one of those queues.
[Figure: overall sequence diagram of a producer sending a message]
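Queue selection and fault-tolerance strategy, in summary:
- With fault tolerance disabled, the producer round-robins over the queues; if a send fails, the retry skips the queues of the broker that just failed.
- With fault tolerance enabled, RocketMQ uses a prediction mechanism to estimate whether a broker is currently available.
- If the broker that failed last time is predicted to be available, a queue on that same broker is still chosen.
- If none of the above yields a queue, one is picked more or less at random and the message is sent to it.
- Every send records how long the call took and whether it threw; that latency is used to predict when the broker will be available again.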

    String lastBrokerName = null == mq ? null : mq.getBrokerName();
    MessageQueue tmpmq = this.selectOneMessageQueue(lastBrokerName);
    if (tmpmq != null) {
        mq = tmpmq;
        //....

As shown above, after a failed send lastBrokerName is non-null on the retry, and control enters the selectOneMessageQueue method.

    public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
        if (this.sendLatencyFaultEnable) {
            try {
                int index = tpInfo.getSendWhichQueue().getAndIncrement();
                for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
                    int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
                    if (pos < 0)
                        pos = 0;
                    MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
                    if (latencyFaultTolerance.isAvailable(mq.getBrokerName())) {
                        if (null == lastBrokerName || mq.getBrokerName().equals(lastBrokerName))
                            return mq;
                    }
                }

                final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
                int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
                if (writeQueueNums > 0) {
                    final MessageQueue mq = tpInfo.selectOneMessageQueue();
                    if (notBestBroker != null) {
                        mq.setBrokerName(notBestBroker);
                        mq.setQueueId(tpInfo.getSendWhichQueue().getAndIncrement() % writeQueueNums);
                    }
                    return mq;
                } else {
                    latencyFaultTolerance.remove(notBestBroker);
                }
            } catch (Exception e) {
            }

            return tpInfo.selectOneMessageQueue();
        }

        return tpInfo.selectOneMessageQueue(lastBrokerName);
    }

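The method first checks whether sendLatencyFaultEnable is true and branches accordingly; the flag defaults to false. With the flag off, the call goes straight to tpInfo.selectOneMessageQueue(lastBrokerName):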

    public MessageQueue selectOneMessageQueue(final String lastBrokerName) {
        // null means this is the first send, not a retry after a failure:
        // just round-robin over the queues
        if (lastBrokerName == null) {
            return selectOneMessageQueue();
        } else {
            // Same round-robin as selectOneMessageQueue(), but skips queues on lastBrokerName
            int index = this.sendWhichQueue.getAndIncrement();
            for (int i = 0; i < this.messageQueueList.size(); i++) {
                int pos = Math.abs(index++) % this.messageQueueList.size();
                if (pos < 0)
                    pos = 0;
                MessageQueue mq = this.messageQueueList.get(pos);
                if (!mq.getBrokerName().equals(lastBrokerName)) {
                    return mq;
                }
            }
            // Every queue is on lastBrokerName: fall back to plain round-robin
            return selectOneMessageQueue();
        }
    }

    public MessageQueue selectOneMessageQueue() {
        int index = this.sendWhichQueue.getAndIncrement();
        int pos = Math.abs(index) % this.messageQueueList.size();
        if (pos < 0)
            pos = 0;
        return this.messageQueueList.get(pos);
    }

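In short, both variants are round-robin; the only difference is that one skips queues on the failed lastBrokerName and the other does not.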
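To make the difference between the two variants concrete, here is a minimal, self-contained sketch of round-robin selection with and without the broker filter. `QueueRef` and `RoundRobinSketch` are hypothetical stand-ins for `MessageQueue` and `TopicPublishInfo`, not RocketMQ classes, and the sketch uses `Math.floorMod` in place of the `Math.abs(...)` plus `pos < 0` guard from the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for TopicPublishInfo: a queue list plus a shared counter.
final class RoundRobinSketch {
    // Hypothetical stand-in for MessageQueue: only the fields the selection logic needs.
    static final class QueueRef {
        final String brokerName;
        final int queueId;

        QueueRef(String brokerName, int queueId) {
            this.brokerName = brokerName;
            this.queueId = queueId;
        }

        @Override
        public String toString() {
            return brokerName + "#" + queueId;
        }
    }

    private final List<QueueRef> queues;
    private final AtomicInteger counter = new AtomicInteger(0);

    RoundRobinSketch(List<QueueRef> queues) {
        this.queues = queues;
    }

    // Plain round-robin: used on the first attempt.
    QueueRef selectOne() {
        // floorMod keeps the position non-negative even after the counter overflows
        int pos = Math.floorMod(counter.getAndIncrement(), queues.size());
        return queues.get(pos);
    }

    // Round-robin that skips queues on the broker that failed last time.
    QueueRef selectOne(String lastBrokerName) {
        if (lastBrokerName == null) {
            return selectOne();
        }
        int index = counter.getAndIncrement();
        for (int i = 0; i < queues.size(); i++) {
            QueueRef q = queues.get(Math.floorMod(index++, queues.size()));
            if (!q.brokerName.equals(lastBrokerName)) {
                return q;
            }
        }
        // All queues live on the failed broker: nothing to filter, fall back
        return selectOne();
    }

    // Two brokers with two queues each, for experimenting.
    static RoundRobinSketch demo() {
        return new RoundRobinSketch(Arrays.asList(
                new QueueRef("broker-a", 0), new QueueRef("broker-a", 1),
                new QueueRef("broker-b", 0), new QueueRef("broker-b", 1)));
    }
}
```

Calling `selectOne(null)` and then `selectOne("broker-a")` on `RoundRobinSketch.demo()` shows the retry jumping past the remaining broker-a queue to one on broker-b.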
With sendLatencyFaultEnable turned on:
Part 1:

    int index = tpInfo.getSendWhichQueue().getAndIncrement();
    for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
        int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
        if (pos < 0)
            pos = 0;
        MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
        // Check whether this broker is available; unavailable brokers are skipped,
        // and if no queue qualifies the logic in part 2 runs
        if (latencyFaultTolerance.isAvailable(mq.getBrokerName())) {
            // First attempt (not a retry): return this queue directly
            // Retry: return the queue only if it is on the same broker as the last attempt
            if (null == lastBrokerName || mq.getBrokerName().equals(lastBrokerName))
                return mq;
        }
    }

Part 2:

    // Pick a broker from the fault records
    final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
    int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
    if (writeQueueNums > 0) { // the broker has writable queues
        // Take the next queue in round-robin order
        final MessageQueue mq = tpInfo.selectOneMessageQueue();
        if (notBestBroker != null) {
            // Point the selected queue at the picked broker
            mq.setBrokerName(notBestBroker);
            // Reset the queue id to one that exists on that broker
            mq.setQueueId(tpInfo.getSendWhichQueue().getAndIncrement() % writeQueueNums);
        }
        return mq;
    } else {
        latencyFaultTolerance.remove(notBestBroker);
    }

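Part 1 mainly selects a queue whose broker is available and whose brokerName equals lastBrokerName. This raises a question: lastBrokerName is only non-null after a failure, so why would the code again pick an available queue on that same broker? My guess is that although the previously used queue on this broker failed, the next queue may succeed, and since the latency fault-tolerance mechanism has confirmed that the broker is available, a different queue on it is chosen.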
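On a retry, if no matching queue is found, there is only one explanation:
- the latency fault-tolerance mechanism considers the broker lastBrokerName unavailable
Execution then enters part 2: it first calls pickOneAtLeast to get a broker, then calls selectOneMessageQueue to get a queue, and if the value returned by pickOneAtLeast is not null, the queue's broker name and queue id are overwritten to point at that broker.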
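The three outcomes of part 1 (first send, retry with the last broker still available, retry with the last broker unavailable) can be sketched in isolation. This is a simplified illustration, not RocketMQ code: `FaultAwareSelectSketch` is a hypothetical class, queues are represented only by the name of the broker that hosts them, and availability is supplied as a `Predicate<String>` instead of `latencyFaultTolerance.isAvailable`. An empty `Optional` stands for "fall through to part 2".

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical sketch of part 1 only; part 2 (pickOneAtLeast) is out of scope here.
final class FaultAwareSelectSketch {
    // queueBrokers.get(i) is the name of the broker hosting queue i
    private final List<String> queueBrokers;
    private final AtomicInteger counter = new AtomicInteger(0);

    FaultAwareSelectSketch(List<String> queueBrokers) {
        this.queueBrokers = queueBrokers;
    }

    // Returns the index of the chosen queue, or empty if part 2 would have to run.
    Optional<Integer> selectAvailable(String lastBrokerName, Predicate<String> isAvailable) {
        int index = counter.getAndIncrement();
        for (int i = 0; i < queueBrokers.size(); i++) {
            int pos = Math.floorMod(index++, queueBrokers.size());
            String broker = queueBrokers.get(pos);
            // Same condition as the original: broker available AND
            // (first attempt OR same broker as the failed attempt)
            if (isAvailable.test(broker)
                    && (lastBrokerName == null || broker.equals(lastBrokerName))) {
                return Optional.of(pos);
            }
        }
        return Optional.empty();
    }
}
```

With queues hosted on `a, a, b, b`, a first send while `a` is unavailable lands on a `b` queue, while a retry with `lastBrokerName = "a"` and `a` still unavailable returns empty, which is exactly the case the text describes.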
Fault-tolerance strategy
How broker availability is determined

    public boolean isAvailable(final String name) {
        final FaultItem faultItem = this.faultItemTable.get(name);
        if (faultItem != null) {
            return faultItem.isAvailable();
        }
        return true;
    }

There are two things to look at:
- when entries are put into faultItemTable
- how FaultItem.isAvailable is implemented
The isAvailable implementation

    public boolean isAvailable() {
        return (System.currentTimeMillis() - startTimestamp) >= 0;
    }

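It checks whether the current time has reached startTimestamp. Why is a single timestamp comparison enough to tell whether a broker is available?
faultItemTable
Searching for the places where faultItemTable is used leads to the updateFaultItem method.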

    public void updateFaultItem(final String name/*brokerName*/, final long currentLatency, final long notAvailableDuration) {
        FaultItem old = this.faultItemTable.get(name);
        if (null == old) {
            final FaultItem faultItem = new FaultItem(name);
            faultItem.setCurrentLatency(currentLatency);
            faultItem.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);

            old = this.faultItemTable.putIfAbsent(name, faultItem);
            if (old != null) {
                old.setCurrentLatency(currentLatency);
                old.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
            }
        } else {
            old.setCurrentLatency(currentLatency);
            old.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
        }
    }

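The method looks up the FaultItem for the given brokerName and sets startTimestamp = current time + notAvailableDuration. To see what notAvailableDuration is, follow the callers of updateFaultItem, which leads to the MQFaultStrategy.updateFaultItem(String, long, boolean) method.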

    public void updateFaultItem(final String brokerName, final long currentLatency, boolean isolation) {
        if (this.sendLatencyFaultEnable) { // latency fault tolerance is enabled
            long duration = computeNotAvailableDuration(isolation ? 30000 : currentLatency);
            this.latencyFaultTolerance.updateFaultItem(brokerName, currentLatency, duration);
        }
    }

    private long computeNotAvailableDuration(final long currentLatency) {
        // Scan from the highest threshold down and return the duration of the first one reached
        for (int i = latencyMax.length - 1; i >= 0; i--) {
            if (currentLatency >= latencyMax[i])
                return this.notAvailableDuration[i];
        }
        return 0;
    }

Selected fields of MQFaultStrategy.java

    public class MQFaultStrategy {
        private final static Logger log = ClientLogger.getLog();

        /**
         * Latency fault tolerance: tracks the send latency of each broker.
         * key: brokerName
         */
        private final LatencyFaultTolerance<String> latencyFaultTolerance = new LatencyFaultToleranceImpl();

        /**
         * Switch for latency fault tolerance when sending messages
         */
        private boolean sendLatencyFaultEnable = false;

        /**
         * Latency thresholds (ms)
         */
        private long[] latencyMax = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};

        /**
         * Unavailable durations (ms)
         */
        private long[] notAvailableDuration = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};
        .....
    }

The notAvailableDuration argument is the value at some position of the notAvailableDuration array. The values of the latencyMax and notAvailableDuration arrays (both in milliseconds) are:
latencyMax | notAvailableDuration |
---|---|
50L | 0L |
100L | 0L |
550L | 30000L |
1000L | 60000L |
2000L | 120000L |
3000L | 180000L |
15000L | 600000L |
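That is:
- if currentLatency is at least 50 and below 100, notAvailableDuration is 0
- if currentLatency is at least 100 and below 550, notAvailableDuration is 0
- if currentLatency is at least 550 and below 1000, notAvailableDuration is 30000
- ...and so on
If isolation is true, the latency is treated as 30000, which reaches the top threshold (15000), so notAvailableDuration becomes 600000.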
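As a worked example of the mapping above, the following standalone sketch copies the two arrays and the lookup loop into a hypothetical class, `NotAvailableDurationSketch`, so individual latencies can be tried out. The helper `forSend` is an illustrative name that mirrors the `isolation ? 30000 : currentLatency` expression; it is not a RocketMQ method.

```java
// Hypothetical standalone copy of the threshold lookup, for experimenting with values.
final class NotAvailableDurationSketch {
    // Values taken from the MQFaultStrategy fields quoted in this article (milliseconds)
    static final long[] LATENCY_MAX = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
    static final long[] NOT_AVAILABLE_DURATION = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

    // Scan from the highest threshold down; the first threshold reached decides the duration.
    static long compute(long currentLatency) {
        for (int i = LATENCY_MAX.length - 1; i >= 0; i--) {
            if (currentLatency >= LATENCY_MAX[i]) {
                return NOT_AVAILABLE_DURATION[i];
            }
        }
        return 0; // below the lowest threshold
    }

    // Illustrative helper: a failed send (isolation) is treated as a 30000 ms latency.
    static long forSend(long currentLatency, boolean isolation) {
        return compute(isolation ? 30000 : currentLatency);
    }

    public static void main(String[] args) {
        long[] samples = {30, 80, 549, 550, 1200, 30000};
        for (long latency : samples) {
            System.out.println(latency + " ms -> unavailable for " + compute(latency) + " ms");
        }
    }
}
```

Note that 549 still maps to 0 while 550 maps to 30000: the first two thresholds both carry a duration of 0, so the penalty only starts at the third one.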
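Combined with the isAvailable method, the overall flow is roughly this: RocketMQ predicts an available-again time for each broker (current time + notAvailableDuration), and the broker counts as available only once the current time has reached it. notAvailableDuration has 7 levels, one for each latencyMax threshold, and the currentLatency passed in determines when the broker is predicted to become available.
Next, look at where updateFaultItem is called to see what is passed in as currentLatency.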

    // 1.
    try {
        beginTimestampPrev = System.currentTimeMillis();
        sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout);
        endTimestamp = System.currentTimeMillis();
        this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, false);
    // 2.
    } catch (xxException e) {
        endTimestamp = System.currentTimeMillis();
        this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, true);
    }

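currentLatency is the elapsed time of the send call, and that time decides which level applies. Below 550 ms, notAvailableDuration is 0, so the broker stays available; from 550 ms upward the unavailable period starts to grow. When the send throws, the isolation argument is true, so the broker only becomes available again after 600000 ms (10 minutes).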
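The timestamp-based availability check can be exercised without waiting ten minutes by injecting the clock. The sketch below is a simplified model, not the real `LatencyFaultToleranceImpl`: `FaultTableSketch` is a hypothetical class that stores only the available-again timestamp per broker, and the `LongSupplier` clock is an assumption added for testability (the original code calls `System.currentTimeMillis()` directly).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical, simplified model of the fault table: broker name -> available-again timestamp.
final class FaultTableSketch {
    private final ConcurrentHashMap<String, Long> availableFrom = new ConcurrentHashMap<>();
    private final LongSupplier clock; // injected so time can be controlled in experiments

    FaultTableSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Mirrors updateFaultItem: startTimestamp = now + notAvailableDuration
    void update(String brokerName, long notAvailableDuration) {
        availableFrom.put(brokerName, clock.getAsLong() + notAvailableDuration);
    }

    // Mirrors isAvailable: unknown brokers are available; known ones once now >= startTimestamp
    boolean isAvailable(String brokerName) {
        Long startTimestamp = availableFrom.get(brokerName);
        return startTimestamp == null || clock.getAsLong() - startTimestamp >= 0;
    }
}
```

A typical experiment keeps the current time in a one-element `long[]`, passes `() -> now[0]` as the clock, records a 30000 ms penalty for a broker, and then advances `now[0]` to watch `isAvailable` flip from false to true exactly when the penalty has elapsed.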
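pickOneAtLeast
When a broker really has been marked unavailable for 600000 ms, part 1 of selectOneMessageQueue cannot return a queue on it, so execution falls through to part 2, which first calls pickOneAtLeast to get a broker.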

    public String pickOneAtLeast() {
        final Enumeration<FaultItem> elements = this.faultItemTable.elements();
        List<FaultItem> tmpList = new LinkedList<FaultItem>();
        // Copy every element of faultItemTable into a list
        while (elements.hasMoreElements()) {
            final FaultItem faultItem = elements.nextElement();
            tmpList.add(faultItem);
        }

        if (!tmpList.isEmpty()) {
            // Shuffle first, then sort
            Collections.shuffle(tmpList);
            Collections.sort(tmpList);

            final int half = tmpList.size() / 2;
            if (half <= 0) { // only one element
                return tmpList.get(0).getName();
            } else { // rotate through the first half of the sorted list
                final int i = this.whichItemWorst.getAndIncrement() % half;
                return tmpList.get(i).getName();
            }
        }

        return null;
    }

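The effect of "shuffle, sort, then rotate through the first half" is easier to see on concrete data. The sketch below is a simplified model under stated assumptions: `PickOneAtLeastSketch` and `Item` are hypothetical names, and the ordering used here (lower currentLatency first, then earlier startTimestamp) is an assumption, because the `compareTo` of `FaultItem` is not shown in this article. `Math.floorMod` replaces the plain `%` so the index stays non-negative if the counter overflows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the selection in pickOneAtLeast.
final class PickOneAtLeastSketch {
    // Hypothetical stand-in for FaultItem
    static final class Item {
        final String name;
        final long currentLatency;
        final long startTimestamp;

        Item(String name, long currentLatency, long startTimestamp) {
            this.name = name;
            this.currentLatency = currentLatency;
            this.startTimestamp = startTimestamp;
        }
    }

    // ASSUMED ordering: lower latency first, then earlier available-again time
    static final Comparator<Item> BEST_FIRST =
            Comparator.comparingLong((Item it) -> it.currentLatency)
                    .thenComparingLong(it -> it.startTimestamp);

    private final AtomicInteger whichItemWorst = new AtomicInteger(0);

    String pick(List<Item> items) {
        if (items.isEmpty()) {
            return null;
        }
        List<Item> tmp = new ArrayList<>(items);
        Collections.shuffle(tmp); // randomizes the order of items that compare as equal
        tmp.sort(BEST_FIRST);     // stable sort, so ties keep their shuffled order
        int half = tmp.size() / 2;
        if (half <= 0) {
            return tmp.get(0).name; // only one item
        }
        // Rotate over the better half only
        return tmp.get(Math.floorMod(whichItemWorst.getAndIncrement(), half)).name;
    }
}
```

With four items whose latencies are 10, 20, 3000 and 15000, successive calls alternate between the two lowest-latency brokers and never return the two slowest ones. The shuffle only matters when several items compare as equal.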