Kafka Idempotence: The Question Big-Tech Interviews Always Ask

Reposted article, 2019-08-10 02:35. Tags: Java interview / Kafka / message queue / big data interview / distributed systems

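01 Why idempotence matters so much

Kafka, as a distributed MQ, is widely used in distributed systems such as message push systems and business platforms (for example, a settlement platform). Take settlement: upstream business services push data to the settlement platform, and if one piece of data is calculated or processed more than once, the consequences are very serious.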

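02 What affects idempotence

When using Kafka we need exactly-once semantics. Network partitions are unavoidable in a distributed system. If a network failure or a full GC delays the broker's ack until it times out, the producer resends the message. How do we make sure that retry causes neither duplicates nor reordering? Or the producer crashes, and the new producer has none of the old producer's state. How is idempotence preserved then? Even when sending is idempotent, the consumer pulls messages and hands them to a worker thread pool, and the workers' handling of a message may include asynchronous operations, which leads to the following cases: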

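Commit first, then run the business logic: the commit succeeds but processing fails. The message is lost.
Run the business logic first, then commit: the commit fails but processing succeeded. The message is processed again.
Run the business logic first, then commit: the commit succeeds but the asynchronous work fails. The message is lost.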
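
The first two cases can be reproduced with a small deterministic simulation. This is only a sketch under simplifying assumptions: offsets are plain integers, there is a single simulated crash, and the class and method names (CommitOrderSketch, run) are made up for illustration and are not Kafka APIs.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitOrderSketch {
    /**
     * Replays messages 0..total-1 with one simulated crash at offset crashAt,
     * then restarts from the committed offset.
     * Returns the offsets whose business logic actually ran, in order.
     */
    static List<Integer> run(int total, int crashAt, boolean commitFirst) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;       // offset the consumer resumes from after a restart
        boolean crashed = false; // the crash happens only once
        int offset = 0;
        while (offset < total) {
            if (commitFirst) {
                committed = offset + 1;              // commit before processing
                if (!crashed && offset == crashAt) { // crash before the business logic runs
                    crashed = true;
                    offset = committed;              // restart from the committed offset
                    continue;
                }
                processed.add(offset);
            } else {
                processed.add(offset);               // process before committing
                if (!crashed && offset == crashAt) { // crash before the commit goes through
                    crashed = true;
                    offset = committed;              // restart from the committed offset
                    continue;
                }
                committed = offset + 1;
            }
            offset++;
        }
        return processed;
    }
}
```

With four messages and a crash at offset 2, committing first skips offset 2 entirely, while processing first runs offset 2 twice.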

This article discusses the problems above.

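03 How Kafka guarantees idempotent sends

To address these problems, Kafka 0.11 added the idempotent producer and the transactional producer. The former handles single-session idempotence; the latter handles idempotence across sessions.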
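
As a rough illustration of how the two producer types are switched on, the settings can be collected in a plain java.util.Properties object. The keys are standard producer configuration names; the class name, the broker address and the transactional id used in the usage note are placeholders, and a real KafkaProducer would also need serializer settings, which are left out here.

```java
import java.util.Properties;

/** Collects producer settings as plain Properties; no Kafka client is needed to build them. */
final class IdempotentProducerConfig {

    /** Settings for the idempotent (single-session) producer. */
    static Properties idempotent(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("enable.idempotence", "true");
        // Idempotence requires acks=all and retries greater than 0.
        props.setProperty("acks", "all");
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        // With idempotence enabled, this value must not exceed 5.
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }

    /** Same settings plus transactional.id, which turns on the transactional producer. */
    static Properties transactional(String bootstrapServers, String transactionalId) {
        Properties props = idempotent(bootstrapServers);
        props.setProperty("transactional.id", transactionalId);
        return props;
    }
}
```

For example, IdempotentProducerConfig.transactional("localhost:9092", "settlement-tx-1") yields the settings for a transactional producer; both argument values are made up.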

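Single-session idempotence

To handle the reordering and duplication caused by producer retries, Kafka introduced the producer ID (pid) and the sequence number (seq). On the producer, every RecordBatch carries a monotonically increasing seq. On the broker, each topic-partition (tp) keeps a pid-to-seq mapping and updates lastSeq on every commit. When a RecordBatch arrives, the broker checks it before storing the data: if the batch's baseSeq (the seq of its first message) is exactly 1 greater than the sequence number the broker holds (lastSeq), the data is stored; otherwise it is not (see the inSequence method).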

ProducerStateManager.scala

private def maybeValidateAppend(producerEpoch: Short, firstSeq: Int, offset: Long): Unit = {
    validationType match {
      case ValidationType.None =>

      case ValidationType.EpochOnly =>
        checkProducerEpoch(producerEpoch, offset)

      case ValidationType.Full =>
        checkProducerEpoch(producerEpoch, offset)
        checkSequence(producerEpoch, firstSeq, offset)
    }
}

private def checkSequence(producerEpoch: Short, appendFirstSeq: Int, offset: Long): Unit = {
  if (producerEpoch != updatedEntry.producerEpoch) {
    if (appendFirstSeq != 0) {
      if (updatedEntry.producerEpoch != RecordBatch.NO_PRODUCER_EPOCH) {
        throw new OutOfOrderSequenceException(s"Invalid sequence number for new epoch at offset $offset in " +
          s"partition $topicPartition: $producerEpoch (request epoch), $appendFirstSeq (seq. number)")
      } else {
        throw new UnknownProducerIdException(s"Found no record of producerId=$producerId on the broker at offset $offset" +
          s"in partition $topicPartition. It is possible that the last message with the producerId=$producerId has " +
          "been removed due to hitting the retention limit.")
      }
    }
  } else {
    val currentLastSeq = if (!updatedEntry.isEmpty)
      updatedEntry.lastSeq
    else if (producerEpoch == currentEntry.producerEpoch)
      currentEntry.lastSeq
    else
      RecordBatch.NO_SEQUENCE

    if (currentLastSeq == RecordBatch.NO_SEQUENCE && appendFirstSeq != 0) {
      // We have a matching epoch, but we do not know the next sequence number. This case can happen if
      // only a transaction marker is left in the log for this producer. We treat this as an unknown
      // producer id error, so that the producer can check the log start offset for truncation and reset
      // the sequence number. Note that this check follows the fencing check, so the marker still fences
      // old producers even if it cannot determine our next expected sequence number.
      throw new UnknownProducerIdException(s"Local producer state matches expected epoch $producerEpoch " +
        s"for producerId=$producerId at offset $offset in partition $topicPartition, but the next expected " +
        "sequence number is not known.")
    } else if (!inSequence(currentLastSeq, appendFirstSeq)) {
      throw new OutOfOrderSequenceException(s"Out of order sequence number for producerId $producerId at " +
        s"offset $offset in partition $topicPartition: $appendFirstSeq (incoming seq. number), " +
        s"$currentLastSeq (current end sequence number)")
    }
  }
}

private def inSequence(lastSeq: Int, nextSeq: Int): Boolean = {
  nextSeq == lastSeq + 1L || (nextSeq == 0 && lastSeq == Int.MaxValue)
}
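
The effect of this check can be modeled in a few lines. The sketch below is a toy model, not the broker's implementation: it keeps one lastSeq per pid for a single partition, and the DUPLICATE and OUT_OF_ORDER result names are assumptions introduced for illustration (the real code throws the exceptions shown above).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one partition that remembers the last appended sequence number per producer id. */
final class SequenceCheckSketch {
    static final int NO_SEQUENCE = -1;

    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    // pid -> last sequence number stored for this partition
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();

    /** Same rule as inSequence above: next must directly follow last, wrapping at Integer.MAX_VALUE. */
    static boolean inSequence(int lastSeq, int nextSeq) {
        return nextSeq == lastSeq + 1L || (nextSeq == 0 && lastSeq == Integer.MAX_VALUE);
    }

    /** Tries to append a batch covering sequences baseSeq .. baseSeq + recordCount - 1. */
    Result tryAppend(long pid, int baseSeq, int recordCount) {
        int lastSeq = lastSeqByPid.getOrDefault(pid, NO_SEQUENCE);
        if (inSequence(lastSeq, baseSeq)) {
            lastSeqByPid.put(pid, baseSeq + recordCount - 1);
            return Result.APPENDED;
        }
        if (lastSeq != NO_SEQUENCE && baseSeq <= lastSeq) {
            return Result.DUPLICATE;    // already stored, e.g. a retry after a lost ack
        }
        return Result.OUT_OF_ORDER;     // a gap: an earlier batch has not arrived yet
    }
}
```

A retried batch whose first copy was already stored comes back as DUPLICATE and is not written again, while a batch that skips ahead is rejected until the missing one arrives.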

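Aside: what the Kafka producer does to preserve ordering

Suppose we have five requests: batch1, batch2, batch3, batch4 and batch5. If only the ack for batch2 fails while 3, 4 and 5 are stored, batch2 is resent with a later batch and lands after them, leaving the data out of order or duplicated. Setting max.in.flight.requests.per.connection=1 (the number of unacknowledged requests the client may have outstanding on a single connection) fixes the reordering, but it lowers throughput.

In newer Kafka versions, setting enable.idempotence=true lets the producer adjust the effective number of in-flight requests dynamically. Normally max.in.flight.requests.per.connection is greater than 1. When a retried request comes in, the batch is put back into the queue at the position that matches its seq, and the in-flight limit is effectively reduced to 1. Every batch ahead of it then has a smaller sequence number, and it can be sent only after all of those have been sent.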

    private void insertInSequenceOrder(Deque<ProducerBatch> deque, ProducerBatch batch) {
        // When we are requeing and have enabled idempotence, the reenqueued batch must always have a sequence.
        if (batch.baseSequence() == RecordBatch.NO_SEQUENCE)
            throw new IllegalStateException("Trying to re-enqueue a batch which doesn't have a sequence even " +
                "though idempotency is enabled.");

        if (transactionManager.nextBatchBySequence(batch.topicPartition) == null)
            throw new IllegalStateException("We are re-enqueueing a batch which is not tracked as part of the in flight " +
                "requests. batch.topicPartition: " + batch.topicPartition + "; batch.baseSequence: " + batch.baseSequence());

        ProducerBatch firstBatchInQueue = deque.peekFirst();
        if (firstBatchInQueue != null && firstBatchInQueue.hasSequence() && firstBatchInQueue.baseSequence() < batch.baseSequence()) {

            List<ProducerBatch> orderedBatches = new ArrayList<>();
            while (deque.peekFirst() != null && deque.peekFirst().hasSequence() && deque.peekFirst().baseSequence() < batch.baseSequence())
                orderedBatches.add(deque.pollFirst());

            log.debug("Reordered incoming batch with sequence {} for partition {}. It was placed in the queue at " +
                "position {}", batch.baseSequence(), batch.topicPartition, orderedBatches.size());
            // Either we have reached a point where there are batches without a sequence (ie. never been drained
            // and are hence in order by default), or the batch at the front of the queue has a sequence greater
            // than the incoming batch. This is the right place to add the incoming batch.
            deque.addFirst(batch);

            // Now we have to re insert the previously queued batches in the right order.
            for (int i = orderedBatches.size() - 1; i >= 0; --i) {
                deque.addFirst(orderedBatches.get(i));
            }

            // At this point, the incoming batch has been queued in the correct place according to its sequence.
        } else {
            deque.addFirst(batch);
        }
    }
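
Stripped of the sanity checks and logging, the reordering step comes down to the following. This is a simplified sketch that represents each batch only by its base sequence number (an Integer) and assumes every queued batch already has a sequence; the class name is made up.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ReenqueueSketch {
    /** Puts a retried batch back so that the head of the deque stays sorted by base sequence. */
    static void insertInSequenceOrder(Deque<Integer> deque, int retriedBaseSeq) {
        List<Integer> smaller = new ArrayList<>();
        // Take out every queued batch that has to be sent before the retried one.
        while (deque.peekFirst() != null && deque.peekFirst() < retriedBaseSeq) {
            smaller.add(deque.pollFirst());
        }
        deque.addFirst(retriedBaseSeq);
        // Put the smaller sequences back in front, keeping their original order.
        for (int i = smaller.size() - 1; i >= 0; i--) {
            deque.addFirst(smaller.get(i));
        }
    }
}
```

If the queue holds base sequences 2 and 7 and the batch with base sequence 4 is retried, the queue becomes 2, 4, 7, so the batch with sequence 2 is still sent first.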

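Multi-session idempotence

As described under single-session idempotence, Kafka achieves it by introducing the pid and seq. But precisely because of the pid, when the application restarts the new producer has none of the old producer's state, so data may be stored twice.

Kafka transactions achieve multi-session idempotence through a fencing mechanism

Kafka transactions introduce the transactionId and the epoch. Once transactional.id is set, a transactionId maps to exactly one pid, and the server records the latest epoch. When a new producer initializes, it sends an InitPIDRequest to the TransactionCoordinator. The TransactionCoordinator already holds the metadata for this transactionId, so it returns the previously assigned PID with the epoch incremented by 1. When the old producer recovers and sends a request, it is treated as an invalid producer and an exception is thrown. Without transactions, the TransactionCoordinator returns a new pid to the new producer, so there is no fencing and multi-session idempotence cannot be achieved.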

private def maybeValidateAppend(producerEpoch: Short, firstSeq: Int, offset: Long): Unit = {
    validationType match {
      case ValidationType.None =>

      case ValidationType.EpochOnly =>
        checkProducerEpoch(producerEpoch, offset)

      case ValidationType.Full => // with transactions enabled, this branch runs
        checkProducerEpoch(producerEpoch, offset)
        checkSequence(producerEpoch, firstSeq, offset)
    }
}

private def checkProducerEpoch(producerEpoch: Short, offset: Long): Unit = {
    if (producerEpoch < updatedEntry.producerEpoch) {
      throw new ProducerFencedException(s"Producer's epoch at offset $offset is no longer valid in " +
        s"partition $topicPartition: $producerEpoch (request epoch), ${updatedEntry.producerEpoch} (current epoch)")
    }
  }
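
The interplay between the coordinator handing out epochs and the broker rejecting stale ones can be sketched as follows. This is a toy in-memory model under stated assumptions: the class and method names are illustrative, and both roles are folded into one object for brevity, which is not how Kafka is structured.

```java
import java.util.HashMap;
import java.util.Map;

final class EpochFencingSketch {
    /** Producer identity as handed out for a transactional.id. */
    static final class ProducerIdAndEpoch {
        final long producerId;
        final short epoch;

        ProducerIdAndEpoch(long producerId, short epoch) {
            this.producerId = producerId;
            this.epoch = epoch;
        }
    }

    private final Map<String, ProducerIdAndEpoch> byTransactionalId = new HashMap<>();
    private long nextProducerId = 0;

    /** Models the InitPID step: a known transactional.id keeps its PID, and the epoch goes up by one. */
    ProducerIdAndEpoch initProducerId(String transactionalId) {
        ProducerIdAndEpoch current = byTransactionalId.get(transactionalId);
        ProducerIdAndEpoch next = (current == null)
                ? new ProducerIdAndEpoch(nextProducerId++, (short) 0)
                : new ProducerIdAndEpoch(current.producerId, (short) (current.epoch + 1));
        byTransactionalId.put(transactionalId, next);
        return next;
    }

    /** Models checkProducerEpoch: a request carrying an older epoch is fenced. */
    boolean isFenced(String transactionalId, ProducerIdAndEpoch requester) {
        ProducerIdAndEpoch latest = byTransactionalId.get(transactionalId);
        return latest != null && requester.epoch < latest.epoch;
    }
}
```

Calling initProducerId twice with the same transactional.id returns the same PID with epochs 0 and 1; the holder of epoch 0 is then reported as fenced, which corresponds to the ProducerFencedException above.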

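04 Idempotence on the consumer side

As mentioned above, after the consumer pulls messages it hands them to a worker thread pool, and the workers' handling of a message may include asynchronous operations, which leads to the following cases:

Commit first, then run the business logic: the commit succeeds but processing fails. The message is lost.
Run the business logic first, then commit: the commit fails but processing succeeded. The message is processed again.
Run the business logic first, then commit: the commit succeeds but the asynchronous work fails. The message is lost.

A common approach is for the workers to run the following code as soon as they take a message: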

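lock.lock();
try {
  if (cache.contains(msgId)) {
    // msgId is already in the cache, so this message has been handled
    continue;
  }
  // the check and the put run under the same lock, so two workers cannot both claim msgId
  cache.put(msgId, timeout);
  commitSync();
} finally {
  lock.unlock();
}
// After all subsequent operations complete, remove msgId from the cache. As long as msgId is in the cache, the message is treated as already handled. Note: cache entries need an expiry time.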
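
One possible shape for such a cache, using only the JDK, is sketched below. It is an assumption-laden illustration rather than the author's implementation: DedupCacheSketch, tryClaim and forget are hypothetical names, the cache lives in one process only, and the current time is passed in explicitly so that the behavior is easy to test. A deployment with several consumer instances would need a shared store instead.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class DedupCacheSketch {
    // msgId -> expiry timestamp in milliseconds
    private final ConcurrentMap<String, Long> expiryByMsgId = new ConcurrentHashMap<>();
    private final long ttlMillis;

    DedupCacheSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true if the caller is the first to claim msgId, or if the earlier claim has expired. */
    boolean tryClaim(String msgId, long nowMillis) {
        boolean[] claimed = {false};
        // compute() runs atomically per key, so check and mark cannot interleave between workers.
        expiryByMsgId.compute(msgId, (id, expiry) -> {
            if (expiry == null || expiry <= nowMillis) {
                claimed[0] = true;
                return nowMillis + ttlMillis;
            }
            return expiry;
        });
        return claimed[0];
    }

    /** Removes msgId once all follow-up work for the message has finished. */
    void forget(String msgId) {
        expiryByMsgId.remove(msgId);
    }
}
```

A worker would call tryClaim(msgId, System.currentTimeMillis()) and skip the message when it returns false.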

If you enjoyed this article, follow 靳剛同學.
