webrtc源碼分析(6)- jitter delay計算詳解

本文轉載自查看原文 2021-07-12 10:26 552 webrtc/ 全部

1. 前言

本文主要介紹webrtc jitter buffer中的對於視頻幀抖動的計算，關於jitter buffer如何處理亂序組幀的可以參考WebRTC視頻JitterBuffer詳解，關於處理的抖動后，如何保證視頻和音頻的同步的可以參考WebRTC音視頻同步詳解

webrtc版本:M91

2. 正文

2.1 jitter buffer的思想

視頻幀從發送端發出后到接收端會經歷如下過程:

時刻線: ---------收到包----------開始解碼---------完成解碼/開始渲染----------完成渲染
耗時:  *網絡傳輸延時*|****延時*****|****解碼耗時*********|********渲染耗時*****|

其中網絡耗時，解碼耗時間和渲染耗時都一定會有，而延時則是為了平滑播放而人為增添的一個耗時；

什么是平滑播放？ webrtc的平滑播放是指接收端的幀能夠趨向於原始采集源的幀間隔進行播放，由於網絡抖動的存在，導致每個幀來的或早或慢，早沒問題，但是慢就會導致前面的幀要播放的長而當前幀播放的晚，所以收到一幀后不立刻解碼，而是添加一個延時(wait_ms)后再開始解碼，這樣就能確保下一個幀來的慢也能到；

如下圖所示，有三個幀A，B，C，接收到A幀之后不馬上進行解碼渲染，而是做了一個延時，如此一來，哪怕B幀會因為抖動的關系，導致到達時刻是一個區間，也能保證在A的播放時刻內到達

這個添加的延時的大小非常重要，大了就導致端到端的延遲過高，小了就無法起到有效緩沖的作用，而幀到達的抖動本身是由於傳輸時間的抖動導致的，所以最合適精確的值是幀傳輸時間的抖動，這個傳輸時間的抖動會因為幀的大小以及網絡情況的變化而變化，所以要根據這些因素進行調整，簡單的思路是對於一段時間內的幀傳輸時間抖動做加權平均，但這不夠精細；webrtc的思路是這樣的，統計一段時間內的最大幀大小(MaxFrameSize)和平均幀大小(AvgFrameSize)，計算當前的網絡傳輸速率(C)，那就可以通過以下公式得到最保險的抖動大小:

\[jitter = \frac{MaxFrameSize - AvgFrameSize}{C} \]

由於觀測過程會存在誤差，導致幀間抖動以及網絡傳輸速率不准，引入了卡爾曼濾波去修正以獲得一個更准確的值

2.2 卡爾曼濾波

jitter的計算思想如2.1所述是很簡單的，卡爾曼濾波更多的是起一個修正的作用，但在實際代碼中涉及的比較廣，所以要看懂jitter的計算，卡爾曼濾波是繞不過的點，卡爾曼濾波的思想是很朴素的，就是基於貝葉斯，通過找到一個與A相關聯的系統B，通過觀測系統B的狀態塌縮從而得到A的狀態塌縮，但不是三言兩語能解釋清楚的，要完全理解，請參考這篇筆記--從貝葉斯到卡爾曼和其中的課程視頻，只有理解了其思想和本質，到了理解jitter的算法部分，只要理解了其狀態方程和觀測方程，剩下的就是很簡單的套公式了。

對於如下預測方程和觀測方程

\[預測方程: X_{k} = F(X_{k-1}) + Q_{K} \\ 觀測方程: Y_k = H(X_{K}) + R_{k} \\ Q_{K}為預測誤差， Q_{K}服從正態分布N(0 ,Q)\\ R_{k}為觀測誤差， R_{k}服從正態分布N(0 ,R) \]

卡爾曼濾波公式如下：

\[先驗期望: \vec{u_{k}^{-}} = F \cdot \vec{u_{k-1}^{+}} \hspace{4cm} (2.2.1)\\ 先驗方差: {\Sigma_{k}^{-}}=F {\Sigma_{k-1}^{+}}F^T+Q \hspace{2.5cm} (2.2.2)\\ 卡爾曼增益: k=\frac{H{\Sigma_{k}^{-}}}{H\Sigma_{k}^{-}H^T + R} \hspace{3cm} (2.2.3)\\ 后驗期望: \vec{u_{k}^{+}} = \vec{u_{k}^{-}} + k*(\vec{y_{k}} - H\vec{u_{k}^{-}}) \hspace{1.1cm} (2.2.4)\\ 后驗方差: {\Sigma_{k}^{+}}= {(I-kH)\Sigma_{k}^{-}} \hspace{3cm} (2.2.5)\\ \]

2.3 jitter濾波算法分析

2.3.1 算法闡述

本節必須在理解透了2.2卡爾曼濾波的思想和本質，才能繼續；

首先需要對系統建出兩個概率模型-狀態方程和觀測方程;

幀的傳輸延時抖動和三個因素有關：幀大小，網絡傳輸速率，傳輸通道情況，建模如下:

狀態方程:

\[theta(i) = theta(i-1) + u(i-1) \hspace{3cm} (2.3.1) \]

theta(i): 一個二維矩陣$\left[\begin{array}{c} \frac{1}{C(i)} \\m(i) \end{array} \right]$，$C(i)$是網絡傳輸速率，$m(i)$是網絡排隊延遲，是一個與網絡傳輸速率，傳輸途徑，當前發送碼率有關的值，相當於傳輸通道情況變化導致的延遲，這兩個變化都很混沌和很多已知未知的因素有關，所以用了直接加噪聲這種最簡單暴力的建模

u(i-1): 預估誤差，高斯噪聲矩陣， $P(u)\sim (0, Q)$，此處Q是個矩陣

即: 認為theta(i)是與theta(i -1)呈斜率為1的線性關系，這種建模誤差由u(i -1) 去彌補

觀測方程:

\[d(i) = H * theta(i) + v(i) \hspace{3cm} (2.3.2) \]

d(i): 幀間傳輸延遲

H: 觀測系數變換矩陣$\left[\begin{array}{c} dL(i) & 1 \end{array} \right]$，$dL(i)$是當上一幀和當前幀的大小差

v(i): 觀測誤差，$P(v)\sim (0, R)$, 此處$R$是一個數

其中，將公式2.3.1代入2.3.2中，得以下方程, 看起來更直觀

\[\begin{align*} d(i) &= \left[\begin{array}{} dL(i) & 1 \end{array} \right] * \left[\begin{array}{} \frac{1}{C(i)} \\ m(i)\end{array} \right] + v(i) \\ &= \frac{dL(i)}{C(i)} + m(i) + v(i) \hspace{3cm} (2.3.3) \end{align*} \]

即: 幀間傳輸延遲的觀測值等於幀變化大小/傳輸速率 + 網絡變化延遲 + 觀測噪聲

有了兩個概率模型之后，參照2.2中的卡爾曼濾波公式就可以得到以下方程了:

先驗期望: $theta\_hat^{-}(i) = theta\_hat(i-1) \hspace{4cm} (2.3.4)$

先驗方差: $theta_cov^{-}(i) = theta_cov(i-1) + Q(i) \hspace{2.5cm} (2.3.5) $

卡爾曼增益: $k = \frac{H * theta\_cov^{-}(i)}{H * theta\_cov^{-}(i) * H^T + R} \hspace{6.0cm} (2.3.6)$

后驗期望: $theta\_hat(i) = theta\_hat^{-}(i) + k * (d(i) - H*theta\_hat^{-}(i)) \hspace{2cm} (2.3.7)$

后驗方差: $theta_cov(i) = (I - kH) * theta_cov^{-}(i) \hspace{2.8cm} (2.3.8) $

這里為了貼近google相關文檔, 變量起的名字有點四不像, 后綴hat表示期望，cov表示協方差，對於先驗的期望和方差，右上角添加了負號

2.3.2 源碼分析

2.3.2.1 計算幀間傳輸抖動觀測值d(i)

jitter這一塊的計算在webrtc里很集中，在解碼線程調用GetNextFrame( )時，會從FramesToDecode隊列取出最新要解碼的幀，這時就根據該幀的大小和網絡傳輸時間計算抖動d(i)

EncodedFrame* FrameBuffer::GetNextFrame() {
......

  if (!superframe_delayed_by_retransmission) {
    // 正常情況下，非重傳
    int64_t frame_delay;
	// 計算當前幀和上一幀在傳輸時間上的抖動
    if (inter_frame_delay_.CalculateDelay(first_frame->Timestamp(),
                                          &frame_delay, receive_time_ms)) {
      // 通過當前幀大小和傳輸時間上的抖動更新預估器
      jitter_estimator_.UpdateEstimate(frame_delay, superframe_size);
    }
	
    float rtt_mult = protection_mode_ == kProtectionNackFEC ? 0.0 : 1.0;
    absl::optional<float> rtt_mult_add_cap_ms = absl::nullopt;
    if (rtt_mult_settings_.has_value()) {
      rtt_mult = rtt_mult_settings_->rtt_mult_setting;
      rtt_mult_add_cap_ms = rtt_mult_settings_->rtt_mult_add_cap_ms;
    }
    // 獲取當前幀的用於平滑網絡延遲，並設置到延遲器中用於后續解碼播放
    timing_->SetJitterDelay(
        jitter_estimator_.GetJitterEstimate(rtt_mult, rtt_mult_add_cap_ms));
    //獲取之后馬上就會做解碼，now_ms就是decode_time
    timing_->UpdateCurrentDelay(render_time_ms, now_ms);
  } else {
    if (RttMultExperiment::RttMultEnabled() || add_rtt_to_playout_delay_)
      jitter_estimator_.FrameNacked();
  }

......
}

幀間傳輸抖動d(i)的計算很簡單，每一幀都有rtp時間與接收時間，可以通過計算兩幀傳輸時間，然后相減就可以得到傳輸抖動，如下圖所示:

具體的計算過程如下:

/**
 * @description: 計算幀間傳輸延遲
 * @param {timestamp} 到達幀的rtp timestamp
 * @param {delay} 計算出的結果
 * @param {CurrentWallClock} 當前時鍾
 * @return {} 如果有計算出幀間delay，返回true
 */
bool VCMInterFrameDelay::CalculateDelay(uint32_t timestamp,
                                        int64_t* delay,
                                        int64_t currentWallClock) {
  if (_prevWallClock == 0) {
    // First set of data, initialization, wait for next frame.
    _prevWallClock = currentWallClock;
    _prevTimestamp = timestamp;
    *delay = 0;
    return true;
  }

  int32_t prevWrapArounds = _wrapArounds;
  // 檢查是否回環，調整_wrapArounds
  CheckForWrapArounds(timestamp);

  // This will be -1 for backward wrap arounds and +1 for forward wrap arounds.
  // 看是前向回環還是后向回環
  int32_t wrapAroundsSincePrev = _wrapArounds - prevWrapArounds;

  // Account for reordering in jitter variance estimate in the future?
  // Note that this also captures incomplete frames which are grabbed for
  // decoding after a later frame has been complete, i.e. real packet losses.
  // 如果沒有發生時間戳回環而且時間戳小於上一個幀，那么當前幀是一個過時的幀，
  if ((wrapAroundsSincePrev == 0 && timestamp < _prevTimestamp) ||
      wrapAroundsSincePrev < 0) {
    *delay = 0;
    return false;
  }

  // Compute the compensated timestamp difference and convert it to ms and round
  // it to closest integer.


  // divide 90 是因為rtp timestamp中的以90khz為單位, ->1ms采樣90hz
  // 要想將timestamp轉化成ms，就需要除90
  // 根據wrapAroundsSincePrev發生回環數，增大1<<32
  // +0.5 上取整
  _dTS = static_cast<int64_t>(
      (timestamp + wrapAroundsSincePrev * (static_cast<int64_t>(1) << 32) -
       _prevTimestamp) /
          90.0 +
      0.5);
  

  // frameDelay is the difference of dT and dTS -- i.e. the difference of the
  // wall clock time difference and the timestamp difference between two
  // following frames.
  // 這個地方計算兩幀的delay是分開來計算的
  //(到達時間差) dt =  currentWallClock - _prevWallClock
  //(發送時間差) dts = currentTimestamp - prevTimeStamp
  *delay = static_cast<int64_t>(currentWallClock - _prevWallClock - _dTS);

  _prevTimestamp = timestamp;
  _prevWallClock = currentWallClock;

  return true;
}

2.3.2.2 計算deviation

從inter_frame_delay_.CalculateDelay( )計算除這個抖動值之后，就開始通過函數VCMJitterEstimator::UpdateEstimate( )開始准備卡爾曼濾波。

// Updates the estimates with the new measurements.
void VCMJitterEstimator::UpdateEstimate(int64_t frameDelayMS,
                                        uint32_t frameSizeBytes,
                                        bool incompleteFrame /* = false */) {
  if (frameSizeBytes == 0) {
    return;
  }
  int deltaFS = frameSizeBytes - _prevFrameSize;
  // 啟動階段，先收集幾個包，計算一個平均的frameSize
  if (_fsCount < kFsAccuStartupSamples) {
    _fsSum += frameSizeBytes;
    _fsCount++;
  } else if (_fsCount == kFsAccuStartupSamples) {
    // Give the frame size filter.
    _avgFrameSize = static_cast<double>(_fsSum) / static_cast<double>(_fsCount);
    _fsCount++;
  }


  if (!incompleteFrame || frameSizeBytes > _avgFrameSize) {
    // 當frameSize大於_avgFrameSize時，_avgFrameSize平滑趨近於frameSizeBytes
    double avgFrameSize = _phi * _avgFrameSize + (1 - _phi) * frameSizeBytes;
    if (frameSizeBytes < _avgFrameSize + 2 * sqrt(_varFrameSize)) {
      // Only update the average frame size if this sample wasn't a key frame.
      _avgFrameSize = avgFrameSize;
    }
    // Update the variance anyway since we want to capture cases where we only
    // get key frames.
    // 更新framesize方差
    _varFrameSize = VCM_MAX(
        _phi * _varFrameSize + (1 - _phi) * (frameSizeBytes - avgFrameSize) *
                                   (frameSizeBytes - avgFrameSize),
        1.0);
  }

  // Update max frameSize estimate.
  // 在關鍵幀到達后maxFrameSize會突然增大，這里使用了_psi不斷的去收斂_maxFrameSize
  // 到普通P幀大小
  _maxFrameSize =
      VCM_MAX(_psi * _maxFrameSize, static_cast<double>(frameSizeBytes));

  if (_prevFrameSize == 0) {
    _prevFrameSize = frameSizeBytes;
    return;
  }
  _prevFrameSize = frameSizeBytes;

  // Cap frameDelayMS based on the current time deviation noise.
  // A*sqrt(_varNoise)這樣的形式在整個文件中出現數次，A實際上是標准正態分布N~(0, 1)下的
  // 分位點，當擴大sqrt(_varNoise)倍時就是 N~(0, _varNoise)下的分位點
  int64_t max_time_deviation_ms =
      static_cast<int64_t>(time_deviation_upper_bound_ * sqrt(_varNoise) + 0.5);
  // 將frameDelayMs 控制在[-max_time_deviation_ms, max_time_deviation_ms]范圍內
  frameDelayMS = std::max(std::min(frameDelayMS, max_time_deviation_ms),
                          -max_time_deviation_ms);


  // 利用先驗期望計算deviation: d(i) - H * theta_hatˉ(i)
  double deviation = DeviationFromExpectedDelay(frameDelayMS, deltaFS);

  {
    
     RTC_LOG(LS_WARNING)<<"deviation" << deviation
                        <<", _numStdDevDelayOutlier" << _numStdDevDelayOutlier
                        <<", sqrt(_varNoise)"<<sqrt(_varNoise)
                        <<", _numStdDevDelayOutlier * sqrt(_varNoise)" << _numStdDevDelayOutlier * sqrt(_varNoise);
  }


  // Only update the Kalman filter if the sample is not considered an extreme
  // outlier. Even if it is an extreme outlier from a delay point of view, if
  // the frame size also is large the deviation is probably due to an incorrect
  // line slope.
  // 不使用異常點(outlier)更新卡爾曼濾波器
  // 這里異常點的判斷依據是:觀測噪聲值正常會分布於區間[-range, range]
  // range 作為分位點被設置成了: _numStdDevDelayOutlier * sqrt(_varNoise) 
  // _numStdDevDelayOutlier: 標准正態分布中的分位點, 但被設置成了15，這個值怎么會這么大..
  // 但還有一點，當前幀的frame size很大的時候帶來的觀測噪聲很大，有可能是
  // 因為數據量小的時候測得傳輸速率1/C(即_theta[0])不准導致的，那這就不是一個異常點

  if (fabs(deviation) < _numStdDevDelayOutlier * sqrt(_varNoise) ||
      frameSizeBytes >
          _avgFrameSize + _numStdDevFrameSizeOutlier * sqrt(_varFrameSize)) {
    // Update the variance of the deviation from the line given by the Kalman
    // filter.
    // 利用deviation更新觀測噪聲的概率分布
    EstimateRandomJitter(deviation, incompleteFrame);
    // Prevent updating with frames which have been congested by a large frame,
    // and therefore arrives almost at the same time as that frame.
    // This can occur when we receive a large frame (key frame) which has been
    // delayed. The next frame is of normal size (delta frame), and thus deltaFS
    // will be << 0. This removes all frame samples which arrives after a key
    // frame.
    if ((!incompleteFrame || deviation >= 0.0) &&
        static_cast<double>(deltaFS) > -0.25 * _maxFrameSize) {
      // Update the Kalman filter with the new data
      KalmanEstimateChannel(frameDelayMS, deltaFS);
    }
  } else {
    // 出現了異常值，
    // 異常值可能是因為某些意外引起的，不希望引起指數濾波器的太大變化
    // 所以使用異常值邊界_numStdDevDelayOutlier去更新_varNoise
    int nStdDev =
        (deviation >= 0) ? _numStdDevDelayOutlier : -_numStdDevDelayOutlier;
    EstimateRandomJitter(nStdDev * sqrt(_varNoise), incompleteFrame);
  }

  // Post process the total estimated jitter
  if (_startupCount >= kStartupDelaySamples) {
    PostProcessEstimate();
  } else {
    _startupCount++;
  }
}

首先會統計平均幀大小的_avgFrameSize和最大的幀大小_maxFrameSize。

然后開始計算deviation，deviation是公式2.3.7中的$d(i) - H*theta\_hat^{-}(i)$。最開始的時候，我曾把它直接理解成觀測誤差v(i)，因為它很像公式2.3.2的變形，也就是觀測值-H*實際值=觀測噪聲，看起來很合理，但可惜$theta\_hat^{-}(i)$不是一個實際值，它是所有實際值的期望，所以這里就把它稱為deviation，雖然不是觀測誤差，但是deviation和觀測誤差是相近的。

deviation的具體計算過程如下, 有一個隱藏點，由於$theta\_hat^{-}(i) = theta\_hat(i-1)$，所以在計算過程中直接把$theta\_hat(i-1)$拿來用了

/**
 * @description: 利用先驗期望計算deviation，計算公式為: v(i) = d(i) - H *theta_hat(i)
 * v(i): 觀測值和先驗
 * d(i): frameDelayMs 觀測值
 * theta_hat(i): theta是一個二維向量[1/C(i)  m(i)]^T, 
 * 1/C(i)是傳輸速率倒數，m(i-1)是網絡時延，theta_hat(i-1) 表示i-1時刻該向量期望，也就是后驗值
 * H(i): [dL(i)  1]^T, dL(i)為第i幀和第i-1幀的數據量之差
 * @param {frameDelayMS} 此幀距上幀到達的時間間隔
 * @param {deltaFSBytes} 此幀和上幀大小相差
 * @return {觀測噪聲值}
 */
double VCMJitterEstimator::DeviationFromExpectedDelay(
    int64_t frameDelayMS,
    int32_t deltaFSBytes) const {
  // d(i) - H(i) *theta_hatˉ(i) =d(i) - _theta[0] * deltaFSBytes + _theta[1] * 1
  // 注意! 此處使用的_theta[0]和 _theta[1] 實際上是theta_hat(i-1)
  // 但由於theta_hatˉ(i) = theta_hat(i-1)，所以直接拿來用了
  return frameDelayMS - (_theta[0] * deltaFSBytes + _theta[1]);
}

在計算完deviation之后，會根據這個deviation去判斷當前采樣點在概率分布上是否為一個超過一個異常邊界的異常值，如果不是異常值，會使用它對觀測噪聲概率分布進行更新，但是這里設置的，如果是，則直接用異常邊界去更新觀測噪聲概率分布。但這個異常邊界的值被設置為15，這個分位點在標准正態分布中是非常非常大的，根據標准正態分布表，設置到3.5已經是萬分之一的概率了，至今想不明白這個點。不知道是否哪里理解錯了，希望知道有緣人告知一下呀。

一般的卡爾曼濾波的觀測噪聲的概率分布都是固定的，但這里是動態變化的，更新噪聲概率分布的具體過程如下

// Estimates the random jitter by calculating the variance of the sample
// distance from the line given by theta.
/**
 * @description: 根據deviation，更新觀測噪聲均值和噪聲方差
 * @param {d_dT} deviation
 * @param {incompleteFrame} 
 * @return {*}
 */
void VCMJitterEstimator::EstimateRandomJitter(double d_dT,
                                              bool incompleteFrame) {
  uint64_t now = clock_->TimeInMicroseconds();
  if (_lastUpdateT != -1) {
    fps_counter_.AddSample(now - _lastUpdateT);
  }
  _lastUpdateT = now;

  if (_alphaCount == 0) {
    assert(false);
    return;
  }
  double alpha =
      static_cast<double>(_alphaCount - 1) / static_cast<double>(_alphaCount);
  _alphaCount++;
  if (_alphaCount > _alphaCountMax)
    _alphaCount = _alphaCountMax;

  // In order to avoid a low frame rate stream to react slower to changes,
  // scale the alpha weight relative a 30 fps stream.
  // alpha會在下面用於指數移動加權濾波器的權重系數, 用於對觀測噪聲進行更新
  // 此處會通過幀率去修改alpha的值，以30幀為目標，當幀率低於30幀的時候，此時
  // 來幀太慢，導致觀測噪聲更改的太慢，會通過30.0/幀率對alpha進行pow操作，讓alpha
  // 變得更小，從而讓觀測噪聲變化更傾向於當前
  double fps = GetFrameRate();
  if (fps > 0.0) {
    double rate_scale = 30.0 / fps;
    // At startup, there can be a lot of noise in the fps estimate.
    // Interpolate rate_scale linearly, from 1.0 at sample #1, to 30.0 / fps
    // at sample #kStartupDelaySamples.
    if (_alphaCount < kStartupDelaySamples) {
      rate_scale =
          (_alphaCount * rate_scale + (kStartupDelaySamples - _alphaCount)) /
          kStartupDelaySamples;
    }
    // 根據幀率調整alpha 大小使得EWMA中觀測噪聲調整不因為幀率低而慢
    alpha = pow(alpha, rate_scale);
  }

  // 和傳統卡爾曼濾波不同的是，此處觀測方程中噪聲R不是固定的，
  // 而是會根據deviation去跟新
  // 更新的方式是通過指數移動加權濾波(EWMA)對其進行更新，EWMA如下濾波如下
  // EWMA(i) = λY(i)+ ( 1-λ)EWMA(i-1) 
  // EWMA(i): i時刻的估計值
  // Y(i): i時刻的測量值
  // λ: 權重系數，[0,1]，越大說明歷史趨勢越重要，越小說明即時性更重要
  // 在這里alpha就是λ

  // EWMA對噪聲期望更新
  double avgNoise = alpha * _avgNoise + (1 - alpha) * d_dT;
  // EWMA對噪聲協方差更新
  double varNoise =
      alpha * _varNoise + (1 - alpha) * (d_dT - _avgNoise) * (d_dT - _avgNoise);

  if (!incompleteFrame || varNoise > _varNoise) {
    _avgNoise = avgNoise;
    _varNoise = varNoise;
  }
  if (_varNoise < 1.0) {
    // The variance should never be zero, since we might get stuck and consider
    // all samples as outliers.
    _varNoise = 1.0;
  }
}

2.3.2.3 卡爾曼濾波

更新完觀測噪聲的概率分布后，就開始卡爾曼濾波計算后驗期望和方差，卡爾曼濾波的計算過程基本就是公式$2.3.4$到公式$2.3.8$，只要把符號對好，並不復雜

// Updates Kalman estimate of the channel.
// The caller is expected to sanity check the inputs.
void VCMJitterEstimator::KalmanEstimateChannel(int64_t frameDelayMS,
                                               int32_t deltaFSBytes) {
  double Mh[2];         // theta_covˉ * H
  double hMh_sigma;     // H*theta_covˉH' + R
  double kalmanGain[2]; //卡爾曼增益k
  double measureRes;
  double t00, t01;

  // Kalman filtering

  // Prediction，先驗協方差預測
  // theta_covˉ(i) = theta_cov(i-1) + Q(i)
  _thetaCov[0][0] += _Qcov[0][0];
  _thetaCov[0][1] += _Qcov[0][1];
  _thetaCov[1][0] += _Qcov[1][0];
  _thetaCov[1][1] += _Qcov[1][1];

  // 卡爾曼增益k的更新，
  // k = H * theta_covˉ(i) / H * theta_covˉ(i) * H' + R
  Mh[0] = _thetaCov[0][0] * deltaFSBytes + _thetaCov[0][1];
  Mh[1] = _thetaCov[1][0] * deltaFSBytes + _thetaCov[1][1];
  // sigma weights measurements with a small deltaFS as noisy and
  // measurements with large deltaFS as good
  if (_maxFrameSize < 1.0) {
    return;
  }
  //sigma 觀測噪聲方差的計算使用指數濾波的方式從噪聲協方差獲得
  // deltaFs越大，噪聲就越小
  double sigma = (300.0 * exp(-fabs(static_cast<double>(deltaFSBytes)) /
                              (1e0 * _maxFrameSize)) +
                  1) *
                 sqrt(_varNoise);
  if (sigma < 1.0) {
    sigma = 1.0;
  }
  hMh_sigma = deltaFSBytes * Mh[0] + Mh[1] + sigma;
  if ((hMh_sigma < 1e-9 && hMh_sigma >= 0) ||
      (hMh_sigma > -1e-9 && hMh_sigma <= 0)) {
    assert(false);
    return;
  }
  kalmanGain[0] = Mh[0] / hMh_sigma;
  kalmanGain[1] = Mh[1] / hMh_sigma;

  // 后驗期望的計算
  // theta_hat(i) = theta_hatˉ(i) + k*(d(i) - H *theta_hatˉ(i))
  measureRes = frameDelayMS - (deltaFSBytes * _theta[0] + _theta[1]);//d(i) - H *theta_hatˉ(i)
  _theta[0] += kalmanGain[0] * measureRes;
  _theta[1] += kalmanGain[1] * measureRes;

  if (_theta[0] < _thetaLow) {
    _theta[0] = _thetaLow;
  }

  // 后驗方差的計算
  // theta_cov(i) = (I -kH) * theta_covˉ(i)
  t00 = _thetaCov[0][0];
  t01 = _thetaCov[0][1];
  _thetaCov[0][0] = (1 - kalmanGain[0] * deltaFSBytes) * t00 -
                    kalmanGain[0] * _thetaCov[1][0];
  _thetaCov[0][1] = (1 - kalmanGain[0] * deltaFSBytes) * t01 -
                    kalmanGain[0] * _thetaCov[1][1];
  _thetaCov[1][0] = _thetaCov[1][0] * (1 - kalmanGain[1]) -
                    kalmanGain[1] * deltaFSBytes * t00;
  _thetaCov[1][1] = _thetaCov[1][1] * (1 - kalmanGain[1]) -
                    kalmanGain[1] * deltaFSBytes * t01;

  // Covariance matrix, must be positive semi-definite.
  assert(_thetaCov[0][0] + _thetaCov[1][1] >= 0 &&
         _thetaCov[0][0] * _thetaCov[1][1] -
                 _thetaCov[0][1] * _thetaCov[1][0] >=
             0 &&
         _thetaCov[0][0] >= 0);
}

2.3.2.4 計算抖動

卡爾曼濾波處理完之后，此時就得到了一個相對准確的傳輸速率C(i)和網絡排隊延遲m(i), 則可以通過這兩個值取計算抖動了，在VCMJitterEstimator::UpdateEstimate( )的最后調用了PostProcessEstimate()，計算了jitter

void VCMJitterEstimator::PostProcessEstimate() {
  _filterJitterEstimate = CalculateEstimate();
}

// Calculates the current jitter estimate from the filtered estimates.
/**
 * @description: 計算當前的jitter
 * @param {*}
 * @return {jitter}值
 */
double VCMJitterEstimator::CalculateEstimate() {
  // 這里jitter的計算是一個保守的jitter
  // 使用_maxFrameSize - _avgFrameSize預估下一個幀最大可能增長作為deltaFs
  //計算其在當前預估的傳輸速率C下可能呆的最大抖動，
  double ret = _theta[0] * (_maxFrameSize - _avgFrameSize) + NoiseThreshold();

  // A very low estimate (or negative) is neglected.
  if (ret < 1.0) {
    if (_prevEstimate <= 0.01) {
      ret = 1.0;
    } else {
      ret = _prevEstimate;
    }
  }
  if (ret > 10000.0) {  // Sanity
    ret = 10000.0;
  }
  _prevEstimate = ret;
  return ret;
}

在估計過程中添加NoiseThreshold( )讓我費解，在這里，jitter的計算只是一種估計，基於當前的噪聲概率分布添加噪聲無可厚非，但是這個噪聲閾值的計算過程不好理解，_noiseStdDevs * sqrt(_varNoise)為1%噪聲上分位，而_noiseStdDevOffset的含義不太清楚，其值為30ms，下面的if分支做了判斷，僅做推測其含義為，噪聲不大的時候使用1.0作為噪聲，大於30ms的時候，使用原值

/**
 * @description: 計算噪聲閾值
 * @param {*}
 * @return {*}
 */
double VCMJitterEstimator::NoiseThreshold() const {
  // 而是使用一個基礎噪聲系數 * 噪聲標准差 -  差值
  double noiseThreshold = _noiseStdDevs * sqrt(_varNoise) - _noiseStdDevOffset;
  if (noiseThreshold < 1.0) {
    // 噪聲不大的時候使用1.0，大於30ms的時候，使用原值
    noiseThreshold = 1.0;
  }
  return noiseThreshold;
}

至此，預測的抖動計算完畢

2.3.2.5 外部獲取jitter

再回到GetNextFrame( )這個函數中，外部會通過VCMJitterEstimator::GetJitterEstimate( )去獲取jitter，獲取的jitter delay會傳遞到VCMTiming，用它來計算一個幀解碼時間，下面是外部獲取jitter delay的具體實現:

// Returns the current filtered estimate if available,
// otherwise tries to calculate an estimate.
int VCMJitterEstimator::GetJitterEstimate(
    double rttMultiplier,
    absl::optional<double> rttMultAddCapMs) {
  // 計算jitter
  double jitterMS = CalculateEstimate() + OPERATING_SYSTEM_JITTER;
  uint64_t now = clock_->TimeInMicroseconds();

  // 此處注意! nack在jitter模塊一分鍾才過期清零，影響下面jitter的變大
  // 可以優化
  if (now - _latestNackTimestamp > kNackCountTimeoutMs * 1000)
    _nackCount = 0;

  // 暫時不知道什么情況會觸發
  if (_filterJitterEstimate > jitterMS)
    jitterMS = _filterJitterEstimate;
  
  if (_nackCount >= _nackLimit) {
    // 丟包太多的時候，將jitterMs增大
    // 不太理解為什么添加額外處理丟包延時，按道理來說，卡爾曼濾波建模的
    // 傳輸速率的計算是deltaFs/delta 會考慮到平均丟包時間
    if (rttMultAddCapMs.has_value()) {
      jitterMS +=
          std::min(_rttFilter.RttMs() * rttMultiplier, rttMultAddCapMs.value());
    } else {
      jitterMS += _rttFilter.RttMs() * rttMultiplier;
    }
  }

  // 低幀率的時候允許降低jitter
  if (enable_reduced_delay_) {
    static const double kJitterScaleLowThreshold = 5.0;
    static const double kJitterScaleHighThreshold = 10.0;
    double fps = GetFrameRate();
    // Ignore jitter for very low fps streams.
    // 低幀率的時候，網絡jitter引起的抖動已經不明顯了
    // 這個時候的主要問題是卡頓，可以適當降低jitter了
    if (fps < kJitterScaleLowThreshold) {
      // 只有5幀時，每一幀都超過了200ms，網絡抖動沒有太大意義
      if (fps == 0.0) {
        return rtc::checked_cast<int>(std::max(0.0, jitterMS) + 0.5);
      }
      return 0;
    }

    // Semi-low frame rate; scale by factor linearly interpolated from 0.0 at
    // kJitterScaleLowThreshold to 1.0 at kJitterScaleHighThreshold.
    if (fps < kJitterScaleHighThreshold) {
      // 處於semid_low區間時，根據幀率調整
      jitterMS =
          (1.0 / (kJitterScaleHighThreshold - kJitterScaleLowThreshold)) *
          (fps - kJitterScaleLowThreshold) * jitterMS;
    }
  }

  return rtc::checked_cast<int>(std::max(0.0, jitterMS) + 0.5);
}

3. REF

1. A Google Congestion Control Algorithm for Real-Time Communication draft-alvestrand-rmcat-congestion-03

2. WebRtc Video Receiver(八)-基於Kalman filter模型的JitterDelay原理分析

3.從貝葉斯到卡爾曼濾波

4.WebRTC視頻JitterBuffer詳解

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 webrtc源碼分析(7)-fec webrtc源碼分析(5)-Api接口 WebRTC源碼分析：音頻模塊結構分析同網頁的WebRTC實現與源碼分析 webrtc源碼分析(4)-視頻發送流程 WebRTC 源碼分析（二）：安卓預覽 webrtc源碼分析(2)-線程任務管理 webrtc源碼分析(8)-擁塞控制(上)-碼率預估 webrtc源碼分析(9)-擁塞控制(下)-碼率分配 WebRTC 源碼分析（三）：安卓視頻硬編碼