[Paper Notes] Fast Cost-Volume Filtering for Visual Correspondence and Beyond

Reposted article · 2020-01-26 11:46 · Computer Vision Algorithms / Stereo Matching

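A paper that once reached state-of-the-art results on Middlebury. It uses the guided filter for cost aggregation and its logic is clear. Although it is not the fastest method, the paper looks at stereo matching, optical flow, and image segmentation in computer vision from a higher vantage point, and it is well worth reading.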

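Having finished it, I feel as if everything suddenly clicked: all the terms I have run into over the past two-plus years, unfamiliar and familiar alike, are now connected.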

Introduction

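The authors point out that algorithms in all of the following areas can be summarized as label-based:
- Stereo matching
- Optical flow
- Image segmentation
The essence of label-based methods:
- Build a three-dimensional cost volume that stores the cost of assigning label \(l\) to the pixel at image coordinates \((x, y)\);
  
  For stereo matching, the cost measures how well a pair of pixels in the left and right views match (a low cost means a high similarity).
- The goal is then to find an optimal label assignment that:
  - obeys the label costs
  - is spatially smooth
  - aligns label changes with edges in the image
- To obtain such an optimal label assignment, the usual approach is:
  
  Build a conditional Markov random field (CRF) that encodes the label costs in an energy function made of a data term and a pairwise smoothness term, then minimize this energy with methods such as graph cuts or belief propagation.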
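For concreteness, an energy of the kind described in the last bullet is often written in the generic form below. This is a textbook form, not necessarily the exact energy the paper compares against; the neighborhood system \(\mathcal{N}\), the weight \(\lambda\), the edge weight \(w_{ij}\), the bandwidth \(\sigma\), and the Potts penalty \(\rho\) are my own notation:

\[E(f) = \sum_i C_{i, f_i} + \lambda \sum_{(i, j) \in \mathcal{N}} w_{ij}(I)\, \rho(f_i, f_j), \qquad w_{ij}(I) = \exp\left(-\frac{\|I_i - I_j\|^2}{\sigma^2}\right), \qquad \rho(f_i, f_j) = [f_i \neq f_j]\]

The data term rewards obeying the label costs, the pairwise term rewards spatial smoothness, and because \(w_{ij}\) is small across strong image edges, label changes are cheapest exactly where the image itself changes.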
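The logic above is essentially how global methods work. They are slow and struggle with high-resolution images and large label spaces, so there are two ways to speed them up:
- Turn the problem into a convex optimization: add problem-specific constraints so that the problem becomes convex and can be accelerated on the GPU. This has drawbacks:
  - the extra constraints make it hard for the model to handle some difficult scenes;
  - keeping the smoothness term convex can over-smooth the result;
- Approximate with local filtering: design the window shape, size, and filter weights so that cost aggregation is optimal locally within the window (pixels outside the window can be viewed as having weight 0). The advantage is speed; the drawback is that the result is only locally optimal.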

Abstract

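Main idea:

Treat each slice of the cost volume (one slice per label) as an image and smooth it with a fast edge-preserving filter, the guided filter, using the input image as guidance, so that the aggregation weights between pixels are determined by the local color structure of the guide image; each pixel then takes the label with the lowest filtered cost.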
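Contributions:
1. A filter-based framework for solving label-based problems such as stereo matching, optical flow, and image segmentation
2. Applied to stereo matching, it achieved state-of-the-art results in real time (at the time of publication)
3. Applied to optical flow, it handles fine motion structures and large displacements
4. Applied to image segmentation, it produces fast, high-quality interactive segmentation results
Runtime:
1. Hardware: 1.8 GHz Intel Core i7 CPU, 4 GB RAM
2. Dataset: Middlebury
3. Running time: 90 ms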
Outlook:

As the original guided filter paper states, and as the authors also point out, one reason the guided filter can approximate global methods is that:

the guided filter is one step of a conjugate gradient solver of a particular linear system.

Once again, this points to the relationship between filter-based and energy-based methods...
Key references:

[11] K. He, J. Sun, and X. Tang. Guided image filtering. In ECCV, 2010.

[31] K.-J. Yoon and I. S. Kweon. Adaptive support-weight approach for correspondence search. IEEE TPAMI, 2006.

Background

Method

We consider a general labeling problem, where the goal is to assign each pixel \(i\) with coordinates \((x, y)\) in the image \(I\) to a label \(l\) from the set \(\mathcal{L} = \{1, \dots, L\}\).

The essence of cost aggregation:

\[C_{i, l}^{'} = \sum_j{W_{i, j}(I)C_{j, l}}\tag{1} \]
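Eq. (1) is just a matrix product: flatten the image into \(N\) pixels, stack the costs into an \(N \times L\) matrix, and multiply by the \(N \times N\) weight matrix \(W\). The NumPy sketch below assumes the simplest possible weights, a box (mean) window of radius `r`, standing in for the guided-filter weights; all function names are hypothetical. It only illustrates the equation, since a dense \(N \times N\) matrix is far too large for real images.

```python
import numpy as np


def box_weight_matrix(h, w, r):
    """Dense N x N matrix with W[i, j] = 1/|window(i)| if pixel j lies in the
    (2r+1) x (2r+1) window centred on pixel i (window clipped at the border)."""
    n = h * w
    ys, xs = np.divmod(np.arange(n), w)  # row and column of every flattened pixel
    weights = np.zeros((n, n))
    for i in range(n):
        inside = (np.abs(ys - ys[i]) <= r) & (np.abs(xs - xs[i]) <= r)
        weights[i, inside] = 1.0 / inside.sum()
    return weights


def aggregate(cost, weights):
    """Eq. (1): C'[i, l] = sum_j W[i, j] * C[j, l], for all labels at once."""
    h, w, n_labels = cost.shape
    return (weights @ cost.reshape(h * w, n_labels)).reshape(h, w, n_labels)


def wta(cost):
    """Winner-takes-all: the label with the lowest (aggregated) cost."""
    return cost.argmin(axis=-1)


# Toy cost volume: label 1 is correct everywhere, but the centre pixel is noisy.
cost = np.ones((5, 5, 3))
cost[..., 1] = 0.2
cost[2, 2, 0], cost[2, 2, 1] = 0.1, 0.9
print(wta(cost)[2, 2])                                         # 0: fooled by noise
print(wta(aggregate(cost, box_weight_matrix(5, 5, 1)))[2, 2])  # 1: noise averaged out
```

At the centre pixel the aggregated costs become 0.9, about 0.28, and 1.0, so averaging over the window overrides the single noisy measurement.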

This paper computes the weights with the guided filter:

\[W_{i, j}(I) = \frac{1}{|w|^2}\sum_{k:(i, j) \in w_k}\left(1 + \frac{(I_i - \mu_k)(I_j - \mu_k)}{\sigma_k^2 + \epsilon}\right)\tag{2} \]

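Here \(\mu_k\) and \(\sigma_k^2\) are the mean and variance of \(I\) in the window \(w_k\), \(|w|\) is the number of pixels in a window, and \(\epsilon\) is a regularization parameter. The edge-preserving smoothing of the guided filter comes mainly from the term: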

\[\frac{(I_i - \mu_k)(I_j - \mu_k)}{\sigma_k^2 + \epsilon} \]

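If \(i\) and \(j\) lie on the same side of an edge, \(I_i - \mu_k\) and \(I_j - \mu_k\) have the same sign and the weight increases; across an edge they have opposite signs and the weight decreases. When \(\epsilon\) is large enough that \(\sigma_k^2 \ll \epsilon\), the color similarity between neighbors has less and less influence on the weights, the weight distribution approaches a uniform one, and the guided filter behaves more and more like a box filter.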
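A minimal NumPy sketch of the grey-scale guided filter in He et al.'s box-filter formulation, which is where the \(O(N)\) cost comes from: every step is a window mean, and an integral image makes each mean independent of the window size. It can filter a single image or every label slice of a cost volume at once. The function names and the integral-image helper are my own choices. The usage lines check the \(\epsilon\) behavior described above: with a tiny \(\epsilon\) a step edge survives filtering, while with a huge \(\epsilon\) the output matches a plain mean filter (applied twice, once for the coefficients and once for the output).

```python
import numpy as np


def box_mean(a, r):
    """Mean of `a` over a (2r+1) x (2r+1) window (clipped at the border),
    computed in O(N) with an integral image. Extra trailing axes are filtered
    independently, so `a` may be (H, W) or (H, W, L)."""
    h, w = a.shape[:2]
    s = np.zeros((h + 1, w + 1) + a.shape[2:])
    s[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    y0, y1 = np.clip(np.arange(h) - r, 0, h), np.clip(np.arange(h) + r + 1, 0, h)
    x0, x1 = np.clip(np.arange(w) - r, 0, w), np.clip(np.arange(w) + r + 1, 0, w)
    total = s[y1][:, x1] - s[y0][:, x1] - s[y1][:, x0] + s[y0][:, x0]
    count = np.outer(y1 - y0, x1 - x0).reshape((h, w) + (1,) * (a.ndim - 2))
    return total / count


def guided_filter(I, p, r, eps):
    """Grey-scale guided filter: smooth p (H, W) or a cost volume (H, W, L)
    with the guide image I (H, W)."""
    if p.ndim == 3:
        I = I[..., None]  # broadcast the guide over the label axis
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    var_I = box_mean(I * I, r) - mean_I * mean_I
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)  # per-window linear coefficients
    b = mean_p - a * mean_I
    # average the linear models of all windows that cover each pixel
    return box_mean(a, r) * I + box_mean(b, r)


# Step edge: left half 0, right half 1, filtered with itself as the guide.
I = np.zeros((20, 20))
I[:, 10:] = 1.0
sharp = guided_filter(I, I, r=2, eps=1e-6)
blurry = guided_filter(I, I, r=2, eps=1e6)
print(np.abs(sharp - I).max() < 1e-3)                               # True
print(np.allclose(blurry, box_mean(box_mean(I, 2), 2), atol=1e-4))  # True
```

With a huge \(\epsilon\) every coefficient \(a_k\) is close to 0 and \(b_k\) is close to the window mean of \(p\), which is why the result collapses to repeated box filtering.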

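Vector form for a color guide image, where \(\Sigma_k\) is the \(3 \times 3\) covariance matrix of \(I\) in the window \(w_k\) and \(U\) is the \(3 \times 3\) identity matrix. Like the scalar form, it can be evaluated with box filters in \(O(N)\) time, independent of the window size: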

\[W_{i, j}(I) = \frac{1}{|w|^2}\sum_{k:(i, j) \in w_k}\left(1 + (I_i - \mu_k)^T(\Sigma_k + \epsilon U)^{-1}(I_j - \mu_k)\right)\tag{3} \]
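For a color guide, Eq. (3) replaces the scalar variance by the per-window \(3 \times 3\) covariance \(\Sigma_k\). Below is a minimal NumPy sketch, assuming an integral-image box filter and hypothetical function names; `np.linalg.inv` inverts all the per-pixel \(3 \times 3\) matrices in one call. The usage check relies on a known property of the guided filter: if the input is one channel of the guide, a small \(\epsilon\) reproduces it almost exactly, edges included.

```python
import numpy as np


def box_mean(a, r):
    """Window mean (clipped at the border) via an integral image; trailing
    axes such as (3, 3) or (3, L) are filtered independently."""
    h, w = a.shape[:2]
    s = np.zeros((h + 1, w + 1) + a.shape[2:])
    s[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    y0, y1 = np.clip(np.arange(h) - r, 0, h), np.clip(np.arange(h) + r + 1, 0, h)
    x0, x1 = np.clip(np.arange(w) - r, 0, w), np.clip(np.arange(w) + r + 1, 0, w)
    total = s[y1][:, x1] - s[y0][:, x1] - s[y1][:, x0] + s[y0][:, x0]
    return total / np.outer(y1 - y0, x1 - x0).reshape((h, w) + (1,) * (a.ndim - 2))


def guided_filter_color(I, p, r, eps):
    """Colour guided filter, Eq. (3). I: (H, W, 3) guide, p: (H, W, L) costs."""
    mean_I = box_mean(I, r)                                      # (H, W, 3)
    mean_p = box_mean(p, r)                                      # (H, W, L)
    # Sigma_k: 3x3 covariance of the guide colours in each window
    sigma = (box_mean(I[..., :, None] * I[..., None, :], r)
             - mean_I[..., :, None] * mean_I[..., None, :])      # (H, W, 3, 3)
    cov_Ip = (box_mean(I[..., :, None] * p[..., None, :], r)
              - mean_I[..., :, None] * mean_p[..., None, :])     # (H, W, 3, L)
    a = np.linalg.inv(sigma + eps * np.eye(3)) @ cov_Ip         # (H, W, 3, L)
    b = mean_p - np.einsum("hwcl,hwc->hwl", a, mean_I)           # (H, W, L)
    return np.einsum("hwcl,hwc->hwl", box_mean(a, r), I) + box_mean(b, r)


I = np.zeros((16, 16, 3))
I[:, 8:] = [1.0, 0.5, 0.2]     # a coloured step edge
p = I[..., :1].copy()          # one "label" slice equal to the red channel
q = guided_filter_color(I, p, r=2, eps=1e-6)
print(q.shape, np.abs(q - p).max() < 1e-3)  # (16, 16, 1) True
```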

Applications

Stereo Matching

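Pipeline:

Cost computation: SAD + gradient (truncated absolute differences of color and of the horizontal gradient)
Cost aggregation: the proposed method (guided filter)
Disparity selection: WTA (winner-takes-all)
Post-processing:
- Occlusion detection and hole filling: left-right consistency check (LR check); invalidated pixels are filled with the lowest disparity among the nearby non-occluded pixels
- Weighted median filter, using bilateral weights: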
  
  \[W_{i, j}^{bf} = \frac{1}{K_i}exp(-\frac{|i - j|^2}{\sigma_s^2})exp(-\frac{|I(i) - I(j)|^2}{\sigma_c^2})\tag{4} \]
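A compact end-to-end sketch of the stereo pipeline above, under several assumptions: grey-scale images instead of color; a truncated absolute-difference cost on intensity and horizontal gradient as my reading of "SAD + gradient" (`alpha`, `tau_c`, `tau_g`, `r`, and `eps` are illustrative values, not the paper's); a one-pixel tolerance in the LR check; and hole filling that takes the lower of the nearest valid disparities to the left and right on the same scanline. The weighted median step is left out, and all function names are hypothetical.

```python
import numpy as np


def box_mean(a, r):
    """Window mean (clipped at borders) via an integral image; trailing axes
    are filtered independently."""
    h, w = a.shape[:2]
    s = np.zeros((h + 1, w + 1) + a.shape[2:])
    s[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    y0, y1 = np.clip(np.arange(h) - r, 0, h), np.clip(np.arange(h) + r + 1, 0, h)
    x0, x1 = np.clip(np.arange(w) - r, 0, w), np.clip(np.arange(w) + r + 1, 0, w)
    total = s[y1][:, x1] - s[y0][:, x1] - s[y1][:, x0] + s[y0][:, x0]
    return total / np.outer(y1 - y0, x1 - x0).reshape((h, w) + (1,) * (a.ndim - 2))


def guided_filter(I, p, r, eps):
    """Grey-guide guided filter applied to every slice of a cost volume p (H, W, L)."""
    I = I[..., None]
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    a = (box_mean(I * p, r) - mean_I * mean_p) / (box_mean(I * I, r) - mean_I ** 2 + eps)
    b = mean_p - a * mean_I
    return box_mean(a, r) * I + box_mean(b, r)


def cost_volume(ref, other, max_disp, sign, alpha=0.9, tau_c=0.03, tau_g=0.01):
    """C[y, x, d]: truncated intensity and x-gradient differences between
    ref[y, x] and other[y, x + sign * d]; out-of-image matches get the max cost."""
    h, w = ref.shape
    g_ref, g_oth = np.gradient(ref, axis=1), np.gradient(other, axis=1)
    xs = np.arange(w)
    C = np.empty((h, w, max_disp + 1))
    for d in range(max_disp + 1):
        xo = xs + sign * d
        inside = (xo >= 0) & (xo < w)
        xo = np.clip(xo, 0, w - 1)
        c = ((1 - alpha) * np.minimum(np.abs(ref - other[:, xo]), tau_c)
             + alpha * np.minimum(np.abs(g_ref - g_oth[:, xo]), tau_g))
        C[..., d] = np.where(inside, c, (1 - alpha) * tau_c + alpha * tau_g)
    return C


def disparity(ref, other, max_disp, sign, r, eps):
    """Cost computation, guided-filter aggregation, then WTA."""
    C = guided_filter(ref, cost_volume(ref, other, max_disp, sign), r, eps)
    return C.argmin(axis=-1)


def lr_check_and_fill(d_left, d_right, tol=1):
    """Invalidate pixels whose left/right disparities disagree, then fill each
    with the lower of the two nearest valid disparities on its scanline."""
    h, w = d_left.shape
    rows, cols = np.indices((h, w))
    valid = np.abs(d_left - d_right[rows, np.clip(cols - d_left, 0, w - 1)]) <= tol
    filled = d_left.copy()
    for y in range(h):
        good = np.flatnonzero(valid[y])
        if good.size == 0:
            continue  # nothing to propagate on this scanline
        for x in np.flatnonzero(~valid[y]):
            left, right = good[good < x], good[good > x]
            neighbours = [d_left[y, left[-1]]] if left.size else []
            neighbours += [d_left[y, right[0]]] if right.size else []
            filled[y, x] = min(neighbours)
    return filled, valid


# Synthetic pair: the right view is the left view shifted by 3 pixels.
rng = np.random.default_rng(0)
left = rng.random((40, 60))
right = np.roll(left, -3, axis=1)
dL = disparity(left, right, max_disp=8, sign=-1, r=4, eps=1e-4)
dR = disparity(right, left, max_disp=8, sign=+1, r=4, eps=1e-4)
disp, valid = lr_check_and_fill(dL, dR)
print((disp[:, 12:-12] == 3).mean())
```

The right-view disparity map is computed with the same functions by swapping the roles of the images and flipping the search direction (`sign`).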

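A clever trick:

When aggregating the costs of the left and right views, the guided-filter weights are computed in vectorized form by combining the left and right views into a single 6-channel image...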
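The weighted median step with the bilateral weights of Eq. (4) can be sketched as follows. For each pixel, the disparities in its window vote with weight \(W^{bf}_{i,j}\), and the output is the first disparity at which the cumulative weight reaches half of the total. The normalization \(1/K_i\) cancels out in a median, so it is skipped. Assumptions: a grey-scale guide, non-negative integer disparities, hypothetical parameter values, and a brute-force loop that is only meant for small images.

```python
import numpy as np


def weighted_median(disp, guide, r, sigma_s, sigma_c):
    """Weighted median of an integer disparity map with the bilateral weights
    of Eq. (4), computed by brute force over a (2r+1) x (2r+1) window."""
    h, w = disp.shape
    out = np.empty_like(disp)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(y - r, 0), min(y + r + 1, h)
            x0, x1 = max(x - r, 0), min(x + r + 1, w)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            weights = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / sigma_s ** 2
                             - (guide[y0:y1, x0:x1] - guide[y, x]) ** 2 / sigma_c ** 2)
            # weighted histogram over disparity values, then its cumulative sum
            cdf = np.cumsum(np.bincount(disp[y0:y1, x0:x1].ravel(), weights=weights.ravel()))
            out[y, x] = np.searchsorted(cdf, 0.5 * cdf[-1])  # first bin reaching half
    return out


guide = np.zeros((10, 10))
guide[:, 5:] = 1.0                    # two regions separated by a sharp edge
clean = np.where(guide > 0, 5, 2)     # disparity 2 on the left, 5 on the right
noisy = clean.copy()
noisy[4, 2] = 5                       # an isolated outlier
out = weighted_median(noisy, guide, r=2, sigma_s=3.0, sigma_c=0.1)
print(np.array_equal(out, clean))     # True: outlier removed, edge kept
```

Because the color term gives near-zero weight to pixels across the edge, the median never mixes the two regions, whereas a plain median would still be vulnerable near the boundary.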

Optical Flow

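Pipeline:

Cost computation: SAD + gradient
Cost aggregation: the proposed method
Flow selection: WTA over the 2D flow labels
Post-processing:
- Occlusion detection and hole filling: LR check (for flow, a forward-backward consistency check); holes are filled with a weighted median filter whose weights are the guided-filter weights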
- Subpixel refinement (I don't fully understand how this works yet, so I quote the authors directly for now and will study it later):
1. Subpixel precision: To find sub-pixel accurate flow vectors, we follow [23] and simply upscale the input images using bicubic interpolation.
2. This increases the size of the cost volume in the label dimension (but not in the x and y dimensions) and hence raises the running time.
3. In practice, we found that smoothing the final flow vectors with the guided filter can compensate for a lower upscaling factor.
4. We empirically found that an upscaling factor of 4 gives visually pleasing results, but in this paper we upscale by a factor of 8 to demonstrate the best possible performance.
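My reading of the quoted trick: upscaling both images by a factor \(s\) and searching integer displacements in the upscaled images is the same as searching displacements in steps of \(1/s\) pixel in the original images. For a search radius of \(R\) pixels the label set therefore grows from \((2R+1)^2\) to \((2Rs+1)^2\), while the cost volume keeps the original \(x, y\) size. The sketch below samples the second image with bilinear interpolation at subpixel offsets, as a stand-in for the bicubic upscaling the authors use, and aggregates with a plain box filter instead of the guided filter to stay short; all names and parameters are hypothetical.

```python
import numpy as np


def box_mean(a, r):
    """Window mean (clipped at borders) via an integral image; trailing axes
    are filtered independently."""
    h, w = a.shape[:2]
    s = np.zeros((h + 1, w + 1) + a.shape[2:])
    s[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    y0, y1 = np.clip(np.arange(h) - r, 0, h), np.clip(np.arange(h) + r + 1, 0, h)
    x0, x1 = np.clip(np.arange(w) - r, 0, w), np.clip(np.arange(w) + r + 1, 0, w)
    total = s[y1][:, x1] - s[y0][:, x1] - s[y1][:, x0] + s[y0][:, x0]
    return total / np.outer(y1 - y0, x1 - x0).reshape((h, w) + (1,) * (a.ndim - 2))


def flow_labels(radius, factor):
    """All (u, v) displacements in [-radius, radius]^2 with a step of 1/factor pixel."""
    steps = np.arange(-radius * factor, radius * factor + 1) / factor
    u, v = np.meshgrid(steps, steps, indexing="ij")
    return np.stack([u.ravel(), v.ravel()], axis=1)  # ((2*radius*factor+1)**2, 2)


def bilinear_sample(img, ys, xs):
    """Sample img at real-valued (row, col) positions, clamped to the image."""
    h, w = img.shape
    ys, xs = np.clip(ys, 0, h - 1), np.clip(xs, 0, w - 1)
    y0 = np.minimum(np.floor(ys).astype(int), h - 2)
    x0 = np.minimum(np.floor(xs).astype(int), w - 2)
    fy, fx = ys - y0, xs - x0
    return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x0 + 1]
            + fy * (1 - fx) * img[y0 + 1, x0] + fy * fx * img[y0 + 1, x0 + 1])


def subpixel_flow(I1, I2, radius, factor, r):
    """Brute-force flow: one cost slice per subpixel label, box aggregation, WTA."""
    h, w = I1.shape
    labels = flow_labels(radius, factor)
    ys, xs = np.indices((h, w), dtype=float)
    cost = np.empty((h, w, len(labels)))  # x, y size unchanged; only the label axis grows
    for k, (u, v) in enumerate(labels):
        cost[..., k] = np.abs(I1 - bilinear_sample(I2, ys + v, xs + u))
    return labels[box_mean(cost, r).argmin(axis=-1)]  # (h, w, 2) flow (u, v)


print(len(flow_labels(radius=1, factor=4)))  # 81 = (2*1*4 + 1)**2
ys, xs = np.indices((20, 20), dtype=float)
I1 = xs * ys            # synthetic bilinear "image"
I2 = (xs - 0.5) * ys    # I1 moved half a pixel to the right
flow = subpixel_flow(I1, I2, radius=1, factor=4, r=2)
print(flow[10, 10])     # expected: u = 0.5, v = 0
```

The synthetic images are bilinear functions, so bilinear sampling is exact and the half-pixel label has zero cost; with real images and bicubic upscaling the match is only approximate.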

Interactive Image Segmentation

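Cost computation (the core part)
- Build foreground and background color histograms \(\theta^F\) and \(\theta^B\) from the user's foreground/background input
- Each histogram has \(K\) bins; \(b(i)\) denotes the bin that the color of pixel \(i\) falls into, and \(\theta_{b(i)}^l\) is the normalized frequency of that bin in the histogram of label \(l\)
- Alternatively, the user can provide a bounding box: pixels inside the box are used to build \(\theta^F\), and pixels outside the box are used to build \(\theta^B\)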

The cost of assigning label \(l \in \{F, B\}\) to pixel \(i\) is defined as:

\[C_{i,l} = 1 - \theta_{b(i)}^l \]

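For binary segmentation, the cost volume \(C_{i, l}\) can be reduced to a single two-dimensional cost map, where \(C_i\) is the cost of assigning pixel \(i\) to the foreground: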

\[C_i = 1 - \frac{\theta_{b(i)}^F}{\theta_{b(i)}^F + \theta_{b(i)}^B}\tag{5} \]
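A minimal sketch of Eq. (5), assuming a grey-scale image in [0, 1] quantized into `K` bins (the paper works with color histograms) and boolean masks for the user's foreground and background input; the array `bins` plays the role of \(b(i)\). Giving pixels whose bin appears in neither histogram a neutral cost of 0.5 is my own assumption to avoid dividing by zero, and the function names are hypothetical.

```python
import numpy as np


def histogram(bins, mask, K):
    """Normalized K-bin histogram theta of the pixels selected by mask."""
    counts = np.bincount(bins[mask], minlength=K).astype(float)
    return counts / max(counts.sum(), 1.0)


def foreground_cost(img, fg_mask, bg_mask, K=16):
    """Eq. (5): C_i = 1 - theta^F_b(i) / (theta^F_b(i) + theta^B_b(i))."""
    bins = np.minimum((img * K).astype(int), K - 1)  # b(i): bin index of every pixel
    tf = histogram(bins, fg_mask, K)[bins]
    tb = histogram(bins, bg_mask, K)[bins]
    total = tf + tb
    # neutral cost 0.5 where the bin was seen in neither histogram (assumption)
    return np.where(total > 0, 1.0 - tf / np.where(total > 0, total, 1.0), 0.5)


img = np.full((10, 10), 0.2)
img[:, 5:] = 0.8                          # dark left half, bright right half
fg = np.zeros(img.shape, dtype=bool)
fg[4, 7] = True                           # foreground scribble on the right
bg = np.zeros(img.shape, dtype=bool)
bg[4, 2] = True                           # background scribble on the left
cost = foreground_cost(img, fg, bg)
print(cost)                               # 1.0 on the left half, 0.0 on the right half
```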

Cost aggregation: the proposed method
Segmentation: a pixel is labeled foreground if its filtered cost \(C_i' < 0.5\), and background if \(C_i' > 0.5\)

For the bounding-box input, the authors run 5 iterations that re-estimate the color models to improve the accuracy of the cost computation.
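The bounding-box variant can be sketched as the loop below: build the initial models from the inside and outside of the box, filter the Eq. (5) cost map with the guided filter (using the image as guide), threshold at 0.5, and re-estimate the models from the current labeling 5 times. The update rule (foreground model from pixels labeled foreground inside the box, background model from everything else) is a GrabCut-style assumption, as are the grey-scale histograms and the values of `r`, `eps`, and `K`; all names are hypothetical.

```python
import numpy as np


def box_mean(a, r):
    """Window mean (clipped at borders) via an integral image."""
    h, w = a.shape[:2]
    s = np.zeros((h + 1, w + 1) + a.shape[2:])
    s[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    y0, y1 = np.clip(np.arange(h) - r, 0, h), np.clip(np.arange(h) + r + 1, 0, h)
    x0, x1 = np.clip(np.arange(w) - r, 0, w), np.clip(np.arange(w) + r + 1, 0, w)
    total = s[y1][:, x1] - s[y0][:, x1] - s[y1][:, x0] + s[y0][:, x0]
    return total / np.outer(y1 - y0, x1 - x0).reshape((h, w) + (1,) * (a.ndim - 2))


def guided_filter(I, p, r, eps):
    """Grey-scale guided filter of the 2D map p with guide I."""
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    a = (box_mean(I * p, r) - mean_I * mean_p) / (box_mean(I * I, r) - mean_I ** 2 + eps)
    b = mean_p - a * mean_I
    return box_mean(a, r) * I + box_mean(b, r)


def histogram(bins, mask, K):
    """Normalized K-bin histogram of the pixels selected by mask."""
    counts = np.bincount(bins[mask], minlength=K).astype(float)
    return counts / max(counts.sum(), 1.0)


def segment_with_box(img, box, iters=5, K=16, r=2, eps=1e-4):
    """box = (top, bottom, left, right), half-open. Returns a boolean foreground mask."""
    bins = np.minimum((img * K).astype(int), K - 1)
    top, bottom, left, right = box
    inside = np.zeros(img.shape, dtype=bool)
    inside[top:bottom, left:right] = True
    fg = inside.copy()  # initial models: inside vs. outside the box
    for _ in range(iters):
        tf, tb = histogram(bins, fg, K)[bins], histogram(bins, ~fg, K)[bins]
        total = tf + tb
        cost = np.where(total > 0, 1.0 - tf / np.where(total > 0, total, 1.0), 0.5)
        filtered = guided_filter(img, cost, r, eps)  # cost aggregation
        fg = (filtered < 0.5) & inside               # outside the box stays background
    return fg


img = np.full((30, 30), 0.1)
img[10:20, 10:20] = 0.9                  # bright square object
mask = segment_with_box(img, box=(5, 25, 5, 25))
print(mask.sum())                        # 100
```

In the first iteration the foreground model still contains the background pixels inside the box, so their cost is only slightly above 0.5; once they are excluded, the re-estimated models separate the two colors cleanly.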

NOTE

The experiments section reports results on par with graph cuts at 5 ms vs. 300 ms (425 ms). My friends and I were stunned...


You may also be looking for:
- Paper notes: Fast RCNN
- Paper notes: NCF (Neural Collaborative Filtering)
- Paper notes: Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks
- Paper notes: Neural Graph Collaborative Filtering (SIGIR 2019)
- Correlation Filter in Visual Tracking, part 2: Fast Visual Tracking via Dense Spatio-Temporal Context Learning (paper notes)
- Paper notes: Recurrent Models of Visual Attention
- Paper notes: Fast Online Object Tracking and Segmentation: A Unifying Approach
- Paper notes (2): A fast learning algorithm for deep belief nets
- [Paper Notes] Object detection with location-aware deformable convolution and backward attention filtering
- Paper notes: Item2Vec: Neural Item Embedding for Collaborative Filtering