Paper Notes: Deeper and Wider Siamese Networks for Real-Time Visual Tracking

Reposted article, 2019-03-14 20:01

Updated on 2019-04-01 16:10:37

Paper (arXiv V3): https://arxiv.org/pdf/1901.01660.pdf

Code: https://github.com/researchmm/SiamDW (The training and test code of SiamFC+ and SiamRPN+ has been released.)

My revised version: https://github.com/wangxiao5791509/SiamDW_tracker_revised (The original code does not run as-is; it has some issues.)

1. Background and Motivation:

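This paper addresses a counterintuitive phenomenon in visual tracking: when the commonly used backbone (e.g. AlexNet) is replaced with a deeper or wider network such as ResNet or Inception, tracking accuracy drops instead of improving. The original post shows a figure illustrating this.

The authors find that the following factors have a very large impact on tracking results: * the receptive field size of neurons; * network stride; * feature padding.

Specifically, the receptive field determines the image region used to compute a feature. A larger receptive field provides more image context, while a smaller one may fail to capture the structure of the target.

The network stride affects localization precision, especially for small objects. It also controls the size of the output feature map, which in turn affects feature discriminability and detection accuracy.

In addition, for a fully convolutional architecture, feature padding in the convolutions introduces a potential position bias during training, so that accurate prediction becomes difficult when the target moves close to the boundary of the search region. Together, these three factors prevent Siamese trackers from benefiting from more advanced backbone models.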
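To make stride, receptive field, and output feature size concrete, here is a minimal sketch that computes all three for a stack of layers. The layer list is an assumption: it is a SiamFC-style AlexNet configuration without padding, written from memory rather than taken from this post, and `Layer` and `analyze` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Layer(NamedTuple):
    kernel: int
    stride: int
    padding: int = 0


def analyze(layers: List[Layer], input_size: int) -> Tuple[int, int, int]:
    """Return (total stride, receptive field, output size) of a layer stack."""
    total_stride = 1   # distance in input pixels between adjacent output cells
    rf = 1             # receptive field of one output cell, in input pixels
    size = input_size  # spatial size of the current feature map
    for layer in layers:
        rf += (layer.kernel - 1) * total_stride
        total_stride *= layer.stride
        size = (size + 2 * layer.padding - layer.kernel) // layer.stride + 1
    return total_stride, rf, size


# Assumed SiamFC-style AlexNet: conv and pooling layers, none of them padded.
ALEXNET_LIKE = [
    Layer(11, 2),  # conv1
    Layer(3, 2),   # pool1
    Layer(5, 1),   # conv2
    Layer(3, 2),   # pool2
    Layer(3, 1),   # conv3
    Layer(3, 1),   # conv4
    Layer(3, 1),   # conv5
]

stride, rf, ofs = analyze(ALEXNET_LIKE, input_size=127)
print(stride, rf, ofs)     # 8 87 6
print(round(rf / 127, 2))  # 0.69
```

Under these assumptions, a 127-pixel exemplar gives a stride of 8, a receptive field of 87 pixels (about 69% of the exemplar), and a 6×6 output feature map. Replacing the list with a deeper stack shows how quickly the stride and receptive field grow.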

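In this paper, the authors tackle these problems by designing new network architectures so that Siamese networks achieve better tracking performance. The main contributions are:

1. Based on the "bottleneck" residual block, the authors propose a set of cropping-inside residual (CIR) units. These units remove the influence of padding and thus prevent the convolutional filters from learning a position bias;

2. The authors design two families of networks, deeper and wider, by stacking CIR units. In these networks, the stride and the neuron receptive field are chosen to enhance localization precision;

3. The authors plug the designed backbones into SiamFC and SiamRPN. Experiments show substantial improvements on multiple datasets. A further advantage is that the proposed networks are lightweight, which allows the trackers to run in real time.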

2. Background on Siamese Tracking:

For background on Siamese-network trackers, see the original papers.

3. Analysis of Performance Degradation:

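3.1 Performance analysis:

Comparing different backbone architectures, the authors find that the factors involved (stride (STR), padding (PAD), receptive field (RF) of neurons in the last layers, and output feature size (OFS)) affect tracking results to different degrees, and that some of them degrade performance severely, as Table 2 of the paper shows.

The authors draw the following conclusions: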

1). Performance drops markedly when the network stride grows from 4 or 8 to 16. This illustrates that Siamese trackers prefer mid-level features (stride 4 or 8), which are more precise in object localization than high-level features (stride ≥ 16).

2). For the maximum size of the receptive field (RF), the optimum lies in a small range. In the cases of AlexNet, VGG-10 and ResNet-17, the optimal receptive field size is about 60%∼80% of the size of the input exemplar image z (e.g. 91 vs 127). This illustrates that the size of the RF is crucial for feature embedding in a Siamese framework.

3). Only an RF in a certain size range allows the feature to abstract the characteristics of the object, and its ideal size is closely related to the size of the exemplar image.

4). For the output feature size, it is observed that a small size (OFS ≤ 3) does not benefit tracking accuracy.

5). Network padding has a highly negative impact on the final performance.

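Table 2 shows that AlexNet and VGG-10 use no padding, whereas Inception and ResNet do.

The authors find that this padding causes the following problem: it leads to inconsistency between the embeddings of a target object appearing at different positions in search images, and therefore the matching similarity comparison degrades. When an object moves to the image border, the response peak no longer reflects the target position accurately. Once the tracker fails to localize the target precisely in the previous frame, it usually starts to drift.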
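The inconsistency can be reproduced with a toy 1-D example. The sketch below is not the paper's experiment; the signal, the all-ones kernel, and the helper names `conv1d` and `scene` are made up for illustration. It compares the response at the target centre when the target sits in the middle of the search signal and when it sits at the border.

```python
def conv1d(signal, kernel, pad):
    """1-D cross-correlation with `pad` zeros added on each side."""
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    k = len(kernel)
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(padded) - k + 1)]


def scene(length, start, target=(5.0, 5.0, 5.0), background=1.0):
    """A flat background with a 3-sample target starting at `start`."""
    s = [background] * length
    s[start:start + len(target)] = target
    return s


KERNEL = [1.0] * 5  # toy filter: sums the target plus one context sample per side

# With zero padding of 2, output index i is centred on input index i.
padded_center = conv1d(scene(11, start=4), KERNEL, pad=2)[5]  # target centre at 5
padded_border = conv1d(scene(11, start=0), KERNEL, pad=2)[1]  # target centre at 1

# Without padding, output index i is centred on input index i + 2.
valid_center = conv1d(scene(11, start=4), KERNEL, pad=0)[3]   # target centre at 5
valid_border = conv1d(scene(11, start=1), KERNEL, pad=0)[0]   # target centre at 2

print(padded_center, padded_border)  # 17.0 16.0
print(valid_center, valid_border)    # 17.0 17.0
```

With padding, the same target yields a different feature at the border because part of its context is replaced by zeros. Without padding, every position that produces an output sees real context, so the feature is the same wherever the target appears.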

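3.2 Guidelines:

Based on the experiments and observations above, the authors give four basic guidelines for reducing the negative effects of these factors: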

* Siamese trackers prefer a relatively small network stride.

* The receptive field of output features should be set based on its ratio to the size of the exemplar image.

* Network stride, receptive field and output feature size should be considered as a whole when designing a network architecture.

* For a fully convolutional Siamese matching network, it is critical to handle the problem of perceptual inconsistency between the two network streams.
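The first three guidelines can be turned into a rough sanity check for a candidate backbone. The sketch below is only an illustration: `check_guidelines` is a hypothetical helper, the thresholds are the ones quoted in Section 3.1 (stride ≥ 16, an RF of 60%∼80% of the exemplar, OFS ≤ 3), and the padding guideline is not covered because it depends on the architecture rather than on three numbers.

```python
def check_guidelines(stride, rf, ofs, exemplar_size=127):
    """Return the Section 3.1 observations that a backbone violates."""
    problems = []
    if stride >= 16:
        problems.append("stride too large for precise localization")
    ratio = rf / exemplar_size
    if not 0.6 <= ratio <= 0.8:
        problems.append(f"RF/exemplar ratio {ratio:.2f} outside 0.6-0.8")
    if ofs <= 3:
        problems.append("output feature size too small")
    return problems


# Assumed values for a SiamFC-style AlexNet (stride 8, RF 87, OFS 6).
print(check_guidelines(stride=8, rf=87, ofs=6))  # []

# A made-up deep backbone with a large stride and a tiny output map.
for problem in check_guidelines(stride=16, rf=200, ofs=2):
    print(problem)
```

The check treats the three quantities independently for simplicity, whereas the third guideline asks for them to be balanced jointly, so a clean result here is necessary rather than sufficient.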

4. Deeper and Wider Siamese Networks:

4.1 Cropping-Inside Residual (CIR) Units:

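CIR Unit. The original residual unit uses padding, and the observations above show that padding causes a position bias in Siamese trackers. The effect of padding therefore has to be removed to make the unit suitable for Siamese tracking. To this end, the authors augment the residual unit with a cropping operation: a crop is applied after the feature addition (marked in light blue in the paper's figure). The crop removes the features that were affected by zero-padding signals. Because the padding size in the bottleneck layer is 1, only the outermost features on the border are cropped out. This simple operation neatly removes the padding-affected features in the residual unit.

Downsampling CIR (CIR-D) Unit. The downsampling residual unit is another key building block in network design. It reduces the spatial size of the feature map while doubling the number of feature channels. Because this unit also contains padding, the same crop is applied after the addition. The authors also change the convolution stride from 2 to 1; in the paper, a max-pooling layer after the crop then performs the spatial downsampling. The key idea of these changes is to ensure that only the padding-affected features are removed while the internal block structure stays unchanged.

CIR-Inception and CIR-NeXt Units. The authors apply the same idea to multi-branch structures, which makes it possible to build wide networks.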
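The effect of the crop can be checked on a tiny single-channel example. This is a simplified sketch, not the authors' implementation: the 1×1 convolutions are replaced by scalar multiplications, ReLU and batch normalization are left out, and all function names are hypothetical. It compares a padded residual unit followed by a one-pixel crop with a unit that never pads.

```python
def conv3x3(x, kernel, pad):
    """Single-channel 3x3 cross-correlation; `pad=True` adds one ring of zeros."""
    if pad:
        width = len(x[0]) + 2
        x = [[0.0] * width] + [[0.0] + row + [0.0] for row in x] + [[0.0] * width]
    h, w = len(x), len(x[0])
    return [[sum(x[i + di][j + dj] * kernel[di][dj]
                 for di in range(3) for dj in range(3))
             for j in range(w - 2)]
            for i in range(h - 2)]


def scale(x, s):
    """Stand-in for a 1x1 convolution on a single channel."""
    return [[v * s for v in row] for row in x]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def crop(x, n=1):
    """Drop the outermost `n` rows and columns on every side."""
    return [row[n:-n] for row in x[n:-n]]


def cir_unit(x, kernel):
    y = scale(x, 0.5)                 # 1x1 conv, no padding involved
    y = conv3x3(y, kernel, pad=True)  # 3x3 conv with zero padding of 1
    y = scale(y, 2.0)                 # 1x1 conv
    return crop(add(y, x), 1)         # crop AFTER the residual addition


def padding_free_unit(x, kernel):
    y = scale(x, 0.5)
    y = conv3x3(y, kernel, pad=False)  # no padding at all
    y = scale(y, 2.0)
    return add(y, crop(x, 1))          # shortcut cropped to match


X = [[float(6 * i + j) for j in range(6)] for i in range(6)]
K = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]

print(cir_unit(X, K) == padding_free_unit(X, K))  # True
```

Both units return the same 4×4 map for the 6×6 input, which shows that every feature surviving the crop was computed without touching a padded zero.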

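4.2 Network Architecture:

By stacking the units above, the authors design several backbone versions; Table 3 lists four architectures of different depth (16, 19, 22 and 43 layers).

In addition, the authors design two wide networks, listed in Table 3 as CIResInception-22 and CIResNeXt-22.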

5. Experiments:
