Paper Notes: Learning Dynamic Memory Networks for Object Tracking

Reposted article, originally posted 2018-07-21 16:44. Tags: Paper Reading / Visual Tracking / Dynamic Memory Network

Learning Dynamic Memory Networks for Object Tracking

ECCV 2018
Updated on 2018-08-05 16:36:30

Code: https://github.com/skyoung/MemTrack (TensorFlow implementation)

Note: This paper builds on the Siamese network and the DNC (Differentiable Neural Computer, Nature 2016). Reading those two papers first makes this one easier to follow.

DNC: https://www.cnblogs.com/wangxiaocvpr/p/5960027.html Paper: http://www.nature.com/nature/journal/vaop/ncurrent/pdf/nature20101.pdf

Siamese Network based tracker: https://www.cnblogs.com/wangxiaocvpr/p/5897461.html Paper: Fully-Convolutional Siamese Networks for Object Tracking

Another tracking paper which also utilizes memory network: MAVOT: Memory-Augmented Video Object Tracking, arXiv

=================================

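Motivation: The idea is to use a Dynamic Memory Network to update the target template dynamically, so that a Siamese-network-based tracker can better capture the target's features and learn a better appearance model, which in turn gives more accurate localization.

Method: The approach relies on a Dynamic Memory Network to update the target template accurately. Tracking results are dynamically stored in, read from, and written to memory, then combined with the original object patch, and tracking is performed with a Siamese Network tracker. The speed reaches 50 FPS.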

Approach Details:

Dynamic Memory Networks for Tracking:

1. Feature Extraction:

Feature extraction follows SiamFC, so it is not covered in detail here.

2. Attention Scheme:

The paper motivates the attention mechanism as follows: "Since the object information in the search image is needed to retrieve the related template for matching, but the object location is unknown at first, we apply an attention mechanism to make the input of LSTM concentrate more on the target." Put simply, attention helps pin down where the tracked target is, which makes it easier to extract proposals.

The authors divide the whole search image into patches by sliding a square patch of size 6×6×256 across it. To further reduce the size of each square patch, average pooling is applied to it.

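The attended feature vector can then be viewed as the weighted sum of these feature vectors.

Here, L is the number of image patches, and the weights are computed with a softmax function.

The scores fed into the softmax come from the attention network, whose inputs are the LSTM hidden state $h_{t-1}$ and a square patch. The W and b terms are learnable network weights and biases.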
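The attention step described above can be sketched in plain Python. This is a minimal illustration, not the code from the MemTrack repository. It assumes an additive scoring form $r_i = w^a \cdot \tanh(W^h h_{t-1} + W^f f_i + b)$ for the attention network, which should be checked against the paper. The names `avg_pool`, `attend`, `W_h`, `W_f` and `w_a` are hypothetical, and the toy patches are far smaller than the real 6×6×256 ones.

```python
import math

def avg_pool(patch):
    """Average the feature vectors inside one patch into a single vector."""
    n = len(patch)
    return [sum(vec[c] for vec in patch) / n for c in range(len(patch[0]))]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(patches, h_prev, W_h, W_f, b, w_a):
    """Return (attended feature vector, attention weights).

    Assumed scoring form: r_i = w_a . tanh(W_h h_prev + W_f f_i + b).
    """
    pooled = [avg_pool(p) for p in patches]          # one vector per patch
    h_term = matvec(W_h, h_prev)                     # shared across patches
    scores = []
    for f in pooled:
        f_term = matvec(W_f, f)
        hidden = [math.tanh(hv + fv + bv)
                  for hv, fv, bv in zip(h_term, f_term, b)]
        scores.append(sum(w * x for w, x in zip(w_a, hidden)))
    alpha = softmax(scores)                          # weights sum to 1
    channels = len(pooled[0])
    attended = [sum(a * f[c] for a, f in zip(alpha, pooled))
                for c in range(channels)]            # weighted sum over L patches
    return attended, alpha

# Toy example: L = 2 patches, each holding two 2-channel feature vectors.
patches = [[[1.0, 0.0], [3.0, 0.0]], [[0.0, 2.0], [0.0, 4.0]]]
h_prev = [0.5, -0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]
attended, alpha = attend(patches, h_prev, identity, identity, [0.0, 0.0], [1.0, 0.0])
print(alpha, attended)
```

With all weights set to zero every patch gets the same score, so the attended vector reduces to the plain mean of the pooled patch vectors. Learned weights are what let the LSTM state steer attention toward the target.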

[Figure: visualization of the attention results.]

3. LSTM Memory Controller

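The memory is also controlled by an LSTM. Its inputs are the hidden state from the previous time step and the attended feature vector passed in from the attention module at the current time step. It outputs a new hidden state, from which the memory control signals are computed: read key, read strength, bias gates, and decay rate.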
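As a rough illustration of how the four control signals might be derived from the new hidden state, here is a sketch that takes the hidden state as given and leaves out the LSTM cell itself. The specific forms are assumptions based on DNC-style controllers and should be checked against the paper: a linear read key, a read strength of 1 + softplus so that it stays at or above 1, softmax gates that sum to 1, and a sigmoid decay rate. The parameter names in `params` are hypothetical.

```python
import math

def linear(W, b, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softplus(z):
    """Stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def control_signals(h, p):
    """Map an LSTM hidden state h to the memory control signals (assumed forms)."""
    read_key = linear(p["W_k"], p["b_k"], h)
    # Read strength >= 1 sharpens the read weights (assumed 1 + softplus form).
    read_strength = 1.0 + softplus(linear(p["W_beta"], p["b_beta"], h)[0])
    # Three gates normalized to sum to 1 (assumed softmax form).
    gates = softmax(linear(p["W_g"], p["b_g"], h))
    # Decay rate in (0, 1) (assumed sigmoid form).
    decay_rate = sigmoid(linear(p["W_d"], p["b_d"], h)[0])
    return read_key, read_strength, gates, decay_rate

# Toy parameters for a 2-dimensional hidden state (illustrative values only).
params = {
    "W_k": [[1.0, 0.0], [0.0, 1.0]], "b_k": [0.0, 0.0],
    "W_beta": [[0.0, 0.0]], "b_beta": [0.0],
    "W_g": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], "b_g": [0.0, 0.0, 0.0],
    "W_d": [[0.0, 0.0]], "b_d": [0.0],
}
read_key, read_strength, gates, decay_rate = control_signals([0.2, -0.1], params)
print(read_key, read_strength, gates, decay_rate)
```

In the real model these weights are trained jointly with the tracker, and the hidden state comes from an LSTM step on the attended feature vector.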

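4. Memory Reading && Memory Writing && Residual Template Learning:

==>> Reading and writing can be looked at from the following two perspectives:

For Read: given the signal from the LSTM, we obtain the read key and its corresponding read strength. This vector is then compared against the templates stored in memory to compute the read weight, which determines whether each corresponding template is read.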

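Specifically:

(1) The read key and its read strength are both computed from the LSTM hidden state.

(2) The read weight is computed by comparing the read key with the templates stored in memory, taking the read strength into account.

(3) The template is retrieved from memory according to the read weight.

(4) The final template is obtained through residual template learning, which combines the retrieved template with the original object patch.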
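The four read steps above can be sketched as follows, under stated assumptions rather than as the paper's exact equations. The read weight is assumed to be a softmax over cosine similarities scaled by the read strength, the retrieved template a weighted sum of memory slots, and the final template $T_0 + r \odot T^{retr}$ with a residual gate $r$. Templates are flattened to short vectors here, and `memory_keys`, `residual_gate` and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-8)

def read_weights(read_key, read_strength, memory_keys):
    """Softmax over similarities between the read key and each memory key."""
    scores = [read_strength * cosine(read_key, k) for k in memory_keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve(weights, memory):
    """Weighted sum of the memory slots (each slot is a flattened template)."""
    size = len(memory[0])
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(size)]

def final_template(initial, retrieved, residual_gate):
    """Residual learning: initial template plus a gated retrieved template."""
    return [t0 + g * r for t0, g, r in zip(initial, residual_gate, retrieved)]

# Toy memory with N = 2 slots; keys would normally be derived from the slots.
memory = [[1.0, 1.0], [3.0, 5.0]]
memory_keys = [[1.0, 0.0], [0.0, 1.0]]
weights = read_weights([1.0, 0.0], 2.0, memory_keys)
retrieved = retrieve(weights, memory)
template = final_template([1.0, 1.0], retrieved, [0.0, 1.0])
print(weights, retrieved, template)
```

A larger read strength makes the softmax more peaked, so the retrieved template moves closer to the single best-matching slot. Keeping the initial template as the base term means the memory only has to supply a correction.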

For Write: given the signal from the LSTM, we can compute the three bias gate values and, together with the decay rate, the erase factor. The resulting write weight then controls whether new templates are written into memory, and how much is written.

（1）The write weight：

（2）The write gate：

（3）The allocation weight:

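(4) Finally, the new template is written into memory, with the write weight and the erase factor controlling how much is written.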
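A hedged sketch of the write step follows, again in plain Python rather than the repository's TensorFlow. It assumes three gates $(g^w, g^r, g^a)$, a write weight $w^w = g^r w^r + g^a w^a$ in which $g^w$ contributes only a zero vector, an allocation weight that is one-hot on the least-used slot, an erase factor $e^w = d^r g^r + g^a$, and the update $M(j) \leftarrow M(j)(1 - w^w(j) e^w) + w^w(j) e^w T^{new}$. These forms are my reading of the method and should be verified against the paper. The usage bookkeeping and all function names are assumptions.

```python
def allocation_weights(usage):
    """One-hot vector selecting the least-used memory slot (assumed rule)."""
    least_used = min(range(len(usage)), key=lambda j: usage[j])
    return [1.0 if j == least_used else 0.0 for j in range(len(usage))]

def write_memory(memory, new_template, read_w, usage, gates, decay_rate):
    """Return (updated memory, write weights, updated usage)."""
    g_w, g_r, g_a = gates
    alloc = allocation_weights(usage)
    # g_w multiplies a zero vector, i.e. "write nothing", so it drops out here.
    write_w = [g_r * r + g_a * a for r, a in zip(read_w, alloc)]
    # Erase factor: partial decay when overwriting a read slot, full when allocating.
    erase = decay_rate * g_r + g_a
    new_memory = [
        [m * (1.0 - w * erase) + w * erase * t for m, t in zip(slot, new_template)]
        for slot, w in zip(memory, write_w)
    ]
    # Assumed usage bookkeeping: accumulate read and write weights per slot.
    new_usage = [u + r + w for u, r, w in zip(usage, read_w, write_w)]
    return new_memory, write_w, new_usage

# Toy example: two slots, the second one has never been used.
memory = [[1.0, 1.0], [2.0, 2.0]]
new_memory, write_w, usage = write_memory(
    memory, [9.0, 9.0], read_w=[1.0, 0.0], usage=[5.0, 0.0],
    gates=(0.2, 0.3, 0.5), decay_rate=0.5,
)
print(new_memory, write_w, usage)
```

With gates `(1, 0, 0)` nothing is written and the memory is unchanged. With gates `(0, 0, 1)` the least-used slot is fully replaced by the new template. Intermediate gate values blend the old and new content.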

==>> Experimental Results:
