Paper Read: Robust Deep Multi-modal Learning Based on Gated Information Fusion Network

Reposted article. 2018-07-27 14:53. Tags: paper reading / multi-modal problems (visible / thermal) / deep learning

2018-07-27 14:25:26

Paper: https://arxiv.org/pdf/1807.06233.pdf

Related Papers:

1. Infrared and visible image fusion methods and applications: A survey.

2. Chenglong Li, Xiao Wang, Lei Zhang, Jin Tang, Hejun Wu, and Liang Lin. WELD: Weighted Low-rank Decomposition for Robust Grayscale-Thermal Foreground Detection. IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), 27(4): 725-738, 2017.

3. Chenglong Li, Xinyan Liang, Yijuan Lu, Nan Zhao, and Jin Tang. RGB-T Object Tracking: Benchmark and Baseline.

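This paper addresses the multi-modal fusion problem and proposes a gate-based fusion strategy that adaptively fuses information from multiple modalities. The authors apply the method to object detection; the overall pipeline is shown in the figure below: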

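As shown in the figure above, the authors use two network branches to extract features from the two modalities. This feature-extraction network consists of a standard VGG-16 plus 8 extra convolutional layers. The authors also propose a new network, GIF (Gated Information Fusion Network), to fuse information across modalities and obtain better results. The motivation is that the modalities carry complementary information, but some of it helps more while some is of poor quality and contributes little, so fusing them adaptively should work better than treating them equally.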

Gated Information Fusion Network (GIF):

As shown in the figure above:

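The input to the GIF network is the already-extracted CNN feature maps, here $F_1$ and $F_2$. These two feature maps are concatenated to obtain $F_G$. The network has two parts:

1. the information fusion network (the part of Figure 2 outside the dashed box);

2. the weight generation network (WG Network; the dashed box in Figure 2).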

The Weight Generation Network applies two 3×3×1 convolution kernels to the concatenated feature map $F_G$, feeds each result into a sigmoid function (the gate layer), and outputs the corresponding weights $w_1$ and $w_2$.
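A minimal NumPy sketch of this weight generation step, not the authors' implementation. It assumes that "3×3×1" means a 3×3 spatial kernel with a single output channel that spans all 2k channels of $F_G$, that zero padding keeps the spatial size, and that feature maps are laid out as (channels, H, W). The function names, the bias arguments, and the random test data are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv3x3_single(x, kernel, bias=0.0):
    """Zero-padded 3x3 convolution (cross-correlation, as in most
    deep-learning libraries) mapping (C, H, W) -> (H, W).
    `kernel` has shape (C, 3, 3)."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # pad H and W with zeros
    out = np.full((h, w), bias, dtype=float)
    for dy in range(3):
        for dx in range(3):
            # Contribution of kernel tap (dy, dx), summed over channels.
            window = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("c,chw->hw", kernel[:, dy, dx], window)
    return out

def weight_generation(f1, f2, k1, k2, b1=0.0, b2=0.0):
    """Return the gate maps w1, w2 (each of shape (H, W), values in (0, 1))."""
    f_g = np.concatenate([f1, f2], axis=0)      # F_G, shape (2k, H, W)
    w1 = sigmoid(conv3x3_single(f_g, k1, b1))   # gate for modality 1
    w2 = sigmoid(conv3x3_single(f_g, k2, b2))   # gate for modality 2
    return w1, w2

# Hypothetical example data: k = 4 channels per modality on a 5x6 grid.
rng = np.random.default_rng(0)
k, H, W = 4, 5, 6
f1 = rng.standard_normal((k, H, W))
f2 = rng.standard_normal((k, H, W))
k1 = 0.1 * rng.standard_normal((2 * k, 3, 3))
k2 = 0.1 * rng.standard_normal((2 * k, 3, 3))
w1, w2 = weight_generation(f1, f2, k1, k2)
```

Because both gates see the concatenated map $F_G$, each weight can depend on the content of both modalities rather than on its own modality alone.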

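The information fusion network multiplies each original feature map element-wise by its weight to obtain weighted feature maps, concatenates the two, and applies a 1×1×2k convolution kernel to produce the final feature map.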
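The fusion step can be sketched in NumPy as follows, under stated assumptions rather than as the paper's code: each weight is taken to be a single (H, W) map broadcast across all channels of its feature map, the 1×1 convolution is written as a per-pixel matrix multiply over the 2k stacked channels, and the number of output channels is left free. The names `fuse`, `c_j`, and `b_j` are hypothetical.

```python
import numpy as np

def fuse(f1, f2, w1, w2, c_j, b_j=None):
    """Gate two feature maps and merge them with a 1x1 convolution.

    f1, f2 : (k, H, W) feature maps from the two modalities
    w1, w2 : (H, W) gate maps with values in [0, 1]
    c_j    : (k_out, 2k) weights of the 1x1 convolution
    b_j    : optional (k_out,) bias
    """
    gated1 = f1 * w1                                    # broadcast over channels
    gated2 = f2 * w2
    stacked = np.concatenate([gated1, gated2], axis=0)  # (2k, H, W)
    # A 1x1 convolution is a linear map over channels at every pixel.
    fused = np.einsum("oc,chw->ohw", c_j, stacked)
    if b_j is not None:
        fused = fused + b_j[:, None, None]
    return fused

# Hypothetical example: trust modality 1 fully and ignore modality 2.
rng = np.random.default_rng(0)
k, H, W = 4, 5, 6
f1 = rng.standard_normal((k, H, W))
f2 = rng.standard_normal((k, H, W))
w1 = np.ones((H, W))
w2 = np.zeros((H, W))
c_j = np.hstack([np.eye(k), np.eye(k)])  # sums the two gated halves
fused = fuse(f1, f2, w1, w2, c_j)
```

With these extreme gates the fused map equals `f1`, which illustrates how a low weight suppresses a degraded modality before the 1×1 convolution mixes the channels.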

To summarize, the whole process can be written as:
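A reconstruction of that summary from the prose description, offered as a sketch only. The notation is an assumption: $[\,\cdot\,;\,\cdot\,]$ denotes channel-wise concatenation, $*$ convolution, $\odot$ element-wise multiplication, $\sigma$ the sigmoid, $C_1$ and $C_2$ the two 3×3×1 kernels, $C_J$ the 1×1×2k kernel, and $b_1$, $b_2$, $b_J$ bias terms. Any nonlinearity applied after the final convolution is not stated in these notes and is left out.

$$
\begin{aligned}
F_G &= [F_1;\, F_2] \\
w_1 &= \sigma(C_1 * F_G + b_1), \qquad w_2 = \sigma(C_2 * F_G + b_2) \\
F_F &= C_J * [\,w_1 \odot F_1;\; w_2 \odot F_2\,] + b_J
\end{aligned}
$$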
