Paper Walkthrough (MoCo v2): "Improved Baselines with Momentum Contrastive Learning"

Reposted article; 2021-07-18 23:33; category: paper walkthrough

　　Title: "Improved Baselines with Momentum Contrastive Learning"

　　Authors: Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He

　　Source: arXiv

　　Code: https://github.com/facebookresearch/moco

　　Paper: https://arxiv.org/abs/2003.04297

1 Overview

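　　SimCLR improves the end-to-end mechanism in three ways:

much larger batches (4k or 8k), which provide more negative samples;
an MLP projection head in place of the fully connected (fc) projection head;
stronger data augmentation.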

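　　SimCLR uses batches of 4k to 8k, which requires TPU support. MoCo v2 does not need SimCLR's very large batch size and can be trained on a standard 8-GPU machine.
　　MoCo v2 combines MoCo v1 with design choices from SimCLR, and it outperforms SimCLR in the paper's ImageNet linear-classification comparison.
　　MoCo v2 adopts two important improvements from SimCLR:

an MLP projection head in place of the fully connected (fc) projection head;
additional data augmentation.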

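Background notes:

　　Fully connected (FC) layer = a single layer

　　MLP = a neural network built from multiple FC layers

　　DNN = a broader category covering both MLPs and CNNs; such networks typically contain several convolutional layers and FC layers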

2 Improved designs

　　Recap of the MoCo v1 model:

　　MoCo steps:
　　For each batch x:

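Apply random augmentation to produce two views, $x^{q}$ and $x^{k}$;
Encode the views with $f_{q}$ and $f_{k}$ respectively to obtain the normalized $q$ and $k$, and block gradient propagation through $k$;
Take the dot product of each $q$ with its single positive key in $k$ to get the cosine similarity ($N \times 1$), then take the dot product of $q$ with the $K$ negative samples stored in the queue to get the cosine similarity ($N \times K$). Concatenate them into an $N \times (1+K)$ matrix. The first column is then the positive, so the cross-entropy loss is computed directly with index 0 as the target, and the parameters of $f_{q}$ are updated;
Momentum-update the parameters of $f_{k}$: $f_{k} = m \cdot f_{k} + (1-m) \cdot f_{q}$;
Enqueue $k$ and dequeue the oldest encodings from the head of the queue; the queue holds at most 65536 negatives.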
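
　　The steps above can be sketched in a few lines. This is a minimal, forward-only illustration using only the Python standard library, not the authors' implementation: the encoders are replaced by random unit vectors, no gradients are computed, and the helper names, the toy sizes, the temperature (0.07) and the momentum value (0.999) are all assumptions made for the example.

```python
import math
import random
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrastive_logits(q, k, queue, temperature=0.07):
    """Build the N x (1 + K) matrix: positive in column 0, queue negatives after it.

    The temperature scaling is an assumption; it is not part of the steps above.
    """
    logits = []
    for q_i, k_i in zip(q, k):
        row = [dot(q_i, k_i)] + [dot(q_i, neg) for neg in queue]
        logits.append([s / temperature for s in row])
    return logits


def cross_entropy_label_zero(logits):
    """Mean cross-entropy where the correct class is always index 0."""
    total = 0.0
    for row in logits:
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[0]
    return total / len(logits)


def momentum_update(key_params, query_params, m=0.999):
    """f_k = m * f_k + (1 - m) * f_q, applied element-wise."""
    return [m * pk + (1.0 - m) * pq for pk, pq in zip(key_params, query_params)]


def random_unit_vectors(n, dim):
    return [l2_normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]


random.seed(0)
N, DIM, K = 4, 8, 16  # toy sizes; the text above mentions up to 65536 negatives

queue = deque(random_unit_vectors(K, DIM), maxlen=K)
q = random_unit_vectors(N, DIM)  # stand-in for normalized f_q(x^q)
k = random_unit_vectors(N, DIM)  # stand-in for normalized f_k(x^k), treated as constants

logits = contrastive_logits(q, k, queue)
loss = cross_entropy_label_zero(logits)

f_q_params = [0.5, -0.2]  # toy stand-ins for encoder weights
f_k_params = [0.4, -0.1]
f_k_params = momentum_update(f_k_params, f_q_params)

queue.extend(k)  # enqueue the new keys; maxlen drops the oldest entries
```

　　Using a `deque` with `maxlen` keeps the queue at a fixed size, so enqueueing a batch of keys automatically discards the same number of oldest keys.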

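　　In the MoCo framework, a large number of negative samples is readily available from the queue; the MLP head and the data augmentation are orthogonal to how contrastive learning is instantiated.

　　MoCo v2 framework

　　The single linear (fc) layer that follows the convolutional backbone in MoCo is replaced with a two-layer non-linear MLP that uses a ReLU activation. This design comes from SimCLR.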
　　For data augmentation, a blur augmentation is added. The stronger color distortion used in SimCLR, however, brought only diminishing gains on top of MoCo's stronger baselines.
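
　　To make the head change concrete, the sketch below contrasts a single linear (fc) head with a two-layer MLP head (linear, ReLU, linear). It is a standard-library illustration under stated assumptions, not the authors' code: the function names, the random initialisation and the toy dimensions are hypothetical (the paper reports a 2048-d hidden layer; the sizes here are shrunk for readability).

```python
import random


def linear(x, weight, bias):
    """y = W x + b for one input vector x; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_linear(in_dim, out_dim, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def fc_head(x, params):
    """MoCo v1 style head: a single linear layer."""
    w, b = params[0]
    return linear(x, w, b)


def mlp_head(x, params):
    """MoCo v2 style head: linear -> ReLU -> linear."""
    (w1, b1), (w2, b2) = params
    return linear(relu(linear(x, w1, b1)), w2, b2)


rng = random.Random(0)
FEAT_DIM, HIDDEN_DIM, OUT_DIM = 16, 16, 4  # toy sizes, not the real ones

fc_params = [init_linear(FEAT_DIM, OUT_DIM, rng)]
mlp_params = [
    init_linear(FEAT_DIM, HIDDEN_DIM, rng),
    init_linear(HIDDEN_DIM, OUT_DIM, rng),
]

features = [rng.gauss(0, 1) for _ in range(FEAT_DIM)]  # stand-in for backbone features
z_fc = fc_head(features, fc_params)
z_mlp = mlp_head(features, mlp_params)
```

　　Both heads map the backbone features to an embedding of the same size; only the MLP head inserts a non-linearity between the two projections.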

3 Experiments

　　Table 1. Ablation of MoCo baselines, evaluated by ResNet-50 for (i) ImageNet linear classification, and (ii) fine-tuning VOC object detection (mean of 5 trials). “MLP”: with an MLP head; “aug+”: with extra blur augmentation; “cos”: cosine learning rate schedule.
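　　The ablation compares four changes against MoCo v1: (1) adding the MLP head, (2) adding extra data augmentation (blur), (3) using a cosine learning rate schedule, and (4) changing the number of training epochs. These changes yield notable gains on both ImageNet linear classification and VOC fine-tuning.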
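
　　As a reminder of what a cosine learning rate schedule looks like, here is a minimal sketch of a half-cosine decay. The function name, the base learning rate and the epoch count are hypothetical values chosen for illustration, not settings taken from the table.

```python
import math


def cosine_lr(base_lr, epoch, total_epochs):
    """Half-cosine decay: base_lr at epoch 0, approaching 0 at total_epochs."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))


BASE_LR = 0.03       # hypothetical starting learning rate
TOTAL_EPOCHS = 200   # hypothetical schedule length

# One learning rate per epoch, decreasing smoothly over training.
schedule = [cosine_lr(BASE_LR, e, TOTAL_EPOCHS) for e in range(TOTAL_EPOCHS)]
```

　　Compared with a step schedule, the rate decreases smoothly, with slow changes at the start and end and the fastest decay in the middle of training.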

　　Table 2. MoCo vs. SimCLR: ImageNet linear classifier accuracy (ResNet-50, 1-crop 224×224), trained on features from unsupervised pre-training. “aug+” in SimCLR includes blur and stronger color distortion. SimCLR ablations are from Fig. 9 in [2] (we thank the authors for providing the numerical results).
　　This table compares the ImageNet linear-classification accuracy of MoCo and SimCLR; MoCo v2 comes out ahead.

　　Table 3. Memory and time cost in 8 V100 16G GPUs, implemented in PyTorch. †: based on our estimation.
　　This table compares the efficiency (memory and time cost) of MoCo and the end-to-end mechanism.
