Saining Xie et al. - [arXiv 2017] Aggregated Residual Transformations for Deep Neural Networks
Table of Contents
- Authors and related links
- Main idea
- Comparison of ResNet and ResNeXt
Authors and related links
- Authors: Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He (UC San Diego and Facebook AI Research)

Main idea
- What problem does it solve?
Networks such as VGG, ResNet, and Inception are built by stacking repeated building blocks, but the number and size of the filters inside those blocks cannot be chosen arbitrarily and must be tuned by hand (Inception-style modules in particular need careful per-stage design). Because there are many such hyper-parameters, and because they cannot be transferred directly across vision tasks or even across datasets but have to be customized each time, these task- or dataset-specific modules work well yet generalize poorly. This paper proposes a new building block that can replace ResNet's; the resulting model is called ResNeXt. Its biggest advantage is that the whole network uses one and the same building block, so there is no need to re-tune the block's hyper-parameters at every stage: the network is formed simply by stacking copies of a single block. Experiments show that, at the same model size, ResNeXt outperforms ResNet.
- How is it solved?
Replace the ResNet block (Figure 1, left) with the ResNeXt block (Figure 1, right). In effect, the single path of 64 filters on the left is replaced by 32 parallel paths on the right, each with only 4 filters; the outputs of the 32 paths are summed element-wise (values at corresponding positions across all channels are added), and the result is then added to the shortcut. A minimal code sketch of this block is given after the figure caption below.
Figure 1. Left: A block of ResNet [13]. Right: A block of ResNeXt with cardinality = 32, with roughly the same complexity. A layer is shown as (# in channels, filter size, # out channels)
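To make the block concrete, here is a minimal PyTorch sketch of the ResNeXt bottleneck block in Figure 1 (right). It is my own illustration, not the authors' released code: the 32 paths are realized with a single grouped 3×3 convolution, and all class/parameter names and default values are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """ResNeXt bottleneck block: 32 paths of width 4, realized as a grouped conv."""

    def __init__(self, in_channels=256, cardinality=32, bottleneck_width=4,
                 out_channels=256):
        super().__init__()
        mid = cardinality * bottleneck_width  # 32 * 4 = 128 bottleneck channels in total
        self.reduce = nn.Conv2d(in_channels, mid, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid)
        # groups=cardinality makes this one conv equivalent to 32 independent 3x3 paths
        self.group_conv = nn.Conv2d(mid, mid, kernel_size=3, padding=1,
                                    groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(mid)
        self.expand = nn.Conv2d(mid, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.group_conv(out)))
        out = self.bn3(self.expand(out))
        return self.relu(out + identity)  # aggregate the paths, then add the shortcut

x = torch.randn(1, 256, 56, 56)
print(ResNeXtBlock()(x).shape)  # torch.Size([1, 256, 56, 56])
```

With these defaults the block matches the 256-in/256-out, C=32, d=4 configuration of Figure 1 (right); when the input and output widths differ, a projection shortcut would be needed instead of the identity.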
- Cardinality and Bottleneck
This paper introduces a new dimension for measuring model capacity (capacity meaning a model's ability to fit a wide range of functions). Previously, capacity was adjusted through width and depth; the "cardinality" proposed here is the size of the set of transformations inside a building block. As shown in Figure 2, structures (a), (b), and (c) are equivalent, and the paper uses (c). Concretely, cardinality is the number of paths in Figure 2(a)/(b) or the number of groups in Figure 2(c): each path or group is one transformation, so the number of paths or groups is the cardinality. The bottleneck width is the number of channels (i.e. the number of filters) of the intermediate feature map inside each path or group. In Figure 2(a), for example, each path applies 4 filters of size 1×1×256 to the 256-channel input, producing a feature map with 4 channels (the spatial size is unchanged), so the bottleneck width is 4. A numerical check of the equivalence between forms (a) and (c) follows the figure caption below.

Figure 2. Equivalent building blocks of ResNeXt. (a): Aggregated residual transformations, the same as Fig. 1 right. (b): A block equivalent to (a), implemented as early concatenation. (c): A block equivalent to (a,b), implemented as grouped convolutions [23]. Notations in bold text highlight the reformulation changes. A layer is denoted as (# input channels, filter size, # output channels).
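The equivalence between form (a) and form (c) can be checked numerically. The sketch below (assuming PyTorch; the random weights are purely illustrative and not from the paper) runs the same weights once as a grouped convolution and once as 32 explicit paths whose outputs are summed, and verifies the two results agree.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C, d, in_ch = 32, 4, 256                      # cardinality, bottleneck width, input channels
x = torch.randn(1, in_ch, 8, 8)

w_reduce = torch.randn(C * d, in_ch, 1, 1)    # 1x1, 256 -> 128
w_group  = torch.randn(C * d, d, 3, 3)        # 3x3 grouped-conv weight (groups = C)
w_expand = torch.randn(in_ch, C * d, 1, 1)    # 1x1, 128 -> 256

# Form (c): grouped convolution.
h = F.conv2d(x, w_reduce)
h = F.conv2d(h, w_group, padding=1, groups=C)
y_grouped = F.conv2d(h, w_expand)

# Form (a): 32 explicit paths whose 256-channel outputs are summed.
y_paths = torch.zeros_like(y_grouped)
for i in range(C):
    s = slice(i * d, (i + 1) * d)
    hi = F.conv2d(x, w_reduce[s])             # 1x1, 256 -> 4
    hi = F.conv2d(hi, w_group[s], padding=1)  # 3x3, 4 -> 4
    y_paths += F.conv2d(hi, w_expand[:, s])   # 1x1, 4 -> 256

print(torch.allclose(y_grouped, y_paths, rtol=1e-3, atol=1e-2))  # True (up to float32 rounding)
```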
Comparison of ResNet and ResNeXt
- Network structure comparison
Figure 1 compares the building blocks of ResNet and ResNeXt for a block depth of 3 (three layers per block).
- Detailed configuration comparison
The block configurations of ResNet-50 and ResNeXt-50 are compared in Table 1; "C=32" in the table means cardinality = 32, with bottleneck width = 4, exactly as in Figure 2. A short sketch that expands this 32×4d template stage by stage follows the table caption.
Table 1. (Left) ResNet-50. (Right) ResNeXt-50 with a 32×4d template (using the reformulation in Fig. 2(c)). Inside the brackets are the shape of a residual block, and outside the brackets is the number of stacked blocks on a stage. "C=32" suggests grouped convolutions [23] with 32 groups. The numbers of parameters and FLOPs are similar between these two models.
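To make the "32×4d template" concrete, the short sketch below (my own illustration of the right column of Table 1, not code from the paper) prints the block shape of each stage: the width of the grouped 3×3 convolution starts at C · d = 128 and doubles whenever the spatial map is downsampled, while the block counts 3/4/6/3 follow ResNet-50.

```python
# Expanding the 32x4d template of ResNeXt-50 (right column of Table 1).
cardinality, base_width = 32, 4
stages = [(256, 3), (512, 4), (1024, 6), (2048, 3)]    # (block output channels, #blocks)
for i, (out_ch, n_blocks) in enumerate(stages):
    group_width = cardinality * base_width * (2 ** i)  # 128, 256, 512, 1024
    print(f"conv{i + 2}: [1x1,{group_width} | 3x3,{group_width},C={cardinality} | "
          f"1x1,{out_ch}] x {n_blocks}")
```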
- Model size calculation
Taking Figure 3 as an example, the ResNet block has 256 · 64 + 3 · 3 · 64 · 64 + 64 · 256 ≈ 70k parameters.
The ResNeXt block has C · (256 · d + 3 · 3 · d · d + d · 256) parameters, where C is the cardinality (32) and d is the bottleneck width (4), which also gives ≈ 70k; the quick check below reproduces both counts.
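A few lines suffice to reproduce the two estimates above (convolution weights only, ignoring biases and batch norm, as in the paper's estimate):

```python
# Reproducing the two parameter counts above (convolution weights only).
resnet_params = 256 * 64 + 3 * 3 * 64 * 64 + 64 * 256
C, d = 32, 4                                             # cardinality, bottleneck width
resnext_params = C * (256 * d + 3 * 3 * d * d + d * 256)
print(resnet_params)   # 69632  (~70k)
print(resnext_params)  # 70144  (~70k)
```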

Figure 3. Left: A block of ResNet [13]. Right: A block of ResNeXt with cardinality = 32, with roughly the same complexity. A layer is shown as (# in channels, filter size, # out channels)
- Experimental results comparison
- Shows that ResNeXt is better than ResNet, and that the larger the cardinality, the better the results
Table 2. Ablation experiments on ImageNet-1K. (Top): ResNet-50 with preserved complexity (∼4.1 billion FLOPs); (Bottom): ResNet-101 with preserved complexity (∼7.8 billion FLOPs). The error rate is evaluated on the single crop of 224×224 pixels.

- Shows that increasing cardinality is more effective than increasing the model's width or depth
Table 3. Comparisons on ImageNet-1K when the number of FLOPs is increased to 2× of ResNet-101’s. The error rate is evaluated on the single crop of 224×224 pixels. The highlighted factors are the factors that increase complexity. 
