1. Main Idea
- Key techniques: factorized convolutions and aggressive regularization.
- The paper presents a number of network design guidelines.
2. Results
- Costs about 5 billion multiply-adds per inference with fewer than 25M parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error.
3. Introduction
- Scaling up convolutional networks in efficient ways.
4. General Design Principles
- Avoid representational bottlenecks, especially early in the network. (Put simply, the feature map size should shrink gradually rather than abruptly.)
- Higher dimensional representations are easier to process locally within a network. Increasing the activations per tile in a convolutional network allows for more disentangled features. The resulting networks will train faster. (Use more feature maps in the deeper layers of the network; this helps hold more disentangled features and speeds up training.)
- Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power. (This is the rationale behind bottleneck layers: compress the channel dimension with a 1x1 convolution before the spatial convolution; see the sketch after this list.)
- Balance the width and depth of the network. (Increasing both the width and the depth of the network can contribute to higher-quality networks, so scale depth and width together.)
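A minimal PyTorch sketch of principle 3 (the bottleneck idea): a 1x1 convolution compresses the channel dimension before the spatially larger 3x3 convolution. PyTorch and the channel counts (256 -> 64 -> 256) are my illustrative choices here, not values taken from the paper.

```python
import torch
import torch.nn as nn

class BottleneckAggregation(nn.Module):
    """Sketch: reduce channels with a 1x1 conv, then do the 3x3 spatial
    aggregation on the low-dimensional embedding. Sizes are illustrative."""
    def __init__(self, in_channels=256, bottleneck_channels=64, out_channels=256):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, bottleneck_channels, kernel_size=1)
        self.aggregate = nn.Conv2d(bottleneck_channels, out_channels,
                                   kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.reduce(x))         # 1x1: cheap channel reduction
        return self.relu(self.aggregate(x))   # 3x3: spatial aggregation

# Usage: a 35x35 feature map with 256 channels keeps its shape
x = torch.randn(1, 256, 35, 35)
y = BottleneckAggregation()(x)  # -> (1, 256, 35, 35)
```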
5. Factorizing Convolutions with Large Filter Size
- Factorize convolutions that have large filter sizes.
5.1. Factorization into smaller convolutions
- A 5x5 convolution can be factorized into two stacked 3x3 convolutions.
- Experiments show that when factorizing one convolution into two, applying a ReLU after the first convolution improves accuracy; in other words, a purely linear factorization performs somewhat worse (see the sketch below).
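A hedged PyTorch sketch of this factorization: two stacked 3x3 convolutions replace a single 5x5 convolution (same 5x5 receptive field, 2 * 9 = 18 instead of 25 weights per output position per channel pair), with a ReLU after the first 3x3 as the note above recommends. Channel counts are illustrative.

```python
import torch
import torch.nn as nn

def factorized_5x5(in_channels, out_channels, mid_channels=None):
    """Replace one 5x5 conv with two stacked 3x3 convs; the ReLU after the
    first conv makes the factorization non-linear, which trains better than
    a purely linear one. Channel counts are illustrative."""
    mid_channels = mid_channels or out_channels
    return nn.Sequential(
        nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),   # non-linear factorization: better accuracy
        nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

# Usage: same spatial size and effective receptive field as a padded 5x5 conv
x = torch.randn(1, 64, 35, 35)
y = factorized_5x5(64, 96)(x)  # -> (1, 96, 35, 35)
```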
5.2. Spatial Factorization into Asymmetric Convolutions
- Factorizing a 3x3 convolution into a 3x1 followed by a 1x3 convolution saves 33% of the computation, whereas factorizing it into two 2x2 convolutions saves only 11%; moreover, the asymmetric factorization also works better.
- In practice, this factorization should not be applied too early in the network; it works well on medium grid sizes, i.e., feature maps of roughly 12x12 to 20x20 (see the sketch below).
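A hedged PyTorch sketch of the asymmetric factorization: a 3x1 convolution followed by a 1x3 convolution replaces a 3x3 convolution, dropping the weights per output position from 9 to 3 + 3 = 6 (the 33% saving quoted above), versus 2 * 4 = 8 for two 2x2 convolutions (only about 11%). Channel counts and the grid size are illustrative.

```python
import torch
import torch.nn as nn

def asymmetric_3x3(in_channels, out_channels):
    """Replace a 3x3 conv with a 3x1 conv followed by a 1x3 conv.
    Per output position: 9 weights -> 6 weights (33% saving).
    Channel counts are illustrative."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=(3, 1), padding=(1, 0)),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=(1, 3), padding=(0, 1)),
        nn.ReLU(inplace=True),
    )

# Per the note above, use this only on medium grids (roughly 12x12 to 20x20).
x = torch.randn(1, 192, 17, 17)
y = asymmetric_3x3(192, 192)(x)  # -> (1, 192, 17, 17)
```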
6. Utility of Auxiliary Classifiers
7. Efficient Grid Size Reduction
- The left-hand option (pooling before the convolution) introduces a representational bottleneck, while the right-hand option (convolution before pooling) greatly increases computation; the best approach is to reduce the feature map size while increasing the number of channels at the same time.
- The structure described above, with parallel stride-2 convolution and pooling branches whose outputs are concatenated, is the correct way to do it (see the sketch below).
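A hedged PyTorch sketch of the efficient grid-size reduction: a stride-2 convolution branch and a stride-2 pooling branch run in parallel, and their outputs are concatenated, so the grid halves while the total channel count grows. The branch widths are my illustrative choices, not the exact Inception-v3 numbers.

```python
import torch
import torch.nn as nn

class GridReduction(nn.Module):
    """Sketch: halve the spatial resolution with parallel stride-2 conv and
    pooling branches, concatenated along the channel dimension, so channels
    grow as the grid shrinks and no representational bottleneck is created.
    Branch widths are illustrative."""
    def __init__(self, in_channels=288, conv_channels=320):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(in_channels, conv_channels, kernel_size=3,
                      stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool_branch = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        # Output channels: in_channels + conv_channels; spatial size: halved.
        return torch.cat([self.conv_branch(x), self.pool_branch(x)], dim=1)

# Usage: a 35x35 grid is reduced to 18x18 while channels grow from 288 to 608
x = torch.randn(1, 288, 35, 35)
y = GridReduction()(x)  # -> (1, 608, 18, 18)
```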