MobileNet V1



Reference blog: https://cuijiahua.com/blog/2018/02/dl_6.html

1. Depthwise Separable Convolution

A standard convolution both filters and combines inputs into a new set of outputs in one step. The depthwise separable convolution splits this into two layers, a separate layer for filtering and a separate layer for combining.

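The computational cost of applying one convolution kernel to the input (with padding, so the output keeps the input's spatial size) is:

\(D_K \cdot D_K \cdot M \cdot D_F \cdot D_F\)

  • \(M\) is the number of input channels

  • \(D_K\) is the width and height of the convolution kernel

  • \(D_F\) is the width and height of the input feature map

If a layer uses \(N\) convolution kernels, the computational cost of that convolution layer is:

\(D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F\)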
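As a quick sanity check on the formula above, the following minimal sketch counts multiply-accumulate operations by walking the loops of a naive convolution and compares the count with the closed form. It assumes stride 1 and padding that keeps the output at \(D_F \times D_F\), and it counts every kernel tap as one operation, including taps that land on padded zeros. The function names are illustrative only and do not come from the original post.

```python
def count_macs_naive(d_k, m, n, d_f):
    """Count multiply-accumulates by enumerating the loops of a naive convolution.

    Assumes stride 1 and padding that keeps the output at d_f x d_f; taps that
    land on padded zeros are counted as well.
    """
    macs = 0
    for _out_channel in range(n):            # one kernel per output channel
        for _row in range(d_f):              # every output position
            for _col in range(d_f):
                for _in_channel in range(m):
                    macs += d_k * d_k        # one multiply-add per kernel tap
    return macs


def standard_conv_cost(d_k, m, n, d_f):
    """Closed-form cost: D_K * D_K * M * N * D_F * D_F."""
    return d_k * d_k * m * n * d_f * d_f


if __name__ == "__main__":
    # Small, arbitrary sizes so the loop count finishes instantly.
    print(count_macs_naive(3, 4, 5, 6), standard_conv_cost(3, 4, 5, 6))
```

Both numbers agree because each of the \(N\) kernels visits every one of the \(D_F \cdot D_F\) output positions and touches \(D_K \cdot D_K \cdot M\) input values there.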

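If depthwise convolutional filters are used instead, the \(N\) standard kernels of size \(D_K \times D_K \times M\) are replaced by \(M\) depthwise kernels of size \(D_K \times D_K \times 1\) followed by \(N\) pointwise kernels of size \(1 \times 1 \times M\).

The cost of the set of 2D kernels, one per input channel, is:

\(D_K \cdot D_K \cdot M \cdot D_F \cdot D_F\)

The cost of the \(N\) 1x1 kernels, each spanning all \(M\) channels, is:

\(N \cdot M \cdot D_F \cdot D_F\)

So the cost of the combination is:

\(D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + N \cdot M \cdot D_F \cdot D_F\)

Relative to standard convolution, the cost of the depthwise separable convolution is:

\(\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + N \cdot M \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}\)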

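As a concrete example, take a 3-channel 224x224 input image. The third convolution layer of VGG16, conv2_1, receives a feature map of size 112 with 64 channels and applies 128 kernels of size 3. The cost of the standard convolution is:

\(3 \cdot 3 \cdot 64 \cdot 128 \cdot 112 \cdot 112 = 924844032\)

Replacing the standard 3D convolution with a depthwise convolution followed by a 1x1 convolution gives:

\(3 \cdot 3 \cdot 64 \cdot 112 \cdot 112 + 128 \cdot 64 \cdot 112 \cdot 112 = 109985792\)

So for this layer the cost ratio is:

\(\frac{109985792}{924844032} \approx 0.1189\)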
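The arithmetic for this VGG16 layer can be reproduced with a few lines of Python. This is only a sketch of the two cost formulas from section 1, with function names chosen here for illustration; it counts multiply-adds for a single layer and ignores Batch Normalization and activations.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard convolution: D_K * D_K * M * N * D_F * D_F."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a depthwise convolution followed by a 1x1 convolution."""
    depthwise = d_k * d_k * m * d_f * d_f    # one 2D kernel per input channel
    pointwise = n * m * d_f * d_f            # N kernels of size 1 x 1 x M
    return depthwise + pointwise


if __name__ == "__main__":
    # VGG16 conv2_1: 3x3 kernels, 64 input channels, 128 kernels, 112x112 feature map.
    d_k, m, n, d_f = 3, 64, 128, 112
    standard = standard_conv_cost(d_k, m, n, d_f)
    separable = separable_conv_cost(d_k, m, n, d_f)
    ratio = separable / standard
    print(standard, separable, round(ratio, 4))
    # The ratio should agree with the closed form 1/N + 1/D_K^2.
    print(abs(ratio - (1 / n + 1 / d_k ** 2)) < 1e-12)
```

With a 3x3 kernel the \(1/D_K^2\) term dominates, so the saving is close to a factor of 9 no matter how many kernels the layer has.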

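2. Network Structure

A standard 3D convolution is typically used as a 3x3 convolution followed by Batch Normalization and ReLU, whereas the depthwise separable layer is used as a 3x3 depthwise convolution, Batch Normalization, and ReLU, followed by a 1x1 convolution, Batch Normalization, and ReLU.

  • The depthwise convolution and the 1x1 convolution after it are treated as two independent modules; each has Batch Normalization and a nonlinear activation applied to its output.

Replacing standard convolution with a depthwise convolution plus a 1x1 convolution is not only more efficient in theory. Because 1x1 convolutions are used so heavily, they can also be computed directly by highly optimized math libraries (GEMM routines). Taking Caffe as an example, using these libraries for a general convolution first requires rearranging the data with im2col so that it matches the input layout the library expects, but a 1x1 convolution needs no such preprocessing.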
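To illustrate why a 1x1 convolution needs no im2col step, the sketch below computes it twice: once with explicit per-pixel loops and once as a single matrix product over the flattened feature map. It is plain Python rather than Caffe code, the channels-last layout is an assumption made for readability, and `matmul` merely stands in for an optimized GEMM call.

```python
import random


def conv1x1_naive(feature_map, weights):
    """1x1 convolution as explicit loops.

    feature_map: nested list of shape [H][W][M] (channels last).
    weights: nested list of shape [M][N].
    Returns a nested list of shape [H][W][N].
    """
    n_out = len(weights[0])
    output = []
    for row in feature_map:
        out_row = []
        for pixel in row:
            out_row.append([
                sum(pixel[c] * weights[c][k] for c in range(len(pixel)))
                for k in range(n_out)
            ])
        output.append(out_row)
    return output


def matmul(a, b):
    """Plain matrix product of a (P x M) and b (M x N), standing in for GEMM."""
    return [
        [sum(a_row[c] * b[c][k] for c in range(len(b))) for k in range(len(b[0]))]
        for a_row in a
    ]


def conv1x1_as_gemm(feature_map, weights):
    """View the feature map as an (H*W) x M matrix and multiply once."""
    height, width = len(feature_map), len(feature_map[0])
    flat = [pixel for row in feature_map for pixel in row]    # (H*W) x M
    product = matmul(flat, weights)                           # (H*W) x N
    return [product[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    random.seed(0)
    fm = [[[random.random() for _ in range(3)] for _ in range(5)] for _ in range(4)]
    w = [[random.random() for _ in range(2)] for _ in range(3)]
    a, b = conv1x1_naive(fm, w), conv1x1_as_gemm(fm, w)
    diff = max(abs(a[i][j][k] - b[i][j][k])
               for i in range(4) for j in range(5) for k in range(2))
    print("max abs difference:", diff)
```

A 3x3 kernel, by contrast, reads a neighborhood around each pixel, so the patches have to be gathered into matrix rows (the job of im2col) before the same product can be used.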

In MobileNet, about 95% of the computation and about 75% of the parameters belong to the 1x1 convolutions.
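These shares can be checked by tallying multiply-adds and parameters per layer type. The sketch below is a rough count under stated assumptions: the channel and stride sequence matches the 13 depthwise blocks in the code of section 4 with alpha = 1, the input is 224x224, "same" padding makes each stride-2 layer halve the feature map, and Batch Normalization parameters and biases are ignored. The names `BLOCKS` and `resource_by_layer_type` are made up for this example.

```python
# (input channels, output channels, stride) of the 13 depthwise separable blocks.
BLOCKS = [
    (32, 64, 1), (64, 128, 2), (128, 128, 1), (128, 256, 2), (256, 256, 1),
    (256, 512, 2), (512, 512, 1), (512, 512, 1), (512, 512, 1), (512, 512, 1),
    (512, 512, 1), (512, 1024, 2), (1024, 1024, 1),
]


def resource_by_layer_type(input_size=224, classes=1000):
    """Return {layer type: (multiply-adds, parameters)}, ignoring BN and biases."""
    size = input_size // 2                   # the first 3x3 convolution has stride 2
    stats = {
        "conv 3x3": [3 * 3 * 3 * 32 * size * size, 3 * 3 * 3 * 32],
        "conv dw 3x3": [0, 0],
        "conv 1x1": [0, 0],
        "fully connected": [1024 * classes, 1024 * classes],
    }
    for c_in, c_out, stride in BLOCKS:
        size //= stride                      # a stride-2 depthwise conv halves the map
        stats["conv dw 3x3"][0] += 3 * 3 * c_in * size * size
        stats["conv dw 3x3"][1] += 3 * 3 * c_in
        stats["conv 1x1"][0] += c_in * c_out * size * size
        stats["conv 1x1"][1] += c_in * c_out
    return {name: tuple(values) for name, values in stats.items()}


if __name__ == "__main__":
    stats = resource_by_layer_type()
    total_macs = sum(macs for macs, _ in stats.values())
    total_params = sum(params for _, params in stats.values())
    for name, (macs, params) in stats.items():
        print("{:16s} {:6.2%} of multiply-adds, {:6.2%} of parameters".format(
            name, macs / total_macs, params / total_params))
```

Under these assumptions the 1x1 convolutions account for roughly 94.9% of the multiply-adds and 74.6% of the parameters, and most of the remaining parameters sit in the final classifier.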

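3. Width Multiplier and Resolution Multiplier

Width multiplier

The width multiplier α is a number in (0, 1] applied to the channel counts of the network. Put simply, it is the ratio of the number of convolution kernels each module uses in the new network to the number used in the standard MobileNet. For a depthwise convolution combined with a 1x1 convolution, the cost becomes:

\(D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F\)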
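A minimal sketch of how the width multiplier enters the cost of a single layer: the depthwise term is linear in α, while the 1x1 term is quadratic because both its input and output channel counts shrink. The helper name is hypothetical, and the example assumes that α times each channel count is a whole number (real implementations round the scaled counts).

```python
def separable_cost_terms(d_k, m, n, d_f, alpha=1.0):
    """Return (depthwise, pointwise) multiply-adds with width multiplier alpha.

    Implements D_K*D_K*(alpha*M)*D_F*D_F and (alpha*M)*(alpha*N)*D_F*D_F.
    Assumes alpha * m and alpha * n are whole numbers.
    """
    m_scaled = int(alpha * m)
    n_scaled = int(alpha * n)
    depthwise = d_k * d_k * m_scaled * d_f * d_f
    pointwise = m_scaled * n_scaled * d_f * d_f
    return depthwise, pointwise


if __name__ == "__main__":
    # Same layer as the VGG16 example: 3x3 kernel, 64 -> 128 channels, 112x112 map.
    for alpha in (1.0, 0.75, 0.5, 0.25):
        dw, pw = separable_cost_terms(3, 64, 128, 112, alpha)
        print("alpha={:<4} depthwise={:>10} pointwise={:>11} total={:>11}".format(
            alpha, dw, pw, dw + pw))
```

Since the 1x1 term dominates the total, the overall cost falls roughly as α squared.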

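Resolution multiplier

The resolution multiplier β takes values in (0, 1] and is a reduction factor applied to the input size of every module. Put simply, the input data, and hence the feature maps produced in every module, become smaller. Combined with the width multiplier α, the cost of a depthwise convolution combined with a 1x1 convolution is:

\(D_K \cdot D_K \cdot \alpha M \cdot \beta D_F \cdot \beta D_F + \alpha M \cdot \alpha N \cdot \beta D_F \cdot \beta D_F\)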
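To see both multipliers acting on the whole network, the sketch below sums the per-layer cost over a MobileNet v1 style stack. It is an estimate under assumptions, not the reference implementation: the resolution multiplier is expressed through `input_size` (so β = input_size / 224), scaled channel counts are truncated with `int`, "same" padding is assumed, and Batch Normalization and biases are ignored. `BASE_BLOCKS` and `mobilenet_mult_adds` are names invented for this example.

```python
# (output channels, stride) of the 13 depthwise separable blocks at alpha = 1.
BASE_BLOCKS = [
    (64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2),
    (512, 1), (512, 1), (512, 1), (512, 1), (512, 1), (1024, 2), (1024, 1),
]


def mobilenet_mult_adds(alpha=1.0, input_size=224, classes=1000):
    """Estimate the multiply-adds of the network for a given width and input size."""
    size = input_size // 2                     # first 3x3 convolution, stride 2
    c_in = int(32 * alpha)
    total = 3 * 3 * 3 * c_in * size * size
    for base_out, stride in BASE_BLOCKS:
        c_out = int(base_out * alpha)
        size //= stride
        total += 3 * 3 * c_in * size * size    # depthwise 3x3
        total += c_in * c_out * size * size    # pointwise 1x1
        c_in = c_out
    total += c_in * classes                    # final classifier
    return total


if __name__ == "__main__":
    for size in (224, 192, 160, 128):
        print("alpha=1.0  input={}  {:.0f} M".format(
            size, mobilenet_mult_adds(1.0, size) / 1e6))
    for alpha in (0.75, 0.5, 0.25):
        print("alpha={}  input=224  {:.0f} M".format(
            alpha, mobilenet_mult_adds(alpha, 224) / 1e6))
```

At α = 1 this gives about 569, 418, 291, and 186 million multiply-adds for input sizes 224, 192, 160, and 128. The convolutional part scales exactly with β squared, while the classifier cost and the parameter count do not depend on the resolution.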

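4. Code Implementation

The code does not implement the resolution multiplier explicitly (the resolution is set implicitly through input_shape); instead it has an additional depth_multiplier parameter: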

"""MobileNet v1 models for Keras.
MobileNet is a general architecture and can be used for multiple use cases.
Depending on the use case, it can use different input layer size and
different width factors. This allows different width models to reduce
the number of multiply-adds and thereby
reduce inference cost on mobile devices.
MobileNets support any input size greater than 32 x 32, with larger image sizes
offering better performance.
The number of parameters and number of multiply-adds
can be modified by using the `alpha` parameter,
which increases/decreases the number of filters in each layer.
By altering the image size and `alpha` parameter,
all 16 models from the paper can be built, with ImageNet weights provided.
The paper demonstrates the performance of MobileNets using `alpha` values of
1.0 (also called 100 % MobileNet), 0.75, 0.5 and 0.25.
For each of these `alpha` values, weights for 4 different input image sizes
are provided (224, 192, 160, 128).
The following table describes the size and accuracy of MobileNets
with different width multipliers at input size 224 x 224:
----------------------------------------------------------------------------
Width Multiplier (alpha) | ImageNet Acc | Multiply-Adds (M) | Params (M)
----------------------------------------------------------------------------
|   1.0 MobileNet-224    |    70.6 %    |        569        |     4.2     |
|   0.75 MobileNet-224   |    68.4 %    |        325        |     2.6     |
|   0.50 MobileNet-224   |    63.7 %    |        149        |     1.3     |
|   0.25 MobileNet-224   |    50.6 %    |        41         |     0.5     |
----------------------------------------------------------------------------
The following table describes the performance of
the 100 % MobileNet on various input sizes:
------------------------------------------------------------------------
      Resolution      | ImageNet Acc | Multiply-Adds (M) | Params (M)
------------------------------------------------------------------------
|  1.0 MobileNet-224  |    70.6 %    |        569        |     4.2     |
|  1.0 MobileNet-192  |    69.1 %    |        418        |     4.2     |
|  1.0 MobileNet-160  |    67.2 %    |        290        |     4.2     |
|  1.0 MobileNet-128  |    64.4 %    |        186        |     4.2     |
------------------------------------------------------------------------
The weights for all 16 models are obtained and translated
from Tensorflow checkpoints found at
https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md
# Reference
- [MobileNets: Efficient Convolutional Neural Networks for
   Mobile Vision Applications](https://arxiv.org/pdf/1704.04861.pdf)
"""
from keras.models import Model
from keras.layers import Input, Activation, Dropout, Reshape, BatchNormalization, GlobalAveragePooling2D
from keras.layers import Conv2D, DepthwiseConv2D
from keras.utils import plot_model
from keras import backend as K
 
 
def relu6(x):
    return K.relu(x, max_value=6)
 
 
def _make_divisiable(v, divisor=8, min_value=8):
    """Round v to the nearest multiple of divisor, with min_value as the lower bound."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor/2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v
 
 
def _conv_bolck(inputs, filters, alpha, kernel=(3, 3), strides=(1, 1), bn_epsilon=1e-3,
                bn_momentum=0.99, block_id=1):
    """ Adds an initial convolution layer (with batch normalization and relu6).
 
    Args:
        inputs: Input tensor of shape `(rows, cols, 3)` (with `channels_last` data format)
                or (3, rows, cols) (with `channels_first` data format).
                It should have exactly 3 inputs channels, and width and height should be no smaller than 32.
                E.g. `(224, 224, 3)` would be one valid value.
        filters: Integer, the dimensionality of the output space
                (i.e. the number of output filters in the convolution).
        alpha: controls the width of the network.
                - If `alpha` < 1.0, proportionally decreases the number of filters in each layer.
                - If `alpha` > 1.0, proportionally increases the number of filters in each layer.
                - If `alpha` = 1, default number of filters from the paper are used at each layer.
        kernel: An integer or tuple/list of 2 integers, specifying the width and height of the 2D convolution window.
                Can be a single integer to specify the same value for all spatial dimensions.
        strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the width and height.
                 Can be a single integer to specify the same value for all spatial dimensions.
                 Specifying any stride value != 1 is incompatible with specifying any `dilation_rate` value != 1.
        bn_epsilon: Epsilon value for BatchNormalization
        bn_momentum: Momentum value for BatchNormalization
        block_id: Integer, a unique identification designating the block number.
 
    Returns:
        Output tensor of block
 
    Input shape:
        4D tensor with shape: `(samples, channels, rows, cols)` if data_format='channels_first'
                           or `(samples, rows, cols, channels)` if data_format='channels_last'.
    Output shape:
        4D tensor with shape: `(samples, filters, new_rows, new_cols)` if data_format='channels_first'
                           or  `(samples, new_rows, new_cols, filters)` if data_format='channels_last'.
                          `rows` and `cols` values might have changed due to stride.
 
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
 
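    # Number of kernels after applying the width multiplier; rounded because the product may not be divisible by divisor=8.
    filters = _make_divisiable(filters * alpha)
 
    # padding='same' so that a 224 x 224 input gives a 112 x 112 feature map after the stride-2 convolution.
    x = Conv2D(filters, kernel, padding='same', use_bias=False, strides=strides, name='conv{}'.format(block_id))(inputs)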
    x = BatchNormalization(axis=channel_axis, momentum=bn_momentum, epsilon=bn_epsilon, name='conv{}_bn'.format(block_id))(x)
 
    return Activation(relu6, name='conv{}_relu'.format(block_id))(x)
 
 
def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha, depth_multiplier=1,
                          strides=(1, 1), bn_epsilon=1e-3, block_id=1):
    """Adds a depthwise convolution block.
    A depthwise convolution block consists of
    a depthwise conv, batch normalization, relu6,
    pointwise convolution, batch normalization and relu6
 
    Args:
        inputs: Input tensor of shape `(rows, cols, channels)`(with `channels_last` data format)
                or (channels, rows, cols)(with `channels_first` data format)
        pointwise_conv_filters: Integer, the dimensionality of the output space
                                (i.e. the number of output filters in the pointwise convolution).
        alpha: controls the width of the network.
            - If `alpha` < 1.0, proportionally decreases the number of filters in each layer.
            - If `alpha` > 1.0, proportionally increases the number of filters in each layer.
            - If `alpha` = 1, default number of filters from the paper are used at each layer.
        depth_multiplier: The number of depthwise convolution output channels for each input channel.
                        The total number of depthwise convolution output channels
                        will be equal to `filters_in * depth_multiplier`.
        strides:  An integer or tuple/list of 2 integers,
                specifying the strides of the convolution along the width and height.
                Can be a single integer to specify the same value for all spatial dimensions.
                Specifying any stride value != 1 is incompatible with specifying any `dilation_rate` value != 1.
        bn_epsilon: Epsilon value for BatchNormalization
        block_id: Integer, a unique identification designating the block number.
 
    Returns:
        Output tensor of block
 
    Input shape:
         4D tensor with shape: `(batch, channels, rows, cols)` if data_format='channels_first'
                                or `(batch, rows, cols, channels)` if data_format='channels_last'.
    Output shape:
        4D tensor with shape: `(batch, filters, new_rows, new_cols)` if data_format='channels_first'
                                or `(batch, new_rows, new_cols, filters)` if data_format='channels_last'.
         `rows` and `cols` values might have changed due to stride.
 
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    pointwise_conv_filters = _make_divisiable(pointwise_conv_filters * alpha)
 
    # Depthwise Conv2D
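    # Each input channel gets depth_multiplier kernels of its own, so the convolution is factored per channel;
    # the kernel shape is 3 x 3 x input_channels x depth_multiplier.
    # In the call below, DepthwiseConv2D outputs a tensor of shape (batch, rows, cols, input_channels * depth_multiplier).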
    x = DepthwiseConv2D(kernel_size=(3, 3),
                        padding='same',
                        depth_multiplier=depth_multiplier,
                        strides=strides,
                        use_bias=False,
                        name='conv_dw_{}'.format(block_id))(inputs)
    x = BatchNormalization(axis=channel_axis, epsilon=bn_epsilon, name='conv_dw_{}_bn'.format(block_id))(x)
    x = Activation(relu6, name='conv_dw_{}_relu'.format(block_id))(x)
 
    # Pointwise Conv2D: pointwise_conv_filters sets the final number of output channels.
    x = Conv2D(pointwise_conv_filters,
               kernel_size=(1, 1),
               padding='same',
               use_bias=False,
               strides=(1, 1),
               name='conv_pw_{}'.format(block_id))(x)
    x = BatchNormalization(axis=channel_axis, epsilon=bn_epsilon, name='conv_pw_{}_bn'.format(block_id))(x)
 
    return Activation(relu6, name='conv_pw_{}_relu'.format(block_id))(x)
 
 
def mobilenetv1(input_shape,
                alpha=1.0,
                depth_multiplier=1,
                dropout=1e-3,
                classes=1000):
    """Instantiates the MobileNet architecture.
 
 
    Args:
        input_shape: shape tuple of the input image, e.g. `(224, 224, 3)` (with `channels_last` data format)
                    or `(3, 224, 224)` (with `channels_first` data format).
                    It should have exactly 3 input channels, and width and height should be no smaller than 32.
                    E.g. `(200, 200, 3)` would be one valid value.
        alpha: controls the width of the network.
                - If `alpha` < 1.0, proportionally decreases the number of filters in each layer.
                - If `alpha` > 1.0, proportionally increases the number of filters in each layer.
                - If `alpha` = 1, default number of filters from the paper are used at each layer.
        depth_multiplier: depth multiplier for depthwise convolution
        dropout: dropout rate
        classes: optional number of classes to classify images into
 
    Returns:
        A Keras model instance.
 
    Raises:
        ValueError: in case of invalid argument for `weights`, or invalid input shape.
        RuntimeError: If attempting to run this model with a backend that does not support separable convolutions.
    """
 
    x_input = Input(shape=input_shape)
    x = _conv_bolck(x_input, 32, alpha, strides=(2, 2))
 
    x = _depthwise_conv_block(x, 64, alpha, depth_multiplier,
                              block_id=1)
    x = _depthwise_conv_block(x, 128, alpha, depth_multiplier,
                              strides=(2, 2), block_id=2)
    x = _depthwise_conv_block(x, 128, alpha, depth_multiplier,
                              block_id=3)
    x = _depthwise_conv_block(x, 256, alpha, depth_multiplier,
                              strides=(2, 2), block_id=4)
    x = _depthwise_conv_block(x, 256, alpha, depth_multiplier,
                              block_id=5)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                              strides=(2, 2), block_id=6)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                              block_id=7)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                              block_id=8)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                              block_id=9)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                              block_id=10)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                              block_id=11)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                              strides=(2, 2), block_id=12)
    x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, block_id=13)
 
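    # Use the same rounding as block 13 so the reshape matches its output channel count for any alpha.
    shape = (1, 1, _make_divisiable(1024 * alpha))
 
    x = GlobalAveragePooling2D()(x)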
    x = Reshape(shape, name='reshape_1')(x)
    x = Dropout(dropout, name='dropout')(x)
    x = Conv2D(classes, (1, 1), padding='same', name='conv_preds')(x)
    x = Activation('softmax', name='act_softmax')(x)
    x = Reshape((classes,), name='reshape_2')(x)
 
    return Model(x_input, x)
 
 
if __name__ == '__main__':
    alpha = 1
    depth_multiplier = 1
    mobilenet = mobilenetv1(input_shape=(224, 224, 3), alpha=alpha, depth_multiplier=depth_multiplier)
    mobilenet.summary()
    plot_model(mobilenet, show_shapes=True, to_file='mobilenet_alpha{}_depth_multiplier_{}.png'.format(alpha, depth_multiplier))
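
To make the role of depth_multiplier concrete, the following sketch works out the weight shapes and parameter counts of one depthwise block without needing Keras. It assumes the kernel layouts used by Keras, `(rows, cols, input_channels, depth_multiplier)` for the depthwise kernel and `(rows, cols, input_channels, filters)` for the 1x1 kernel, and ignores Batch Normalization. The helper name `depthwise_block_shapes` is hypothetical.

```python
def depthwise_block_shapes(input_channels, pointwise_filters, depth_multiplier=1, kernel=3):
    """Weight shapes and parameter counts of one depthwise block (no BN, no bias)."""
    depthwise_kernel = (kernel, kernel, input_channels, depth_multiplier)
    depthwise_out_channels = input_channels * depth_multiplier
    pointwise_kernel = (1, 1, depthwise_out_channels, pointwise_filters)
    return {
        "depthwise_kernel": depthwise_kernel,
        "depthwise_out_channels": depthwise_out_channels,
        "pointwise_kernel": pointwise_kernel,
        "depthwise_params": kernel * kernel * input_channels * depth_multiplier,
        "pointwise_params": depthwise_out_channels * pointwise_filters,
    }


if __name__ == "__main__":
    # First depthwise block of the network: 32 input channels, 64 pointwise filters.
    for dm in (1, 2):
        info = depthwise_block_shapes(32, 64, depth_multiplier=dm)
        print("depth_multiplier={}: {}".format(dm, info))
```

With depth_multiplier=1 the block matches the formulas in section 1. Larger values widen only the intermediate depthwise output, which multiplies both the depthwise and the 1x1 parameter counts by the same factor while leaving the block's final channel count unchanged.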

