TensorFlow + Inception-v3: Analysis and Code Implementation of the Image Classification Network Structure
Paper link: paper address
ResNet link: Resnet-cifar10
DenseNet link: DenseNet
SegNet link: Segnet-segmentation
The boom in deep learning has drawn more and more researchers into the field. As the backbone of all kinds of applications, image classification networks come in many forms, from AlexNet to VGG-Net, GoogLeNet, ResNet, DenseNet, and so on. These structures keep improving while also gradually stabilizing, and brand-new architectures devoted purely to image classification are becoming rarer (possibly because classification accuracy already meets most needs). This article focuses on Inception-v3, the improved network structure that followed GoogLeNet. Its network structure is shown below:
The network achieves a 5.6% top-5 error on the ILSVRC 2012 classification challenge. Its parameter count is far smaller than VGG-Net's, so it trains faster while still delivering good classification accuracy. The paper presents four general principles of network design.
In short: 1) do not use overly large filter sizes at the very beginning of the network, as this leads to loss of image information; 2) higher-dimensional representations are easier to process locally within the network, and adding activations yields more disentangled features; 3) spatial aggregation can be done over lower-dimensional embeddings with little or no loss in representational power (this is the principle behind splitting each Inception module into four branches and concatenating them at the end); 4) balance the width and depth of the network.
Factorizing Convolution Kernels
The core of the paper lies in its Inception modules, which make use of factorization (decomposing a large filter size into several smaller ones). The principle can be illustrated by the following figure:
Suppose we have a 5x5 feature map. We can convolve it with a single 5x5 filter to obtain one output value, or with two stacked 3x3 filters to obtain one output value. Compared with the former, the latter has fewer parameters: 3x3x2 = 18 versus 5x5 = 25, a reduction of (25-18)/25 = 28%.
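As a quick sanity check of that arithmetic:

    params_5x5 = 5 * 5          # one 5x5 filter: 25 weights
    params_two_3x3 = 2 * 3 * 3  # two stacked 3x3 filters: 18 weights
    print((params_5x5 - params_two_3x3) / params_5x5)  # 0.28, i.e. 28% fewer parameters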
Building on this, the paper further proposes replacing larger kernels with asymmetric convolution kernels, as shown in the figure below:
A 3x3 kernel can be replaced by the combination of one 1x3 and one 3x1 kernel. More generally, an nxn kernel can be replaced by a 1xn kernel followed by an nx1 kernel, as in the sketch below.
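For illustration only, here is a minimal TensorFlow 1.x sketch of this factorization; the function name, channel counts, and variable names are my own and not from the original post:

    import tensorflow as tf

    def asymmetric_conv(x, in_ch, out_ch, n=7, name='asym_conv'):
        # Replace an nxn convolution with a 1xn convolution followed by an nx1 convolution.
        with tf.variable_scope(name):
            w_1xn = tf.get_variable('w_1xn', shape=[1, n, in_ch, out_ch])
            x = tf.nn.conv2d(x, w_1xn, strides=[1, 1, 1, 1], padding='SAME')
            w_nx1 = tf.get_variable('w_nx1', shape=[n, 1, out_ch, out_ch])
            x = tf.nn.conv2d(x, w_nx1, strides=[1, 1, 1, 1], padding='SAME')
            return tf.nn.relu(x)

    # e.g. applied to the 17x17x768 feature map used in the factorized modules below:
    # out = asymmetric_conv(feature_map, in_ch=768, out_ch=192, n=7)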
Auxiliary Classifier
An auxiliary classifier means that, in addition to the main classifier, a branch is attached to one intermediate layer of the network (in this paper, the 17x17x768 layer) to perform auxiliary classification. The idea comes from GoogLeNet (Going deeper with convolutions).
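The original post gives no code for this branch. Below is a minimal sketch, assuming TensorFlow 1.x and the conv_inception helper defined later in this post; the structure follows the auxiliary head described in the paper (5x5 average pooling with stride 3, a 1x1 convolution, a 5x5 convolution, then the classifier), but the function name and exact channel counts are my own choices:

    def aux_classifier(net, num_classes, name='AuxLogits'):
        # Auxiliary classification branch attached to the 17x17x768 feature map.
        with tf.variable_scope(name):
            # 17x17x768 -> 5x5x768
            net = tf.nn.avg_pool(net, ksize=[1, 5, 5, 1], strides=[1, 3, 3, 1],
                                 padding='VALID', name='Avgpool_5x5')
            net = conv_inception(net, shape=[1, 1, 768, 128], name='conv_1x1')      # 5x5x128
            net = conv_inception(net, shape=[5, 5, 128, 768], padding='VALID',
                                 name='conv_5x5')                                    # 1x1x768
            logits = conv_inception(net, shape=[1, 1, 768, num_classes],
                                    activation=False, name='conv_logits')            # 1x1xnum_classes
            return tf.squeeze(logits, [1, 2], name='SpatialSqueeze')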
Efficient Reduction of Grid Size
In the network structure given in the paper, there is a grid-size reduction between the 3xInception and 5xInception stages, and another between the 5xInception and 2xInception stages. The concrete implementation is as follows:
The corresponding code implementation is given here:
def inception_grid_reduction_1(input, name=None):
    with tf.variable_scope(name) as scope:
        with tf.variable_scope("Branch_0"):
            branch_0 = conv_inception(input, shape=[1, 1, 288, 384], name='0a_1x1')
            branch_0 = conv_inception(branch_0, shape=[3, 3, 384, 384], stride=[1, 2, 2, 1], padding='VALID', name='0b_3x3')
        with tf.variable_scope('Branch_1'):
            branch_1 = conv_inception(input, shape=[1, 1, 288, 64], name='0b_1x1')
            branch_1 = conv_inception(branch_1, shape=[3, 3, 64, 96], name='0b_3x3')
            branch_1 = conv_inception(branch_1, shape=[3, 3, 96, 96], stride=[1, 2, 2, 1], padding='VALID', name='0c_3x3')
        with tf.variable_scope('Branch_2'):
            branch_2 = tf.nn.max_pool(input, ksize=(1, 3, 3, 1), strides=[1, 2, 2, 1], padding='VALID', name='maxpool_0a_3x3')
        inception_out = tf.concat([branch_0, branch_1, branch_2], 3)
        c = 1  # for debug
        return inception_out
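The code above covers the first reduction (35x35x288 -> 17x17x768). For completeness, here is a hedged sketch of the second reduction before the 8x8 stage, following the same two-convolution-branches-plus-maxpool pattern; the branch layout and channel counts are borrowed from the commonly used reference implementation, not from the author's code:

    def inception_grid_reduction_2(input, name=None):
        # Reduces 17x17x768 to 8x8 with 320 + 192 + 768 = 1280 output channels.
        with tf.variable_scope(name) as scope:
            with tf.variable_scope('Branch_0'):
                branch_0 = conv_inception(input, shape=[1, 1, 768, 192], name='0a_1x1')
                branch_0 = conv_inception(branch_0, shape=[3, 3, 192, 320], stride=[1, 2, 2, 1], padding='VALID', name='0b_3x3')
            with tf.variable_scope('Branch_1'):
                branch_1 = conv_inception(input, shape=[1, 1, 768, 192], name='0a_1x1')
                branch_1 = conv_inception(branch_1, shape=[1, 7, 192, 192], name='0b_1x7')
                branch_1 = conv_inception(branch_1, shape=[7, 1, 192, 192], name='0c_7x1')
                branch_1 = conv_inception(branch_1, shape=[3, 3, 192, 192], stride=[1, 2, 2, 1], padding='VALID', name='0d_3x3')
            with tf.variable_scope('Branch_2'):
                branch_2 = tf.nn.max_pool(input, ksize=(1, 3, 3, 1), strides=[1, 2, 2, 1], padding='VALID', name='maxpool_0a_3x3')
            return tf.concat([branch_0, branch_1, branch_2], 3)

Note that this produces 1280 channels, whereas the expanded modules later in this post assume a 2048-channel input, so the channel counts would need to be adjusted (or an extra 8x8 block inserted) when chaining them.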
The conv_inception function used above is defined as follows:
def conv_inception(input, shape, stride=[1, 1, 1, 1], activation=True, padding='SAME', name=None):
    in_channel = shape[2]
    out_channel = shape[3]
    k_size = shape[0]
    with tf.variable_scope(name) as scope:
        kernel = _variable('conv_weights', shape=shape)
        conv = tf.nn.conv2d(input=input, filter=kernel, strides=stride, padding=padding)
        biases = _variable('biases', [out_channel])
        bias = tf.nn.bias_add(conv, biases)
        if activation is True:
            conv_out = tf.nn.relu(bias, name='relu')
        else:
            conv_out = bias
        return conv_out
_variable is defined as follows:
def _variable(name, shape):
    """Helper to create a Variable pinned to GPU memory.

    Args:
        name: name of the variable
        shape: list of ints

    Returns:
        Variable Tensor
    """
    with tf.device('/gpu:0'):
        var = tf.get_variable(name, shape)
    return var
Below, each part of the network is explained and implemented:
The plain convolution layers in the paper are not covered here, since they are standard operations; the focus is on how the Inception parts are built. The paper uses three kinds of Inception modules: the traditional Inception (as in GoogLeNet), the Inception with asymmetrically factorized kernels, and the Inception with expanded filter banks. Starting with the traditional one, shown in the figure:
Here the base input size in the network is 35x35x288. There are four branches, where the pool is average pooling (avgpool), and the four branches are concatenated at the end. The code implementation is as follows:
def inception_block_tradition(input, name=None):
    with tf.variable_scope(name) as scope:
        with tf.variable_scope("Branch_0"):
            branch_0 = conv_inception(input, shape=[1, 1, 288, 64], name='0a_1x1')
        with tf.variable_scope('Branch_1'):
            branch_1 = conv_inception(input, shape=[1, 1, 288, 48], name='0a_1x1')
            branch_1 = conv_inception(branch_1, shape=[5, 5, 48, 64], name='0b_5x5')
        with tf.variable_scope("Branch_2"):
            branch_2 = conv_inception(input, shape=[1, 1, 288, 64], name='0a_1x1')
            branch_2 = conv_inception(branch_2, shape=[3, 3, 64, 96], name='0b_3x3')
        with tf.variable_scope('Branch_3'):
            branch_3 = tf.nn.avg_pool(input, ksize=(1, 3, 3, 1), strides=[1, 1, 1, 1], padding='SAME', name='Avgpool_0a_3x3')
            branch_3 = conv_inception(branch_3, shape=[1, 1, 288, 64], name='0b_1x1')
        inception_out = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        b = 1  # for debug
        return inception_out
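As a hypothetical usage example (the variable and scope names below are assumptions, not from the original post), the 35x35 stage can be built by stacking this block, whose output keeps 288 channels, and then applying the grid-reduction function defined earlier:

    # net: 35x35x288 feature map produced by the stem convolutions (assumed)
    net = inception_block_tradition(net, name='Mixed_5b')
    net = inception_block_tradition(net, name='Mixed_5c')
    net = inception_block_tradition(net, name='Mixed_5d')
    # reduce 35x35x288 -> 17x17x768 before the factorized blocks
    net = inception_grid_reduction_1(net, name='Mixed_6a')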
Next is the Inception module that uses asymmetric factorization, shown in the figure below:
Here n = 7 and the base is 17x17x768; the pool is a 3x3 avgpool with stride 1 (same as above). The code implementation is as follows:
def inception_block_factorization(input, name=None):
    with tf.variable_scope(name) as scope:
        with tf.variable_scope('Branch_0'):
            branch_0 = conv_inception(input, shape=[1, 1, 768, 192], name='0a_1x1')
        with tf.variable_scope('Branch_1'):
            branch_1 = conv_inception(input, shape=[1, 1, 768, 128], name='0a_1x1')
            branch_1 = conv_inception(branch_1, shape=[1, 7, 128, 128], name='0b_1x7')
            branch_1 = conv_inception(branch_1, shape=[7, 1, 128, 128], name='0c_7x1')
            branch_1 = conv_inception(branch_1, shape=[1, 7, 128, 128], name='0d_1x7')
            branch_1 = conv_inception(branch_1, shape=[7, 1, 128, 192], name='0e_7x1')
        with tf.variable_scope('Branch_2'):
            branch_2 = conv_inception(input, shape=[1, 1, 768, 128], name='0a_1x1')
            branch_2 = conv_inception(branch_2, shape=[1, 7, 128, 128], name='0b_1x7')
            branch_2 = conv_inception(branch_2, shape=[7, 1, 128, 192], name='0c_7x1')
        with tf.variable_scope('Branch_3'):
            branch_3 = tf.nn.avg_pool(input, ksize=(1, 3, 3, 1), strides=[1, 1, 1, 1], padding='SAME', name='Avgpool_0a_3x3')
            branch_3 = conv_inception(branch_3, shape=[1, 1, 768, 192], name='0b_1x1')
        inception_out = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        d = 1  # for debug
        return inception_out
Finally, the Inception module with expanded filter banks, shown in the figure:
It also has four branches, with the same pooling as above. The code implementation is as follows:
def inception_block_expanded(input, name=None):
    with tf.variable_scope(name) as scope:
        with tf.variable_scope('Branch_0'):
            branch_0 = conv_inception(input, shape=[1, 1, 2048, 320], name='0a_1x1')
        with tf.variable_scope('Branch_1'):
            branch_1 = conv_inception(input, shape=[1, 1, 2048, 448], name='0a_1x1')
            branch_1 = conv_inception(branch_1, shape=[3, 3, 448, 384], name='0b_3x3')
            branch_1 = tf.concat([conv_inception(branch_1, shape=[1, 3, 384, 384], name='0c_1x3'),
                                  conv_inception(branch_1, shape=[3, 1, 384, 384], name='0d_3x1')], 3)
        with tf.variable_scope('Branch_2'):
            branch_2 = conv_inception(input, shape=[1, 1, 2048, 384], name='0a_1x1')
            branch_2 = tf.concat([conv_inception(branch_2, shape=[1, 3, 384, 384], name='0b_1x3'),
                                  conv_inception(branch_2, shape=[3, 1, 384, 384], name='0c_3x1')], 3)
        with tf.variable_scope('Branch_3'):
            branch_3 = tf.nn.avg_pool(input, ksize=(1, 3, 3, 1), strides=[1, 1, 1, 1], padding='SAME', name='Avgpool_0a_3x3')
            branch_3 = conv_inception(branch_3, shape=[1, 1, 2048, 192], name='0b_1x1')
        inception_out = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        e = 1  # for debug
        return inception_out
After the above operations we obtain 8x8x2048 feature maps. Following the structure in the paper, we apply pooling and a 1x1 convolution to obtain the final 1x1xnum_classes output. One possible implementation (not the only one) is as follows:
with tf.variable_scope('Logits'):
    # global average pooling over the 8x8 spatial grid -> 1x1x2048
    net = tf.nn.avg_pool(net, ksize=[1, 8, 8, 1], strides=[1, 1, 1, 1], padding='VALID', name='Avgpool_1a_8x8')
    net = tf.nn.dropout(net, keep_prob=dropout_keep_prob, name='Dropout_1b')
    end_points['PreLogits'] = net  # 1x1x2048
    # 1x1 convolution acting as the final fully connected layer
    logits = conv_inception(net, shape=[1, 1, 2048, num_classes], activation=False, name='conv_1c_1x1')
    end_points['Logits'] = logits
    end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')
    return logits, end_points
The optimization methods mentioned in the paper are SGD and RMSProp. Either can be used; the best model reported in the paper was obtained with RMSProp.
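As a minimal sketch of a training op with RMSProp, assuming TensorFlow 1.x; the decay 0.9, epsilon 1.0, and learning rate 0.045 follow the values reported in the paper, while the loss construction and placeholder shapes are assumptions:

    # labels: integer class ids; logits: output of the 'Logits' block above
    labels = tf.placeholder(tf.int64, [None])
    logits_2d = tf.squeeze(logits, [1, 2])  # 1x1xnum_classes -> num_classes
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits_2d))
    optimizer = tf.train.RMSPropOptimizer(learning_rate=0.045, decay=0.9, epsilon=1.0)
    train_op = optimizer.minimize(loss)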
Code download link:
PS: You need to provide the dataset yourself.