//20201018 update
Preface:
A few days ago I finished chapter 2 of Andrew Ng's Convolutional Neural Networks course and completed the corresponding assignment, so I'm summarizing it here. The assignment implements a ResNet residual network in TensorFlow 2; this post mainly covers the architecture of residual networks and how to implement one. (I'm a beginner, so if anything here is wrong, please do point it out!)
1. A Brief Introduction to ResNets (Residual Networks)
First of all, very deep neural networks are hard to train because of the vanishing- and exploding-gradient problems. ResNets are built out of residual blocks (Residual blocks).
About the residual block (the core of a residual network): put simply, a shortcut is inserted alongside the ordinary forward-propagation path. The shortcut carries a "copy" of the input, which is added back to the main path's output where the two meet; abstractly, H(x) = F(x) + x. Why it works: because the forward function gains an extra x term, its derivative always contains a constant 1 (dH/dx = dF/dx + 1). So even when the gradient through the main branch F shrinks toward zero after many layers, the overall gradient does not vanish, and gradient descent keeps making progress in deep networks.
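To make the gradient argument concrete, here is a tiny sketch (my own illustration, not part of the course assignment) that uses tf.GradientTape to compare the gradient of a nearly-vanished branch F(x) with that of the residual mapping H(x) = F(x) + x:

import tensorflow as tf

x = tf.Variable(2.0)
with tf.GradientTape(persistent=True) as tape:
    f = 0.001 * x   # stand-in for a deep branch whose gradient has almost vanished
    h = f + x       # residual mapping: H(x) = F(x) + x

print(tape.gradient(f, x).numpy())  # 0.001 -- the plain branch's gradient is tiny
print(tape.gradient(h, x).numpy())  # 1.001 -- the "+ x" term always contributes a 1

However small dF/dx gets, dH/dx = dF/dx + 1 stays close to 1, which is exactly why the shortcut keeps gradients flowing.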
2. TensorFlow 2 Implementation (IDE: PyCharm)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Sequential
import numpy as np
import matplotlib.pyplot as plt

'''
A custom layer class that inherits the basic properties of layers.Layer
'''
class BasicBlock(layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = layers.Conv2D(filter_num, (3, 3), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.ac1 = layers.Activation('relu')
        self.conv2 = layers.Conv2D(filter_num, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()
        # Keep the tensor on the shortcut the same shape as the one on the main path
        if stride != 1:
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, (1, 1), strides=stride))
        else:
            self.downsample = lambda x: x
        self.ac2 = layers.Activation('relu')

    def call(self, inputs, training=None):
        out = self.conv1(inputs)
        out = self.bn1(out)
        out = self.ac1(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # If stride == 1, identity = inputs; otherwise a 1x1 conv adjusts the size
        identity = self.downsample(inputs)
        output = self.ac2(identity + out)
        return output

'''
The base model, inheriting the basic properties of keras.Model
'''
class ResNet(keras.Model):
    def __init__(self, layer_dims, num_classes=6):
        super(ResNet, self).__init__()
        self.prev = Sequential([
            layers.Conv2D(64, (3, 3), strides=(1, 1)),
            layers.BatchNormalization(),
            layers.Activation('relu'),
            layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1))
        ])
        self.layer1 = self.build_resblock(64, layer_dims[0])
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)
        self.avgpool = layers.GlobalAveragePooling2D()
        self.fc = layers.Dense(num_classes)

    def call(self, inputs, training=None):
        x = self.prev(inputs)  # stem: initial conv / BN / ReLU / max-pool
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.avgpool(self.layer4(x) if False else x) if False else self.avgpool(self.layer4(x))
        output = self.fc(x)
        return output  # the whole residual network is assembled at this point

    def build_resblock(self, filter_num, blocks, stride=1):
        res_blocks = Sequential()
        # only the first block in a group may downsample; the rest keep stride 1
        res_blocks.add(BasicBlock(filter_num, stride))
        for i in range(1, blocks):
            res_blocks.add(BasicBlock(filter_num, stride=1))
        return res_blocks

def resnet18():
    # [2, 2, 2, 2]: two BasicBlocks in each of the four groups, as in ResNet-18
    return ResNet([2, 2, 2, 2])

model = resnet18()
model.build(input_shape=(64, 32, 32, 3))
model.summary()
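As a quick sanity check of the model above (this assumes the code in this section has already been run; the dummy batch is my own example, matching the build call's shape):

dummy = tf.random.normal((4, 32, 32, 3))  # a fake batch of four 32x32 RGB images
logits = model(dummy)
print(logits.shape)  # expected: (4, 6), since num_classes defaults to 6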
A quick walkthrough of the process:
- First, create the residual-block class (it must inherit from the Layer class), i.e. a custom layer module. The thing to watch out for: if the tensor's shape changes along the main path, the x taking the shortcut will no longer match the main path's output, so whenever the shape changes you add a Conv layer on the shortcut to adjust the size; at the merge point you simply add the two tensors (see the small shape check after this list).
- Then create the residual-network class (it must inherit from the Model class), i.e. a custom Model module. Once it is defined, the call method just stacks the residual blocks together according to the arguments passed in.
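The shape-matching point in the first bullet can be checked directly; a minimal sketch, assuming BasicBlock from section 2 is in scope (the shapes here are my own example):

block = BasicBlock(128, stride=2)      # stride 2 halves the spatial size
x = tf.random.normal((1, 16, 16, 64))
print(block(x).shape)                  # (1, 8, 8, 128): the 1x1 conv on the shortcut
                                       # downsampled x so the addition is legal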
Here I only print the model summary; the network hasn't been applied to a concrete task. (The model.summary() output itself is not reproduced here.)
That's all. I hope it helps!