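VGG was the runner-up in the image classification task of ILSVRC 2014. Compared with that year's winner, GoogLeNet, it scales more easily to other tasks, and it is also a go-to CNN for extracting features from images. The network configurations of the VGG variants are as follows: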
Today's code is based on VGG13, which is configuration B in the figure above. As you can see, its parameter count is considerable.
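To put a rough number on "considerable", here is a small sketch that counts the weights and biases of the CIFAR-sized variant built in the code later in this post (ten 3x3 convolutions, a 512-dimensional flattened feature, and a 10-way output). The helper names `conv_params` and `dense_params` are made up for illustration and are not part of the original code.

```python
# Channel widths of the ten 3x3 convolutions, starting from the 3-channel RGB input.
CONV_CHANNELS = [3, 64, 64, 128, 128, 256, 256, 512, 512, 512, 512]
# Widths of the dense head: 512 flattened features -> 4096 -> 4096 -> 10 classes.
DENSE_UNITS = [512, 4096, 4096, 10]


def conv_params(c_in, c_out, k=3):
    """Weights (k * k * c_in * c_out) plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


conv_total = sum(conv_params(a, b) for a, b in zip(CONV_CHANNELS, CONV_CHANNELS[1:]))
dense_total = sum(dense_params(a, b) for a, b in zip(DENSE_UNITS, DENSE_UNITS[1:]))
total = conv_total + dense_total

print(conv_total, dense_total, total)
```

For this small-input variant the total comes to roughly 28 million parameters, about 9.4 million in the convolutions and 18.9 million in the dense head. The full-size ImageNet VGG13 is much larger, mainly because its first fully connected layer takes a 7x7x512 input instead of 512 features.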
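Because of hardware and time constraints, I did not train the network to completion, but the effect of the parameter updates was already visible. (After all, the VGG team's original training run took 2-3 weeks on four GPUs in parallel. GPUs are much faster today, but as a poor soul who can only train on a CPU, I did not dare keep going.)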
Let's go straight to the code.
import os

# Must be set before TensorFlow is imported; '2' hides INFO and WARNING logs.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
from tensorflow import keras

# Five convolutional stages of VGG13: two 3x3 convolutions, then a 2x2 max-pool.
conv_layers = [
    # part 1
    keras.layers.Conv2D(64, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
    keras.layers.Conv2D(64, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
    keras.layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
    # part 2
    keras.layers.Conv2D(128, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
    keras.layers.Conv2D(128, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
    keras.layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
    # part 3
    keras.layers.Conv2D(256, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
    keras.layers.Conv2D(256, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
    keras.layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
    # part 4
    keras.layers.Conv2D(512, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
    keras.layers.Conv2D(512, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
    keras.layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
    # part 5
    keras.layers.Conv2D(512, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
    keras.layers.Conv2D(512, kernel_size=[3, 3], padding='same', activation=tf.nn.relu),
    keras.layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
]

# Classifier head; the last layer outputs raw logits for the 10 classes.
fc_layers = [
    keras.layers.Dense(4096, activation=tf.nn.relu),
    keras.layers.Dense(4096, activation=tf.nn.relu),
    keras.layers.Dense(10),
]


def preprocess(x, y):
    # Scale pixels to [0, 1] and cast labels to int32.
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y


# Note: the original listing loaded cifar100 while using 10 output units and
# one_hot(depth=10). CIFAR-10 is assumed here so the labels match the 10-way head;
# for CIFAR-100, change the dataset, Dense(10) and depth=10 to 100 together.
(x, y), (x_test, y_test) = keras.datasets.cifar10.load_data()
y = tf.squeeze(y, axis=1)
y_test = tf.squeeze(y_test, axis=1)
print(x.shape, y.shape, x_test.shape, y_test.shape)

train_db = tf.data.Dataset.from_tensor_slices((x, y))
train_db = train_db.shuffle(1000).map(preprocess).batch(64)

# Fixed: the test pipeline must be built from test_db, not train_db.
test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_db = test_db.map(preprocess).batch(64)


def main():
    conv_net = keras.Sequential(conv_layers)
    conv_net.build(input_shape=[None, 32, 32, 3])
    fc_net = keras.Sequential(fc_layers)
    fc_net.build(input_shape=[None, 512])
    optimizer = keras.optimizers.Adam(learning_rate=1e-4)
    # Fixed: fc_net.trainable is a boolean flag; the weights are in trainable_variables.
    variables = conv_net.trainable_variables + fc_net.trainable_variables

    for epoch in range(50):
        for step, (x, y) in enumerate(train_db):
            with tf.GradientTape() as tape:
                out = conv_net(x)                 # [b, 32, 32, 3] -> [b, 1, 1, 512]
                out = tf.reshape(out, [-1, 512])  # flatten to [b, 512]
                logits = fc_net(out)              # [b, 10]
                y_onehot = tf.one_hot(y, depth=10)
                loss = tf.losses.categorical_crossentropy(y_onehot, logits, from_logits=True)
                loss = tf.reduce_mean(loss)
            gradients = tape.gradient(loss, variables)
            optimizer.apply_gradients(zip(gradients, variables))
            if step % 100 == 0:
                print(epoch, step, 'loss:', float(loss))

        # Evaluate on the test set after each epoch.
        total_num = 0
        total_correct = 0
        for x, y in test_db:
            out = conv_net(x)
            out = tf.reshape(out, [-1, 512])
            logits = fc_net(out)
            prob = tf.nn.softmax(logits, axis=1)
            pred = tf.argmax(prob, axis=1)
            pred = tf.cast(pred, dtype=tf.int32)
            correct = tf.cast(tf.equal(pred, y), dtype=tf.int32)
            total_correct += int(tf.reduce_sum(correct))
            total_num += x.shape[0]
        acc = total_correct / total_num
        print(epoch, 'acc:', acc)


if __name__ == '__main__':
    main()
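The `tf.reshape(out, [-1, 512])` step in the code above works only because the five pooling layers shrink the 32x32 input down to 1x1, while the stride-1 'same' convolutions leave the spatial size unchanged. A minimal sketch of that arithmetic, assuming the usual rule for 'same' padding that the output size is ceil(input / stride); the function name `pooled_size` is hypothetical.

```python
import math


def pooled_size(size, stride=2):
    """Spatial size after one pooling layer with padding='same'."""
    return math.ceil(size / stride)


size = 32          # CIFAR images are 32x32
sizes = [size]
for _ in range(5):  # one max-pool at the end of each of the five stages
    size = pooled_size(size)
    sizes.append(size)

# The last stage has 512 channels, so the flattened feature length is H * W * 512.
flat_features = size * size * 512
print(sizes, flat_features)
```

With a 224x224 input the same loop ends at 7x7, giving 7 * 7 * 512 = 25088 features, so both the reshape and the `input_shape` passed to `fc_net.build` would have to change accordingly.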
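Building a network like this really did deepen my understanding of neural networks and my fluency with TensorFlow. Hands-on practice is indeed the best way to learn!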