Global Average Pooling (GAP) was first proposed in the Network in Network paper (Section 3.2) as a technique that can replace fully connected (FC) layers. Among the classic models shipped with Keras, quite a few have dropped FC layers entirely in favor of GAP, and for transfer learning nearly all of them support both Global Average Pooling and Global Max Pooling (GMP). But can GAP really replace fully connected layers? And what is the principle behind it? This article takes a closer look.
1. What is GAP?
Let's first look at the definition in the original paper:
In this paper, we propose another strategy called global average pooling to replace the traditional fully connected layers in CNN. The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer. One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer. Furthermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.
In short, after the convolutional layers, GAP is used in place of the FC (fully connected) layers. It has two advantages: first, the mapping from feature maps to the final categories is simpler and more natural with GAP; second, unlike FC layers, GAP has no parameters to train, and removing those parameters makes the model more robust and less prone to overfitting.
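To make the second advantage concrete, here is a small sketch (not from the original post) that counts classification-head parameters in Keras, assuming a hypothetical 6 × 6 × 512 feature map feeding a 17-class softmax:

import numpy as np
from keras import layers, models

def head_param_count(use_gap):
    # Hypothetical 6 x 6 x 512 feature map as input to the classification head.
    inp = layers.Input(shape=(6, 6, 512))
    if use_gap:
        x = layers.GlobalAveragePooling2D()(inp)   # no trainable parameters
    else:
        x = layers.Flatten()(inp)                  # 6*6*512 = 18432 features
    out = layers.Dense(17, activation='softmax')(x)
    return models.Model(inp, out).count_params()

print(head_param_count(use_gap=False))  # FC head: 18432*17 + 17 = 313,361 parameters
print(head_param_count(use_gap=True))   # GAP head:   512*17 + 17 =   8,721 parameters

With GAP the classifier only sees one value per channel, so in this toy setup the Dense layer shrinks from roughly 313k to under 9k parameters.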
Let's use a more intuitive picture to see how GAP works:
Suppose the last convolutional layer outputs an h × w × d feature map, say 6 × 6 × 3. After GAP, this becomes a 1 × 1 × 3 output: each h × w slice is averaged down to a single value.
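As a minimal NumPy sketch of the same idea (the 6 × 6 × 3 shape matches the example above), GAP is nothing more than a per-channel mean over the spatial dimensions:

import numpy as np

feature_map = np.random.rand(6, 6, 3)    # h x w x d output of the last conv layer
gap = feature_map.mean(axis=(0, 1))      # average each 6 x 6 slice into one value
print(gap.shape)                         # (3,) i.e. one value per channel, like 1 x 1 x 3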
2. GAP as defined in Keras
GAP is typically placed after the convolutional layers and before the output layer:
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)  # last layer of the convolutional base
x = layers.GlobalAveragePooling2D()(x)                                  # GAP layer
prediction = Dense(10, activation='softmax')(x)                         # output layer
Now let's look at how GAP is actually implemented in the Keras source:
@tf_export('keras.layers.GlobalAveragePooling2D', 'keras.layers.GlobalAvgPool2D')
class GlobalAveragePooling2D(GlobalPooling2D):
  """Global average pooling operation for spatial data.

  Arguments:
      data_format: A string, one of `channels_last` (default) or `channels_first`.
          The ordering of the dimensions in the inputs.
          `channels_last` corresponds to inputs with shape
          `(batch, height, width, channels)` while `channels_first`
          corresponds to inputs with shape `(batch, channels, height, width)`.
          It defaults to the `image_data_format` value found in your
          Keras config file at `~/.keras/keras.json`.
          If you never set it, then it will be "channels_last".

  Input shape:
      - If `data_format='channels_last'`:
          4D tensor with shape: `(batch_size, rows, cols, channels)`
      - If `data_format='channels_first'`:
          4D tensor with shape: `(batch_size, channels, rows, cols)`

  Output shape:
      2D tensor with shape: `(batch_size, channels)`
  """

  def call(self, inputs):
    if self.data_format == 'channels_last':
      return backend.mean(inputs, axis=[1, 2])
    else:
      return backend.mean(inputs, axis=[2, 3])
The implementation is simple: it averages the feature values over the height and width dimensions. For the NHWC layout (batch, height, width, channels) it uses axis=[1, 2]; for NCHW (batch, channels, height, width) it uses axis=[2, 3].
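A quick sanity check, as a sketch (the shapes are made up; it assumes a TensorFlow-backed Keras with the default channels_last layout), showing that the layer's output is just the mean over axes 1 and 2:

import numpy as np
from keras.layers import GlobalAveragePooling2D, Input
from keras.models import Model

x = np.random.rand(1, 6, 6, 3).astype('float32')      # NHWC: batch, height, width, channels

inp = Input(shape=(6, 6, 3))
gap_model = Model(inp, GlobalAveragePooling2D()(inp))

# The layer's output matches a plain mean over axes 1 and 2 (height and width).
print(np.allclose(gap_model.predict(x), x.mean(axis=(1, 2))))   # True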
3. GAP vs. GMP vs. FC
Before we can test whether GAP is viable, we need training and test datasets. I found a dataset of 17 flower categories on the Oxford University website: http://www.robots.ox.ac.uk/~vgg/data/flowers/17/index.html . It contains 80 images per category, 1,360 images in total. I sorted the flowers by category and set aside part of the data for testing, so the final ratio of training to test data is 7:1.
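For reference, a sketch of how such a split could be scripted; the folder names and the 10-images-per-class test set are my assumptions, chosen only to reproduce the 7:1 ratio, not the author's exact procedure:

import os
import shutil

SRC = '17flowerclasses/all'      # hypothetical layout: 17 sub-folders, 80 images each
TRAIN = '17flowerclasses/train'
TEST = '17flowerclasses/test'

for cls in sorted(os.listdir(SRC)):
    images = sorted(os.listdir(os.path.join(SRC, cls)))
    os.makedirs(os.path.join(TRAIN, cls), exist_ok=True)
    os.makedirs(os.path.join(TEST, cls), exist_ok=True)
    for i, name in enumerate(images):
        # Keep 70 images per class for training and 10 for testing (7:1 ratio).
        dst = TEST if i < 10 else TRAIN
        shutil.copy(os.path.join(SRC, cls, name), os.path.join(dst, cls, name))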
I have uploaded the dataset to my Baidu Netdisk: https://pan.baidu.com/s/1YDA_VOBlJSQEijcCoGC60w , feel free to download and use it.
When the classic Keras models are used for transfer learning, they offer not only GAP but also GMP, while the default (include_top=True) is to build an FC head. A typical implementation looks like this:
if include_top:
    # Classification block
    x = layers.Flatten(name='flatten')(x)
    x = layers.Dense(4096, activation='relu', name='fc1')(x)
    x = layers.Dense(4096, activation='relu', name='fc2')(x)
    x = layers.Dense(classes, activation='softmax', name='predictions')(x)
else:
    if pooling == 'avg':
        x = layers.GlobalAveragePooling2D()(x)
    elif pooling == 'max':
        x = layers.GlobalMaxPooling2D()(x)
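To see what the pooling argument does to the output shape, here is a small illustrative sketch (not part of the comparison code below; weights=None just avoids downloading the pretrained weights):

from keras.applications.vgg19 import VGG19

for pooling in (None, 'avg', 'max'):
    m = VGG19(weights=None, include_top=False, pooling=pooling,
              input_shape=(224, 224, 3))
    print(pooling, m.output_shape)
# None -> (None, 7, 7, 512)   4D feature map, still needs Flatten or GAP
# avg  -> (None, 512)         GlobalAveragePooling2D applied
# max  -> (None, 512)         GlobalMaxPooling2D applied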
In this article we compare GAP, GMP, and FC heads on the same dataset, using transfer-learning versions of two models: VGG19 and InceptionV3.
First, with VGG19, let's compare the validation accuracy and loss of GAP, GMP, and FC after 50 epochs each. The code is as follows:
import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.applications.vgg19 import VGG19
from keras.layers import Dense, Flatten
from matplotlib import pyplot as plt
import numpy as np

# Use the same random seed so the three runs are comparable
np.random.seed(7)
batch_size = 32
# Train for 50 epochs
epochs = 50
# Per the model's specification, the image size is set to 224
IMAGE_SIZE = 224
# 17 flower classes
NUM_CLASSES = 17
TRAIN_PATH = '/home/yourname/Documents/tensorflow/images/17flowerclasses/train'
TEST_PATH = '/home/yourname/Documents/tensorflow/images/17flowerclasses/test'
FLOWER_CLASSES = ['Bluebell', 'ButterCup', 'ColtsFoot', 'Cowslip', 'Crocus', 'Daffodil',
                  'Daisy', 'Dandelion', 'Fritillary', 'Iris', 'LilyValley', 'Pansy',
                  'Snowdrop', 'Sunflower', 'Tigerlily', 'tulip', 'WindFlower']


def model(mode='fc'):
    if mode == 'fc':
        # FC head: a hidden layer with 512 units
        base_model = VGG19(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling=None)
        x = base_model.output
        x = Flatten()(x)
        x = Dense(512, activation='relu')(x)
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)
    elif mode == 'avg':
        # GAP head, selected via pooling='avg'
        base_model = VGG19(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling='avg')
        x = base_model.output
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)
    else:
        # GMP head, selected via pooling='max'
        base_model = VGG19(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling='max')
        x = base_model.output
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)

    model = Model(inputs=base_model.input, outputs=prediction)
    model.summary()
    opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

    # Image generators (no augmentation is applied here)
    train_datagen = ImageDataGenerator()
    train_generator = train_datagen.flow_from_directory(directory=TRAIN_PATH,
                                                        target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                                        classes=FLOWER_CLASSES)
    test_datagen = ImageDataGenerator()
    test_generator = test_datagen.flow_from_directory(directory=TEST_PATH,
                                                      target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                                      classes=FLOWER_CLASSES)

    # Train the model
    history = model.fit_generator(train_generator, epochs=epochs, validation_data=test_generator)
    return history


fc_history = model('fc')
avg_history = model('avg')
max_history = model('max')

# Compare the validation accuracy of the three heads
plt.plot(fc_history.history['val_acc'])
plt.plot(avg_history.history['val_acc'])
plt.plot(max_history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Validation Accuracy')
plt.xlabel('Epoch')
plt.legend(['FC', 'AVG', 'MAX'], loc='lower right')
plt.grid(True)
plt.show()

# Compare the validation loss of the three heads
plt.plot(fc_history.history['val_loss'])
plt.plot(avg_history.history['val_loss'])
plt.plot(max_history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['FC', 'AVG', 'MAX'], loc='upper right')
plt.grid(True)
plt.show()
After 50 epochs each, let's look at the accuracy comparison:
And the loss comparison:
As we can see, GMP performs so poorly with this model that it is not worth discussing. FC does reasonably well for the first 40 or so epochs, but then changes dramatically and overfits (the model around epoch 20 is relatively better, yet its accuracy is still below 70%, so it is still weak). The best of the three is GAP: both its accuracy and its loss stay fairly stable, and its resistance to overfitting is clear (although its final accuracy of about 70% is still not good enough).
Let's now switch to another model, InceptionV3. The code, with slight modifications, is as follows:
import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.layers import Dense, Flatten
from matplotlib import pyplot as plt
import numpy as np

# Use the same random seed so the three runs are comparable
np.random.seed(7)
batch_size = 32
# Train for 50 epochs
epochs = 50
# The image size is set to 224
IMAGE_SIZE = 224
# 17 flower classes
NUM_CLASSES = 17
TRAIN_PATH = '/home/hutao/Documents/tensorflow/images/17flowerclasses/train'
TEST_PATH = '/home/hutao/Documents/tensorflow/images/17flowerclasses/test'
FLOWER_CLASSES = ['Bluebell', 'ButterCup', 'ColtsFoot', 'Cowslip', 'Crocus', 'Daffodil',
                  'Daisy', 'Dandelion', 'Fritillary', 'Iris', 'LilyValley', 'Pansy',
                  'Snowdrop', 'Sunflower', 'Tigerlily', 'tulip', 'WindFlower']


def model(mode='fc'):
    if mode == 'fc':
        # FC head: a hidden layer with 512 units
        base_model = InceptionV3(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling=None)
        x = base_model.output
        x = Flatten()(x)
        x = Dense(512, activation='relu')(x)
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)
    elif mode == 'avg':
        # GAP head, selected via pooling='avg'
        base_model = InceptionV3(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling='avg')
        x = base_model.output
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)
    else:
        # GMP head, selected via pooling='max'
        base_model = InceptionV3(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling='max')
        x = base_model.output
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)

    model = Model(inputs=base_model.input, outputs=prediction)
    model.summary()
    opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

    # Image generators (no augmentation is applied here)
    train_datagen = ImageDataGenerator()
    train_generator = train_datagen.flow_from_directory(directory=TRAIN_PATH,
                                                        target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                                        classes=FLOWER_CLASSES)
    test_datagen = ImageDataGenerator()
    test_generator = test_datagen.flow_from_directory(directory=TEST_PATH,
                                                      target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                                      classes=FLOWER_CLASSES)

    # Train the model
    history = model.fit_generator(train_generator, epochs=epochs, validation_data=test_generator)
    return history


fc_history = model('fc')
avg_history = model('avg')
max_history = model('max')

# Compare the validation accuracy of the three heads
plt.plot(fc_history.history['val_acc'])
plt.plot(avg_history.history['val_acc'])
plt.plot(max_history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Validation Accuracy')
plt.xlabel('Epoch')
plt.legend(['FC', 'AVG', 'MAX'], loc='lower right')
plt.grid(True)
plt.show()

# Compare the validation loss of the three heads
plt.plot(fc_history.history['val_loss'])
plt.plot(avg_history.history['val_loss'])
plt.plot(max_history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['FC', 'AVG', 'MAX'], loc='upper right')
plt.grid(True)
plt.show()
First, the accuracy comparison:
Then the loss comparison:
Clearly, with InceptionV3 all three of FC, GAP, and GMP perform well, but GAP is still the best: its accuracy is generally above 90%, while the other two sit between 80% and 90%.
4. Conclusion
This experiment suggests that, with a limited dataset and transfer learning on classic models, GMP is not very stable, and FC heads are more prone to overfitting because of their large number of trainable parameters, while GAP is stable and outperforms the FC head. Of course, every case is different: once you have your dataset, it is worth training and testing with each of these options to find the best solution.