Notes on "Deep Learning with Python": 5.4-3, Visualizing convnets: heatmaps



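I. Summary

One-sentence summary:

[Which part of an image led the convnet to its final classification decision]: Visualizing heatmaps of class activation helps you understand which part of an image led a convolutional neural network to its final classification decision. This is useful for debugging a convnet's decision process, especially when an image is misclassified. The method can also locate specific objects in an image.
[Class activation map]: This general technique is called class activation map (CAM) visualization; it produces heatmaps of class activation over input images.
[How important each location is to the class]: A class activation heatmap is a 2D grid of scores associated with a specific output class, computed for every location in any input image, indicating how important each location is with respect to that class.

1. What questions does the class activation heatmap answer in this example?

Why did the network think this image contained an African elephant?
Where is the African elephant located in the image?

II. 5.4-3 Visualizing convnets: heatmaps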


Visualizing heatmaps of class activation

We will introduce one more visualization technique, one that is useful for understanding which parts of a given image led a convnet to its final classification decision. This is helpful for "debugging" the decision process of a convnet, in particular in case of a classification mistake. It also allows you to locate specific objects in an image.

This general category of techniques is called "Class Activation Map" (CAM) visualization, and consists of producing heatmaps of "class activation" over input images. A "class activation" heatmap is a 2D grid of scores associated with a specific output class, computed for every location in any input image, indicating how important each location is with respect to the class considered. For instance, given an image fed into one of our "cat vs. dog" convnets, Class Activation Map visualization allows us to generate a heatmap for the class "cat", indicating how cat-like different parts of the image are, and likewise for the class "dog", indicating how dog-like different parts of the image are.

The specific implementation we will use is the one described in "Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization". It is very simple: it consists of taking the output feature map of a convolution layer given an input image, and weighting every channel in that feature map by the gradient of the class with respect to the channel. Intuitively, one way to understand this trick is that we are weighting a spatial map of "how intensely the input image activates different channels" by "how important each channel is with regard to the class", resulting in a spatial map of "how intensely the input image activates the class".
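
To make the weighting concrete, here is a minimal NumPy sketch on made-up arrays. The names `feature_map`, `grads`, and `grad_cam_heatmap` are illustrative and do not come from the original notebook; the arithmetic follows the description above (average the gradient over spatial positions to get one weight per channel, scale each channel by its weight, then average over channels).

```python
import numpy as np

def grad_cam_heatmap(feature_map, grads):
    """Combine a (H, W, C) feature map with same-shaped gradients into a (H, W) heatmap."""
    # One importance weight per channel: the mean gradient over all spatial positions.
    channel_weights = grads.mean(axis=(0, 1))
    # Scale every channel by its weight (broadcast over H and W), then average the channels.
    return (feature_map * channel_weights).mean(axis=-1)

# Toy input: a 2x2 feature map with 2 channels (values invented for illustration).
feature_map = np.array([[[1.0, 0.0], [0.0, 2.0]],
                        [[3.0, 0.0], [0.0, 4.0]]])
# Channel 0 has a positive mean gradient, channel 1 a zero mean gradient.
grads = np.zeros((2, 2, 2))
grads[..., 0] = 2.0

heatmap = grad_cam_heatmap(feature_map, grads)
print(heatmap)
```

In this toy case only channel 0 matters to the class, so the heatmap is bright exactly where channel 0 is active.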

We will demonstrate this technique using the pre-trained VGG16 network again:

In [24]:
from keras import backend as K
from keras.applications.vgg16 import VGG16

K.clear_session()

# Note that we are including the densely-connected classifier on top;
# all previous times, we were discarding it.
model = VGG16(weights='imagenet')
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
548380672/553467096 [============================>.] - ETA: 0s

Let's consider the following image of two African elephants, possibly a mother and her calf, strolling in the savanna (under a Creative Commons license):

[Image: elephants]

Let's convert this image into something the VGG16 model can read: the model was trained on images of size 224x224, preprocessed according to a few rules that are packaged in the utility function keras.applications.vgg16.preprocess_input. So we need to load the image, resize it to 224x224, convert it to a Numpy float32 tensor, and apply these pre-processing rules.

In [27]:
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np

# The local path to our target image
img_path = '/Users/fchollet/Downloads/creative_commons_elephant.jpg'

# `img` is a PIL image of size 224x224
img = image.load_img(img_path, target_size=(224, 224))

# `x` is a float32 Numpy array of shape (224, 224, 3)
x = image.img_to_array(img)

# We add a dimension to transform our array into a "batch"
# of size (1, 224, 224, 3)
x = np.expand_dims(x, axis=0)

# Finally we preprocess the batch
# (this does channel-wise color normalization)
x = preprocess_input(x)
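
For intuition about what the "channel-wise color normalization" amounts to: as far as I know, the VGG16 flavour of `preprocess_input` reorders the channels from RGB to BGR and subtracts a per-channel ImageNet mean, without rescaling. The NumPy sketch below re-creates that behaviour for illustration only. `vgg16_style_preprocess` is a hypothetical helper, and the mean values are an assumption quoted from memory of the Keras source, so check them against your installed version before relying on them.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed; verify against your Keras version).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(batch_rgb):
    """Approximate VGG16 preprocessing for a float batch of shape (N, H, W, 3) in RGB."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    # Reverse the last axis: RGB -> BGR.
    batch_bgr = batch[..., ::-1]
    # Zero-center each channel; pixel values are not rescaled.
    return batch_bgr - IMAGENET_BGR_MEAN

# A single 1x1 "image" whose RGB pixel is (255, 0, 0), i.e. pure red.
toy = np.array([[[[255.0, 0.0, 0.0]]]])
out = vgg16_style_preprocess(toy)
print(out.shape)
```

The red value ends up in the last channel after the reordering, which is why the preprocessed tensor should not be displayed as if it were an ordinary RGB image.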
In [29]:
preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])
Predicted: [('n02504458', 'African_elephant', 0.90942144), ('n01871265', 'tusker', 0.08618243), ('n02504013', 'Indian_elephant', 0.0043545929)]

The top-3 classes predicted for this image are:

  • African elephant (with 90.9% probability)
  • Tusker (with 8.6% probability)
  • Indian elephant (with 0.4% probability)
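
To read the scores as percentages straight from the `decode_predictions` output, a small sketch like the following can help. The tuples are copied from the printed output above; `format_predictions` is a hypothetical helper name.

```python
# Top-3 tuples copied from the `decode_predictions` output shown above.
top3 = [
    ('n02504458', 'African_elephant', 0.90942144),
    ('n01871265', 'tusker', 0.08618243),
    ('n02504013', 'Indian_elephant', 0.0043545929),
]

def format_predictions(decoded):
    """Return one 'label: percent' string per (wordnet_id, label, score) tuple."""
    return [f"{label}: {score:.1%}" for _, label, score in decoded]

lines = format_predictions(top3)
for line in lines:
    print(line)
```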

Thus our network has recognized our image as containing an undetermined quantity of African elephants. The entry in the prediction vector that was maximally activated is the one corresponding to the "African elephant" class, at index 386:

In [30]:
np.argmax(preds[0]) 
Out[30]:
386

To visualize which parts of our image were the most "African elephant"-like, let's set up the Grad-CAM process:

In [31]:
# This is the "african elephant" entry in the prediction vector
african_elephant_output = model.output[:, 386]

# This is the output feature map of the `block5_conv3` layer,
# the last convolutional layer in VGG16
last_conv_layer = model.get_layer('block5_conv3')

# This is the gradient of the "african elephant" class with regard to
# the output feature map of `block5_conv3`
grads = K.gradients(african_elephant_output, last_conv_layer.output)[0]

# This is a vector of shape (512,), where each entry
# is the mean intensity of the gradient over a specific feature map channel
pooled_grads = K.mean(grads, axis=(0, 1, 2))

# This function allows us to access the values of the quantities we just defined:
# `pooled_grads` and the output feature map of `block5_conv3`,
# given a sample image
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])

# These are the values of these two quantities, as Numpy arrays,
# given our sample image of two elephants
pooled_grads_value, conv_layer_output_value = iterate([x])

# We multiply each channel in the feature map array
# by "how important this channel is" with regard to the elephant class
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]

# The channel-wise mean of the resulting feature map
# is our heatmap of class activation
heatmap = np.mean(conv_layer_output_value, axis=-1)
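
The channel-by-channel loop in the cell above can also be written as a single broadcast multiplication. The sketch below checks the equivalence on random stand-in arrays with the shapes used in that cell (a 14x14x512 feature map is what `block5_conv3` yields for a 224x224 input). The variable names and random values are placeholders, not real activations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for `conv_layer_output_value` and `pooled_grads_value` (random, for illustration).
conv_output = rng.normal(size=(14, 14, 512))
pooled_grads = rng.normal(size=(512,))

# Loop form, as in the notebook cell (on a copy so the input is left untouched).
looped = conv_output.copy()
for i in range(512):
    looped[:, :, i] *= pooled_grads[i]

# Broadcast form: the (512,) vector lines up with the last axis of the (14, 14, 512) array.
broadcast = conv_output * pooled_grads

heatmap_loop = looped.mean(axis=-1)
heatmap_broadcast = broadcast.mean(axis=-1)
print(np.allclose(heatmap_loop, heatmap_broadcast))
```

Either form gives a 14x14 heatmap; the broadcast version simply avoids the hard-coded channel count.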

For visualization purposes, we will also normalize the heatmap between 0 and 1:

In [32]:
import matplotlib.pyplot as plt

heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.matshow(heatmap)
plt.show()
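
One edge case worth guarding against: if no location has a positive score, the maximum after clipping is 0 and the division produces NaNs. A defensive variant might look like the sketch below; `normalize_heatmap` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

def normalize_heatmap(heatmap):
    """Clip negatives to zero and scale to [0, 1]; return all zeros if nothing is positive."""
    clipped = np.maximum(np.asarray(heatmap, dtype=np.float64), 0.0)
    peak = clipped.max()
    if peak == 0:
        return clipped  # already all zeros, so skip the division
    return clipped / peak

example = normalize_heatmap([[-1.0, 2.0], [4.0, 0.0]])
all_negative = normalize_heatmap([[-1.0, -2.0]])
print(example)
print(all_negative)
```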

Finally, we will use OpenCV to generate an image that superimposes the original image with the heatmap we just obtained:

In [ ]:
import cv2

# We use cv2 to load the original image
img = cv2.imread(img_path)

# We resize the heatmap to have the same size as the original image
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))

# We rescale the heatmap to 8-bit integers in the range 0-255
heatmap = np.uint8(255 * heatmap)

# We apply a color map to the heatmap (OpenCV returns a BGR image)
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)

# 0.4 here is a heatmap intensity factor
superimposed_img = heatmap * 0.4 + img

# Save the image to disk
cv2.imwrite('/Users/fchollet/Downloads/elephant_cam.jpg', superimposed_img)

[Image: elephant cam]
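
Note that `heatmap * 0.4 + img` is a floating-point array whose values can exceed 255 in bright regions, and the notebook leaves the handling of those values to OpenCV at write time. If you prefer to make that step explicit, one option is to round, clip, and cast before saving. This is a sketch on toy arrays; `blend_heatmap` is a hypothetical helper name.

```python
import numpy as np

def blend_heatmap(colored_heatmap, image, intensity=0.4):
    """Overlay a colored heatmap on an image of the same shape; both hold 0-255 values."""
    blended = colored_heatmap.astype(np.float64) * intensity + image.astype(np.float64)
    # Round, then keep values in the valid 8-bit range before casting.
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)

# Toy 1x2 "images" with 3 channels (values invented for illustration).
image = np.array([[[250, 10, 0], [100, 100, 100]]], dtype=np.uint8)
colored_heatmap = np.array([[[255, 0, 0], [0, 50, 255]]], dtype=np.uint8)

result = blend_heatmap(colored_heatmap, image)
print(result)
```

The first pixel shows the effect: 255 * 0.4 + 250 would be 352, which is clipped to 255 rather than left out of range.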

This visualization technique answers two important questions:

  • Why did the network think this image contained an African elephant?
  • Where is the African elephant located in the picture?

In particular, it is interesting to note that the ears of the elephant calf are strongly activated: this is probably how the network can tell the difference between African and Indian elephants.

 

 

 

