Saliency Maps
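A saliency map tells us how much each pixel in an image affects the final prediction score for that image. To compute it, we take the gradient of the unnormalized score of the correct class with respect to every pixel of the image. If the image has shape (H, W, 3), the gradient also has shape (H, W, 3); for each pixel, the gradient tells us how much the class score would change if that pixel's value changed by a small amount. To compute the saliency map, we take the absolute value of the gradient and then the maximum over the 3 channels, so the final saliency map has shape (H, W) and all of its values are non-negative.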
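The post-processing step described above can be sketched in plain NumPy. This is only an illustration under stated assumptions: `grads` is a made-up array standing in for the gradient returned by the network, and `saliency_from_gradient` is a hypothetical helper name that does not appear in the assignment.

```python
import numpy as np

def saliency_from_gradient(grads):
    """Collapse gradients of shape (N, H, W, 3) into maps of shape (N, H, W)."""
    # Absolute value first, then the maximum over the channel axis.
    return np.abs(grads).max(axis=-1)

# Made-up gradient values for one 2x2 "image" with 3 channels.
grads = np.array([[[[ 0.1, -0.5,  0.2],
                    [ 0.0,  0.0,  0.0]],
                   [[-0.3,  0.1,  0.2],
                    [ 0.7, -0.9,  0.4]]]])

saliency = saliency_from_gradient(grads)
print(saliency.shape)  # (1, 2, 2)
print(saliency[0])
```

Because the absolute value is taken before the maximum, a strongly negative gradient in one channel counts just as much as a strongly positive one.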
def compute_saliency_maps(X, y, model):
    """
    Compute a class saliency map using the model for images X and labels y.
    Input:
    - X: Input images, numpy array of shape (N, H, W, 3)
    - y: Labels for X, numpy array of shape (N,)
    - model: A SqueezeNet model that will be used to compute the saliency map.
    Returns:
    - saliency: A numpy array of shape (N, H, W) giving the saliency maps for the
      input images.
    """
    saliency = None
    # Compute the score of the correct class for each example.
    # This gives a Tensor with shape [N], the number of examples.
    #
    # Note: this is equivalent to scores[np.arange(N), y] we used in NumPy
    # for computing vectorized losses.
    correct_scores = tf.gather_nd(model.scores,
                                  tf.stack((tf.range(X.shape[0]), model.labels), axis=1))
    ###############################################################################
    # TODO: Produce the saliency maps over a batch of images. #
    # #
    # 1) Compute the "loss" using the correct scores tensor provided for you. #
    # (We'll combine losses across a batch by summing) #
    # 2) Use tf.gradients to compute the gradient of the loss with respect #
    # to the image (accessible via model.image). #
    # 3) Compute the actual value of the gradient by a call to sess.run(). #
    # You will need to feed in values for the placeholders model.image and #
    # model.labels. #
    # 4) Finally, process the returned gradient to compute the saliency map. #
    ###############################################################################
    # (1) Combine the correct-class scores across the batch by summing.
    loss = tf.reduce_sum(correct_scores)
    # (2) Gradient of the summed score with respect to the input images.
    saliency_grad = tf.gradients(loss, model.image)[0]
    # (3) Evaluate the gradient. `sess` is assumed to be a tf.Session created
    #     outside this function; it is not defined here.
    saliency = sess.run(saliency_grad, feed_dict={model.image: X, model.labels: y})
    # (4) Take the absolute value, then the maximum over the 3 channels.
    saliency = np.absolute(saliency)
    saliency = np.amax(saliency, axis=-1)
    ##############################################################################
    # END OF YOUR CODE #
    ##############################################################################
    return saliency
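The note inside the function says that the `tf.gather_nd` call is equivalent to `scores[np.arange(N), y]` in NumPy. A small sketch of that indexing pattern may make the equivalence easier to see. The scores and labels here are made up for illustration.

```python
import numpy as np

# Made-up scores for N = 3 examples and C = 4 classes.
scores = np.array([[1.0, 2.0, 3.0, 4.0],
                   [5.0, 6.0, 7.0, 8.0],
                   [9.0, 10.0, 11.0, 12.0]])
y = np.array([2, 0, 3])  # correct class of each example
N = scores.shape[0]

# Pair row i with column y[i]; this picks one score per example.
correct_scores = scores[np.arange(N), y]

# The same selection written as explicit (row, column) pairs, which is
# the form of index that tf.gather_nd receives in the function above.
index_pairs = np.stack((np.arange(N), y), axis=1)
looped = np.array([scores[i, j] for i, j in index_pairs])

print(index_pairs.tolist())     # [[0, 2], [1, 0], [2, 3]]
print(correct_scores.tolist())  # [3.0, 5.0, 12.0]
```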
Fooling Images
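We can also use image gradients to generate "fooling images", as discussed in [3]. Given an image and a target class, we perform gradient ascent on the image to maximize the target class score, stopping once the network classifies the image as the target class.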
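The procedure can be sketched without a neural network. The example below is a toy under stated assumptions: it replaces SqueezeNet with a hypothetical 3-class linear model `scores = W @ x`, for which the gradient of the target score with respect to the input is simply the row `W[target]`. The names `make_fooling_input`, `W`, and `x` are invented for this illustration.

```python
import numpy as np

def make_fooling_input(x, target, W, learning_rate=1.0, max_iters=100):
    """Normalized gradient ascent on the target score of a toy linear model."""
    x_fooling = x.copy()  # leave the original input untouched
    for step in range(max_iters):
        scores = W @ x_fooling
        # Stop as soon as the model predicts the target class.
        if scores.argmax() == target:
            break
        g = W[target]  # gradient of scores[target] with respect to x
        x_fooling += learning_rate * g / np.linalg.norm(g)
    return x_fooling, step

# Hypothetical linear "classifier" and an input it assigns to class 0.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.5, 1.0]])
x = np.array([4.0, 0.0, 0.0])

x_fooling, steps = make_fooling_input(x, target=2, W=W)
print("before:", (W @ x).argmax(), "after:", (W @ x_fooling).argmax())
print("update steps taken:", steps)
```

Dividing the gradient by its L2 norm makes every update the same length, so `learning_rate` directly controls how far the input moves per step.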
def make_fooling_image(X, target_y, model):
    """
    Generate a fooling image that is close to X, but that the model classifies
    as target_y.
    Inputs:
    - X: Input image, a numpy array of shape (1, 224, 224, 3)
    - target_y: An integer in the range [0, 1000)
    - model: Pretrained SqueezeNet model
    Returns:
    - X_fooling: An image that is close to X, but that is classified as target_y
      by the model.
    """
    # Make a copy of the input that we will modify
    X_fooling = X.copy()
    # Step size for the update
    learning_rate = 1
    ##############################################################################
    # TODO: Generate a fooling image X_fooling that the model will classify as #
    # the class target_y. Use gradient *ascent* on the target class score, using #
    # the model.scores Tensor to get the class scores for the model.image. #
    # When computing an update step, first normalize the gradient: #
    # dX = learning_rate * g / ||g||_2 #
    # #
    # You should write a training loop, where in each iteration, you make an #
    # update to the input image X_fooling (don't modify X). The loop should #
    # stop when the predicted class for the input is the same as target_y. #
    # #
    # HINT: It's good practice to define your TensorFlow graph operations #
    # outside the loop, and then just make sess.run() calls in each iteration. #
    # #
    # HINT 2: For most examples, you should be able to generate a fooling image #
    # in fewer than 100 iterations of gradient ascent. You can print your #
    # progress over iterations to check your algorithm. #
    ##############################################################################
    # Define the graph operations once, outside the loop.
    score = model.scores[0, target_y]
    dX = tf.gradients(score, model.image)[0]
    dX = dX / tf.norm(dX)  # normalize by the L2 norm of the gradient
    for i in range(100):
        # `sess` is assumed to be a tf.Session created outside this function.
        ascent_step, scores = sess.run([dX, model.scores],
                                       feed_dict={model.image: X_fooling})
        # Stop as soon as the model predicts the target class.
        if np.argmax(scores[0]) == target_y:
            break
        X_fooling += learning_rate * ascent_step
    ##############################################################################
    # END OF YOUR CODE #
    ##############################################################################
    return X_fooling
Class Visualization
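We can synthesize an image that maximizes the score of a particular class; this gives some intuition about which parts of an image the model attends to when it decides that the image belongs to that class.
Starting from an image of random noise and performing gradient ascent on the target class, we can generate an image that the model will recognize as the target class.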
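As a rough sketch of this idea, the example below runs gradient ascent from random noise on a toy objective: a linear "class score" `w_target . x` minus an L2 penalty `l2_reg * ||x||^2`, which mirrors the `l2_reg` keyword argument of the function below. The linear score, the helper name `visualize_class_linear`, and all numeric values are assumptions made for illustration; a real network's score is not linear.

```python
import numpy as np

def visualize_class_linear(w_target, l2_reg=0.5, learning_rate=0.1,
                           num_iterations=200, seed=0):
    """Gradient ascent on  w_target . x - l2_reg * ||x||^2  from random noise."""
    rng = np.random.default_rng(seed)
    x = rng.random(w_target.shape)  # random-noise starting point
    for _ in range(num_iterations):
        # The score term contributes w_target; the subtracted L2 term
        # contributes -2 * l2_reg * x.
        grad = w_target - 2.0 * l2_reg * x
        x += learning_rate * grad
    return x

w_target = np.array([1.0, -2.0, 0.5])
x_star = visualize_class_linear(w_target)

# For this toy objective the maximizer is w_target / (2 * l2_reg).
print(x_star)
print(w_target / (2 * 0.5))
```

Without the penalty, the linear score could grow without bound. The L2 term is what gives the ascent a finite maximizer in this toy setting.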
def create_class_visualization(target_y, model, **kwargs):
    """
    Generate an image to maximize the score of target_y under a pretrained model.
    Inputs:
    - target_y: Integer in the range [0, 1000) giving the index of the class
    - model: A pretrained CNN that will be used to generate the image
    Keyword arguments:
    - l2_reg: Strength of L2 regularization on the image
    - learning_rate: How big of a step to take
    - num_iterations: How many iterations to use
    - blur_every: How often to blur the image as an implicit regularizer
    - max_jitter: How much to jitter the image as an implicit regularizer
    - show_every: How often to show the intermediate result
    """
    l2_reg = kwargs.pop('l2_reg', 1e-3)
    learning_rate = kwargs.pop('learning_rate', 25)
    num_iterations = kwargs.pop('num_iterations', 100)
    blur_every = kwargs.pop('blur_every', 10)
    max_jitter = kwargs.pop('max_jitter', 16)
    show_every = kwargs.pop('show_every', 25)
    # We use a single image of random noise as a starting point
    X = 255 * np.random.rand(224, 224, 3)
    X = preprocess_image(X)[None]
    ########################################################################
    # TODO: Compute the loss and the gradient of the loss with respect to #
    # the input image, model.image. We compute these outside the loop so #
    # that we don't have to recompute the gradient graph at each iteration #
    # #
    # Note: loss and grad should be TensorFlow Tensors, not numpy arrays! #
    # #
    # The loss is the score for the target label, target_y. You should #
    # use model.scores to get the scores, and tf.gradients to compute #
    # gradients. Don't forget the (subtracted) L2 regularization term! #
    ########################################################################
    # Scalar loss: target class score minus the L2 penalty on the image.
    loss = model.scores[0, target_y] - l2_reg * tf.reduce_sum(tf.square(model.image))
    # Gradient of the loss with respect to model.image (same shape as the image).
    # This equals d(score)/d(image) - 2 * l2_reg * image.
    grad = tf.gradients(loss, model.image)[0]
    ############################################################################
    # END OF YOUR CODE #
    ############################################################################
    for t in range(num_iterations):
        # Randomly jitter the image a bit; this gives slightly nicer results
        ox, oy = np.random.randint(-max_jitter, max_jitter+1, 2)
        X = np.roll(np.roll(X, ox, 1), oy, 2)
        ########################################################################
        # TODO: Use sess to compute the value of the gradient of the score for #
        # class target_y with respect to the pixels of the image, and make a #
        # gradient step on the image using the learning rate. You should use #
        # the grad variable you defined above. #
        # #
        # Be very careful about the signs of elements in your code. #
        ########################################################################
        # Gradient *ascent*: step in the direction of the gradient.
        # `sess` is assumed to be a tf.Session created outside this function.
        dX = sess.run(grad, feed_dict={model.image: X})
        X += learning_rate * dX
        ############################################################################
        # END OF YOUR CODE #
        ############################################################################
        # Undo the jitter
        X = np.roll(np.roll(X, -ox, 1), -oy, 2)
        # As a regularizer, clip and periodically blur
        X = np.clip(X, -SQUEEZENET_MEAN/SQUEEZENET_STD, (1.0 - SQUEEZENET_MEAN)/SQUEEZENET_STD)
        if t % blur_every == 0:
            X = blur_image(X, sigma=0.5)
        # Periodically show the image
        if t == 0 or (t + 1) % show_every == 0 or t == num_iterations - 1:
            plt.imshow(deprocess_image(X[0]))
            class_name = class_names[target_y]
            plt.title('%s\nIteration %d / %d' % (class_name, t + 1, num_iterations))
            plt.gcf().set_size_inches(4, 4)
            plt.axis('off')
            plt.show()
    return X
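Two details of the loop above are easy to check in isolation: the jitter is undone exactly by rolling with the opposite offsets, and the clipping bounds correspond to pixel values 0 and 1 if preprocessing has the form `(pixel - mean) / std` with pixels scaled to [0, 1]. That form of preprocessing is an assumption here, since `preprocess_image` is not shown. The sketch below uses a tiny array and placeholder `MEAN` and `STD` values, not the real `SQUEEZENET_MEAN` and `SQUEEZENET_STD` constants.

```python
import numpy as np

# Stand-in for a batch holding one tiny "image" of shape (1, H, W, 3).
X = np.arange(1 * 4 * 5 * 3, dtype=np.float64).reshape(1, 4, 5, 3)
max_jitter = 2
rng = np.random.default_rng(0)
ox, oy = rng.integers(-max_jitter, max_jitter + 1, size=2)

# Shift along axis 1 and axis 2, wrapping pixels around the border.
jittered = np.roll(np.roll(X, ox, 1), oy, 2)
# Rolling by the opposite offsets restores the original layout exactly.
restored = np.roll(np.roll(jittered, -ox, 1), -oy, 2)
print(np.array_equal(restored, X))  # True

# Clipping bounds in normalized space (placeholder statistics).
MEAN = np.array([0.5, 0.4, 0.3])
STD = np.array([0.2, 0.2, 0.25])
low, high = -MEAN / STD, (1.0 - MEAN) / STD

# Undoing the normalization maps the bounds back to pixel values 0 and 1.
print(low * STD + MEAN, high * STD + MEAN)
```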