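A Simple Visualization of Convolutional Neural Networks
This post walks through a simple visualization of the weights of a convolutional neural network.
In the first half of this tutorial, we define an extremely simple CNN model and hand-specify a few filter weights to use as its convolution kernels.
In the second half, we use the FashionMNIST dataset, define a CNN with two convolutional layers, train it to above 85% accuracy, and then visualize its convolution kernels.
1. Visualizing a simple convolutional network
1.1 Visualizing a convolutional layer with hand-specified filters
In the exercise below, we manually define several filters resembling the Sobel operator and assign them to an extremely simple convolutional neural network. We then visualize the outputs (the feature maps) of the convolutional layer's 4 filters.
Load the target image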
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
img_path = 'images/udacity_sdc.png'
bgr_img = cv2.imread(img_path)
gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)
gray_img = gray_img.astype("float32")/255
plt.imshow(gray_img, cmap='gray')
plt.show()

Manually define the filters
import numpy as np
filter_vals = np.array([[-1, -1, 1, 1], [-1, -1, 1, 1], [-1, -1, 1, 1], [-1, -1, 1, 1]])
# Derive variations to get a richer set of filters
filter_1 = filter_vals
filter_2 = -filter_1
filter_3 = filter_1.T
filter_4 = -filter_3
filters = np.array([filter_1, filter_2, filter_3, filter_4])
fig = plt.figure(figsize=(10, 5))
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))
    width, height = filters[i].shape
    for x in range(width):
        for y in range(height):
            ax.annotate(str(filters[i][x][y]), xy=(y,x),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if filters[i][x][y] < 0 else 'black')
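
To see why these filters behave like edge detectors, here is a minimal NumPy sketch that is not part of the original exercise. It slides filter_1 and its transpose over a synthetic image containing a single dark-to-bright vertical edge. The helper `cross_correlate` and the test image `step` are names assumed here purely for illustration.

```python
import numpy as np

def cross_correlate(image, kernel):
    """Valid-mode 2D cross-correlation (the operation a CNN conv layer applies)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Same values as filter_1 in the tutorial
filter_1 = np.array([[-1, -1, 1, 1]] * 4, dtype=np.float32)

# Synthetic 6x8 image: left half dark (0), right half bright (1)
step = np.zeros((6, 8), dtype=np.float32)
step[:, 4:] = 1.0

response = cross_correlate(step, filter_1)
response_t = cross_correlate(step, filter_1.T)
print(response)    # strongest where the window is centred on the edge
print(response_t)  # the transposed filter does not react to a vertical edge
```

The response of filter_1 peaks where the window straddles the edge, while the transposed filter, which looks for horizontal edges, stays at zero everywhere on this image.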

Define a simple convolutional neural network
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self, weight):
        super(Net, self).__init__()
        k_height, k_width = weight.shape[2:]
        self.conv = nn.Conv2d(1, 4, kernel_size=(k_height, k_width), bias=False)
        self.conv.weight = torch.nn.Parameter(weight)
        self.pool = nn.MaxPool2d(4,4)
    def forward(self, x):
        conv_x = self.conv(x)
        activated_x = F.relu(conv_x)
        pooled_x = self.pool(activated_x)
        return conv_x, activated_x, pooled_x
# filters has shape (4, 4, 4)
# weight is expanded to shape (4, 1, 4, 4); the dimension of size 1 is the single input channel
weight = torch.from_numpy(filters).unsqueeze(1).type(torch.FloatTensor)
model = Net(weight)
print('Filters shape: ', filters.shape)
print('weights shape: ', weight.shape)
print(model)
Filters shape: (4, 4, 4)
weights shape: torch.Size([4, 1, 4, 4])
Net(
(conv): Conv2d(1, 4, kernel_size=(4, 4), stride=(1, 1), bias=False)
(pool): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
)
Visualize the convolution output
Define a function viz_layer that visualizes the output of a given layer.
def viz_layer(layer, n_filters=4):
    fig = plt.figure(figsize=(20, 20))
    for i in range(n_filters):
        ax = fig.add_subplot(1, n_filters, i+1, xticks=[], yticks=[])
        ax.imshow(np.squeeze(layer[0,i].data.numpy()), cmap='gray')
        ax.set_title('Output %s' % str(i+1))
# Show the original image
plt.imshow(gray_img, cmap='gray')
# Display the filters (convolution kernels)
fig = plt.figure(figsize=(12, 6))
fig.subplots_adjust(left=0, right=1.5, bottom=0.8, top=1, hspace=0.05, wspace=0.05)
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))
# Add a batch dimension and a channel dimension to gray_img, then convert it to a tensor
gray_img_tensor = torch.from_numpy(gray_img).unsqueeze(0).unsqueeze(1)
print(gray_img.shape)
print(gray_img_tensor.shape)
# Pass the input image through the model to get the outputs
conv_layer, activated_layer, pooled_layer = model(gray_img_tensor)
# Visualize the convolution output
viz_layer(conv_layer)
(213, 320)
torch.Size([1, 1, 213, 320])



# Visualize the output after the convolution and the activation function
viz_layer(activated_layer)

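1.2 Visualizing the pooling layer with hand-specified filters
Next we visualize the output after the pooling layer.
# Visualize the output after the pooling layer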
viz_layer(pooled_layer)
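
As a rough illustration of what MaxPool2d(4, 4) does to each feature map, here is a small NumPy sketch. It assumes no padding and ceil_mode=False, as in the printed model; the helper name `max_pool` is hypothetical.

```python
import numpy as np

def max_pool(feature_map, size=4):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    cropped = feature_map[:h_out * size, :w_out * size]
    # Split into (h_out, size, w_out, size) blocks and take the max of each block
    blocks = cropped.reshape(h_out, size, w_out, size)
    return blocks.max(axis=(1, 3))

fmap = np.arange(64, dtype=np.float32).reshape(8, 8)
pooled = max_pool(fmap)
print(pooled)  # one maximum per 4x4 block

# A 4x4 kernel without padding turns the 213x320 image into a 210x317 feature map
pooled_shape = max_pool(np.zeros((210, 317))).shape
print(pooled_shape)
```

Each spatial dimension shrinks by roughly a factor of four, which is why the pooled outputs look like coarse versions of the activated feature maps.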

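2. Visualizing a multi-layer convolutional network
In the exercise below, we define a somewhat more complex neural network, train it on the FashionMNIST dataset to above 85% accuracy, and then visualize and analyze the network.
2.1 Loading the FashionMNIST dataset
FashionMNIST can be seen as an upgrade of the MNIST dataset. Digit recognition on MNIST involves fairly simple patterns by today's standards, so it may be a little too easy as a target dataset for deep neural networks. FashionMNIST replaces the image content with fashion items while keeping the image format unchanged, so it is used almost exactly like MNIST yet is a better test of a model's ability to learn patterns in the data.
FashionMNIST class list:
0: T-shirt/top
1: Trouser
2: Pullover
3: Dress
4: Coat
5: Sandal
6: Shirt
7: Sneaker
8: Bag
9: Ankle boot
Load the FashionMNIST dataset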
import torch
import torchvision
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms
data_transform = transforms.ToTensor()
train_data = FashionMNIST(root='./data', train=True,
                          download=False, transform=data_transform)
test_data = FashionMNIST(root='./data', train=False,
                         download=False, transform=data_transform)
# Print out some stats about the training and test data
print('Train data, number of images: ', len(train_data))
print('Test data, number of images: ', len(test_data))
Train data, number of images: 60000
Test data, number of images: 10000
Create the data loaders
batch_size = 20
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)
# specify the image classes
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
Visualize part of the target dataset
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()
# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
    # integer division: add_subplot expects an integer number of columns
    ax = fig.add_subplot(2, batch_size//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(classes[labels[idx]])

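2.2 Training the multi-layer convolutional model
Define the model
Below we define a model with two convolutional layers; the added dropout helps, to some extent, to prevent overfitting.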
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.activation_l = nn.ReLU()
        self.fc = nn.Linear(32 * 7 * 7, 24)
        self.out = nn.Linear(24, 10)
        self.dropout = nn.Dropout(p=0.5)
        self.activation_out = nn.Softmax(dim=1)
    def forward(self, x):
        x = self.activation_l(self.conv1(x))
        x = self.pool1(x)
        x = self.activation_l(self.conv2(x))
        x = self.pool2(x)
        x = x.view(x.size(0), -1)
        x = self.activation_l(self.fc(x))
        x = self.dropout(x)
        x = self.activation_out(self.out(x))
        return x
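
The 32 * 7 * 7 input size of the fully connected layer follows from the layer shapes. Here is a quick check, assuming 28x28 FashionMNIST inputs and the usual output-size formula floor((n + 2p - k) / s) + 1; `out_size` is a helper name introduced only for this sketch.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

n = 28                                  # FashionMNIST images are 28x28
n = out_size(n, kernel=3, padding=1)    # conv1 keeps 28
n = out_size(n, kernel=2, stride=2)     # pool1 halves to 14
n = out_size(n, kernel=3, padding=1)    # conv2 keeps 14
n = out_size(n, kernel=2, stride=2)     # pool2 halves to 7
flat_features = 32 * n * n              # 32 channels come out of conv2
print(n, flat_features)
```

The result agrees with the in_features=1568 reported for the fc layer when the model is printed.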
Train the model
import torch.optim as optim
# Instantiate the model before building the optimizer
net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())
def train(n_epochs):
    for epoch in range(n_epochs):
        running_loss = 0.0
        for batch_i, data in enumerate(train_loader):
            inputs, labels = data
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            if batch_i % 1000 == 999:
                print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i+1, running_loss/1000))
                running_loss = 0.0
    print('Finished Training')
n_epochs = 10
train(n_epochs)
model_dir = 'saved_models/'
model_name = 'model_best.pt'
torch.save(net.state_dict(), model_dir+model_name)
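
One detail is worth checking here. `nn.CrossEntropyLoss` applies log-softmax internally, so it expects raw scores, while the model above already ends in a Softmax layer. The pure-Python sketch below is an illustration, not the original training code. It shows that feeding probabilities into a softmax-based cross-entropy confines the per-sample loss to a narrow band when there are 10 classes.

```python
import math

def cross_entropy(scores, target):
    """Cross-entropy of softmax(scores) against the target class index."""
    log_sum_exp = math.log(sum(math.exp(s) for s in scores))
    return log_sum_exp - scores[target]

n_classes = 10
# The model output is already a probability vector (after Softmax);
# here it puts all of its mass on class 0
confident = [1.0] + [0.0] * (n_classes - 1)

best = cross_entropy(confident, target=0)    # prediction fully correct
worst = cross_entropy(confident, target=1)   # prediction fully wrong
print(round(best, 4), round(worst, 4))
```

The loss can only move between roughly 1.46 and 2.46, however confident the model is. Training still works, but returning raw scores from forward and applying softmax only when probabilities are needed is the more common arrangement.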
Load the trained model
net = Net()
net.load_state_dict(torch.load('saved_models/model_best.pt'))
print(net)
Net(
(conv1): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(activation_l): ReLU()
(fc): Linear(in_features=1568, out_features=24, bias=True)
(out): Linear(in_features=24, out_features=10, bias=True)
(dropout): Dropout(p=0.5)
(activation_out): Softmax()
)
Evaluate the model on the test dataset
test_loss = torch.zeros(1)
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
print(class_correct)
print(test_loss)
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
tensor([ 0.])
net.eval()
criterion = torch.nn.CrossEntropyLoss()
for batch_i, data in enumerate(test_loader):
    inputs, labels = data
    output = net(inputs)
    # compute the loss on this batch's output
    loss = criterion(output, labels)
    # update average test loss
    test_loss = test_loss + ((torch.ones(1) / (batch_i+1)) * (loss.data - test_loss))
    _, predicted = torch.max(output.data, 1)
    correct = np.squeeze(predicted.eq(labels.data.view_as(predicted)))
    for i in range(batch_size):
        label = labels.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1
print('Test Loss: {:.6f}\n'.format(test_loss.numpy()[0]))
for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))
print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
Test Loss: 2.362950
Test Accuracy of T-shirt/top: 85% (850/1000)
Test Accuracy of Trouser: 96% (963/1000)
Test Accuracy of Pullover: 84% (842/1000)
Test Accuracy of Dress: 91% (911/1000)
Test Accuracy of Coat: 85% (856/1000)
Test Accuracy of Sandal: 98% (989/1000)
Test Accuracy of Shirt: 49% (495/1000)
Test Accuracy of Sneaker: 94% (948/1000)
Test Accuracy of Bag: 97% (978/1000)
Test Accuracy of Ankle boot: 93% (930/1000)
Test Accuracy (Overall): 87% (8762/10000)
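
The update `test_loss + (1 / (batch_i + 1)) * (loss - test_loss)` used in the test loop is an incremental mean: after each batch it equals the average of all batch losses seen so far. Here is a tiny check with made-up loss values (the numbers are illustrative only):

```python
batch_losses = [2.0, 1.0, 3.0, 2.0]   # hypothetical per-batch losses

running = 0.0
for batch_i, loss in enumerate(batch_losses):
    # Same update rule as in the test loop
    running = running + (1.0 / (batch_i + 1)) * (loss - running)

plain_mean = sum(batch_losses) / len(batch_losses)
print(running, plain_mean)
```

Both values agree, so the reported test loss is the mean of the per-batch losses without having to store them all.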
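2.3 Feature visualization
The model has been trained and reaches 87% accuracy on the test data, so let us move on to visualization.
The strategy is to extract the parameters of each convolutional layer as standalone filters and apply them, using OpenCV's filter2D function, to an image sampled from the test set. We then observe what each filter does to the image and try to explain what kind of filtering it performs on the original.
Extract a single image from the dataset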
dataiter = iter(test_loader)
images, labels = next(dataiter)
images = images.numpy()
idx = 15
img = np.squeeze(images[idx])
import cv2
plt.imshow(img, cmap='gray')
<matplotlib.image.AxesImage at 0x124832a90>

Visualize the first-layer convolution kernels
weights = net.conv1.weight.data
w = weights.numpy()
print(w.shape)
fig = plt.figure(figsize=(30, 10))
columns = 4 * 2
row = 4
for i in range(0, columns * row):
    fig.add_subplot(row, columns, i+1)
    if ((i%2)==0):
        # even slots show the kernel itself
        plt.imshow(w[int(i/2)][0], cmap='gray')
    else:
        # odd slots show the image filtered by the preceding kernel
        c = cv2.filter2D(img, -1, w[int((i-1)/2)][0])
        plt.imshow(c, cmap='gray')
plt.show()
(16, 1, 3, 3)
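
For reference, `cv2.filter2D` computes a correlation (the kernel is not flipped), returns an image of the same size, and by default extrapolates the borders by reflection (BORDER_REFLECT_101). The NumPy sketch below mimics that documented behaviour for odd-sized kernels. `filter2d_like` is a hypothetical helper, not an OpenCV function, and the random image is only a stand-in for a FashionMNIST sample. This differs slightly from what conv1 computes inside the network, since conv1 pads with zeros and adds a bias.

```python
import numpy as np

def filter2d_like(image, kernel):
    """Same-size correlation with reflect-101 borders (odd kernel sizes only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # np.pad's 'reflect' mode mirrors without repeating the edge pixel
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode='reflect')
    out = np.zeros_like(image, dtype=np.float32)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

img = np.random.rand(28, 28).astype(np.float32)  # stand-in for a 28x28 sample

# A kernel with a single 1 in the centre should leave the image unchanged
identity = np.zeros((3, 3), dtype=np.float32)
identity[1, 1] = 1.0

same = filter2d_like(img, identity)
print(same.shape, np.allclose(same, img))
```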

Visualize the second-layer convolution kernels
weights = net.conv2.weight.data
w = weights.numpy()
print(w.shape)
fig = plt.figure(figsize=(30, 20))
columns = 4 * 2
row = 8
for i in range(0, columns * row):
    fig.add_subplot(row, columns, i+1)
    if ((i%2)==0):
        # even slots show input channel 0 of the kernel
        plt.imshow(w[int(i/2)][0], cmap='gray')
    else:
        # odd slots show the image filtered by that kernel slice
        c = cv2.filter2D(img, -1, w[int((i-1)/2)][0])
        plt.imshow(c, cmap='gray')
plt.show()
(32, 16, 3, 3)
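
Each conv2 filter has 16 input channels, and the loop above displays and applies only channel 0 (`w[i][0]`) to the raw image. Inside the network, a filter's response is the sum of the per-channel correlations over all 16 pooled conv1 feature maps. Here is a NumPy sketch with random stand-in data; the shapes follow the model, while the values and helper names are assumptions.

```python
import numpy as np

def correlate_valid(x, k):
    """Valid-mode 2D cross-correlation of one channel with one kernel slice."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out

rng = np.random.default_rng(0)
features = rng.random((16, 14, 14))   # stand-in for the pooled conv1 output
kernel = rng.random((16, 3, 3))       # stand-in for one conv2 filter, w[i]

# Full response: sum over all 16 input channels (bias and padding omitted)
full = sum(correlate_valid(features[ch], kernel[ch]) for ch in range(16))
# Contribution of the single slice shown in the plot
first_only = correlate_valid(features[0], kernel[0])

print(full.shape, first_only.shape)
```

The second-layer plots are therefore best read as a view of one slice of each filter, not as the feature maps the network actually produces at that layer.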

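Some kernels clearly act as edge detectors; different kernels are sensitive to different orientations, different textures, or more generally different image content.
Interpreting convolutions subjectively in this way still feels rather thin, but it probably counts as a simple way of visualizing a neural network. Beyond the convolution kernels, the fully connected layers can also be visualized.
As for visualizing fully connected layers, some tutorials do this by visualizing the distances between the "embedding vectors" of individual samples from similar classes; the embeddings produced by the fully connected layer may first need to be reduced in dimension with t-SNE before plotting. If I come across related material later, I will add it to this post.
Postscript
This post is based on the exercises of the Udacity Computer Vision Nanodegree. Official source code: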
https://github.com/udacity/CVND_Exercises/tree/master/1_5_CNN_Layers
