CS231N Assignment 4: Two-Layer Net
Begin
This post walks through the fourth assignment in the CS231N course series: writing and training a two-layer neural network.
Course page: NetEase Cloud Classroom, CS231N series
Language: Python 3.6

1 Neural Networks
A neural network is easy to understand once you know linear classifiers: add a nonlinear activation function on top of a linear classifier so it can express nonlinear functions, then stack several such layers to get a multi-layer neural network. As the figure below shows, the input X passes through the first layer to produce W1X, the hidden layer's activation function max(0, s) is applied to give the hidden-layer output, and multiplying by W2 at the output layer yields the final class scores.
In the figure, the leftmost 3072 means each image has 3072 features. After the first layer we reach the hidden layer, which has 100 features, and after the second layer we reach the output layer, which gives scores for 10 classes. This network is called a two-layer neural network (it contains W1 and W2), or equivalently a neural network with one hidden layer.
Note that the activation function is applied only in the hidden-layer computation.

There are many kinds of activation functions, as shown below; in this assignment we use ReLU.
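The activation-function figure is not reproduced here; as a quick reference, here is a minimal NumPy sketch of three common choices (only ReLU is actually needed in this assignment):
import numpy as np

def relu(x):
    # rectified linear unit, the hidden-layer activation used below
    return np.maximum(0, x)

def sigmoid(x):
    # squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes inputs into (-1, 1)
    return np.tanh(x)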

2 Writing a Two-Layer Neural Network
As with the SVM and other classifiers we wrote earlier, any trainer needs the following parts (a skeleton of the class itself is sketched after the list):
1. A loss function (forward pass) and gradient (backward pass) computation
2. A training function
3. A prediction function
4. Training the parameters
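The parameters W1, b1, W2, b2 live in a params dictionary created by the class constructor, which the assignment template provides. A minimal skeleton sketch, assuming the usual CS231N-style initialization (small random weights scaled by std, zero biases):
import numpy as np

class TwoLayerNet(object):
    def __init__(self, input_size, hidden_size, output_size, std=1e-4):
        # small random weights, zero biases; the loss/train/predict methods below complete the class
        self.params = {}
        self.params['W1'] = std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)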
2.1 The loss function
The loss is computed with the softmax loss.
1. First run the forward pass to compute the scores; this is just the three formulas above.
##############################
#Computing the class scores of the input
##############################
Z1 = X.dot(W1) + b1       # first layer
S1 = np.maximum(0,Z1)     # hidden-layer ReLU activation
score = S1.dot(W2) + b2   # output layer
2. After computing the scores, add a check: when no labels Y are passed in, return the scores directly. The prediction function uses this, since it only needs the scores.
if Y is None:
    return score
loss = None
3. Next compute the softmax loss; for the details see my Assignment 3 write-up.
###############################
#TODO:forward pass
#computing the loss of the net
################################
exp_scores = np.exp(score)
probs = exp_scores / np.sum(exp_scores,axis=1,keepdims=True)
# data loss
data_loss = -1.0/ N * np.log(probs[np.arange(N),Y]).sum()
# regularization loss
reg_loss = 0.5*reg*(np.sum(W1*W1) + np.sum(W2*W2))
# total loss
loss = data_loss + reg_loss
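One caveat worth knowing (not required by the assignment): np.exp(score) can overflow when scores are large. Shifting each row by its maximum before exponentiating gives exactly the same probabilities and is numerically safe; a small sketch:
# numerically stable softmax: same probs as above
shifted = score - np.max(score, axis=1, keepdims=True)
exp_scores = np.exp(shifted)
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)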
4. Compute the gradients in the backward pass.
################################
#TODO:backward pass
#computing the gradient
################################
grads = {}
dscores = probs
dscores[np.arange(N),Y] -= 1
dscores /= N
# gradients for W2 and b2
grads['W2'] = S1.T.dot(dscores) + reg *W2
grads['b2'] = np.sum(dscores,axis = 0)
# backprop into the hidden layer, zeroing the gradient where the ReLU was inactive
dhidden = dscores.dot(W2.T)
dhidden[S1<=0] = 0
grads['W1'] = X.T.dot(dhidden) + reg *W1
grads['b1'] = np.sum(dhidden,axis = 0)
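The line dscores[np.arange(N),Y] -= 1 implements the softmax gradient: for each example, the derivative of the loss with respect to the scores is probs minus the one-hot encoding of the true class. A tiny numeric illustration with made-up probabilities:
# toy example: 2 examples, 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
y = np.array([0, 2])             # true classes
dscores = probs.copy()
dscores[np.arange(2), y] -= 1    # probs - one_hot(y)
# dscores is now [[-0.3, 0.2, 0.1], [0.1, 0.3, -0.4]]; the real code then divides by N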
The full code is as follows:
def loss(self,X,Y=None,reg=0.0):
    '''
    Compute the loss and gradients of the two-layer net.
    '''
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    N, D = X.shape
    ##############################
    #Computing the class scores of the input
    ##############################
    Z1 = X.dot(W1) + b1       # first layer
    S1 = np.maximum(0,Z1)     # hidden-layer ReLU activation
    score = S1.dot(W2) + b2   # output layer
    if Y is None:
        return score
    loss = None
    ###############################
    #TODO:forward pass
    #computing the loss of the net
    ################################
    exp_scores = np.exp(score)
    probs = exp_scores / np.sum(exp_scores,axis=1,keepdims=True)
    # data loss
    data_loss = -1.0/ N * np.log(probs[np.arange(N),Y]).sum()
    # regularization loss
    reg_loss = 0.5*reg*(np.sum(W1*W1) + np.sum(W2*W2))
    # total loss
    loss = data_loss + reg_loss
    ################################
    #TODO:backward pass
    #computing the gradient
    ################################
    grads = {}
    dscores = probs
    dscores[np.arange(N),Y] -= 1
    dscores /= N
    # gradients for W2 and b2
    grads['W2'] = S1.T.dot(dscores) + reg *W2
    grads['b2'] = np.sum(dscores,axis = 0)
    # backprop into the hidden layer, zeroing the gradient where the ReLU was inactive
    dhidden = dscores.dot(W2.T)
    dhidden[S1<=0] = 0
    grads['W1'] = X.T.dot(dhidden) + reg *W1
    grads['b1'] = np.sum(dhidden,axis = 0)
    return loss,grads
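Before moving on, it is worth sanity-checking the analytic gradients against numeric ones. Below is a minimal sketch of such a check; numeric_grad is our own helper (the assignment ships a similar numeric-gradient utility), and net, X_dev, Y_dev refer to the objects created in section 2.4. The loop is slow, so it is best run on a handful of examples only.
def numeric_grad(f, x, h=1e-5):
    # centered finite differences, one coordinate at a time
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)
        x[ix] = old - h
        fxmh = f(x)
        x[ix] = old
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

# compare against the analytic gradient for W1 (same pattern for W2, b1, b2)
loss, grads = net.loss(X_dev[:5], Y=Y_dev[:5], reg=0.05)
f = lambda W: net.loss(X_dev[:5], Y=Y_dev[:5], reg=0.05)[0]
num_W1 = numeric_grad(f, net.params['W1'])
print(np.max(np.abs(num_W1 - grads['W1'])))   # should be very small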
2.2 The training function
The training hyperparameters are still:
learning rate: learning_rate
regularization strength: reg
number of training steps: num_iters
number of samples drawn per step: batch_size
1. Inside the loop, first sample a batch of data. batch_inx = np.random.choice(num_train, batch_size) draws batch_size random integers from 0 to num_train - 1; these are the indices of the sampled examples. X_batch = X[batch_inx,:] then retrieves the corresponding rows (see the note after the code below).
for it in range(num_iters):
    X_batch = None
    y_batch = None
    #########################################################################
    # TODO: Create a random minibatch of training data and labels, storing  #
    # them in X_batch and y_batch respectively.                             #
    #########################################################################
    batch_inx = np.random.choice(num_train, batch_size)
    X_batch = X[batch_inx,:]
    y_batch = y[batch_inx]
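Note that np.random.choice samples with replacement by default, so a minibatch may contain duplicate indices; that is fine for stochastic gradient descent and matches the assignment template. For reference:
idx_with_dup = np.random.choice(10, 5)                  # may repeat indices
idx_no_dup   = np.random.choice(10, 5, replace=False)   # all indices distinct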
2. After sampling the batch, compute the loss and gradients.
# Compute loss and gradients using the current minibatch
loss, grads = self.loss(X_batch, Y=y_batch, reg=reg)
loss_history.append(loss)
3. After computing the loss, use the gradients to update the parameters W1, W2, b1, b2 (an equivalent loop form is shown after the code below).
The gradient points in the direction of steepest increase: if the gradient is positive, moving that way increases the loss, so we move in the opposite direction by subtracting the learning rate times the gradient.
#########################################################################
# TODO: Use the gradients in the grads dictionary to update the #
# parameters of the network (stored in the dictionary self.params) #
# using stochastic gradient descent. You'll need to use the gradients #
# stored in the grads dictionary defined above. #
#########################################################################
self.params['W1'] -= learning_rate * grads['W1']
self.params['b1'] -= learning_rate * grads['b1']
self.params['W2'] -= learning_rate * grads['W2']
self.params['b2'] -= learning_rate * grads['b2']
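The same four updates can also be written as a loop over self.params, which generalizes to networks with more layers; an equivalent sketch:
for p in ('W1', 'b1', 'W2', 'b2'):
    self.params[p] -= learning_rate * grads[p]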
4. On-the-fly validation
During training we also add an on-the-fly check: once per epoch we measure how well the predictions match the true labels on the training batch and on the validation set. At the end we can plot these curves and inspect them.
# Every epoch, check train and val accuracy and decay learning rate.
if it % iterations_per_epoch == 0:
    # Check accuracy
    train_acc = (self.predict(X_batch) == y_batch).mean()
    val_acc = (self.predict(X_val) == y_val).mean()
    train_acc_history.append(train_acc)
    val_acc_history.append(val_acc)
    # Decay learning rate
    learning_rate *= learning_rate_decay
The complete training function is shown below:
def train(self, X, y, X_val, y_val,
          learning_rate=1e-3, learning_rate_decay=0.95,
          reg=1e-5, num_iters=100,
          batch_size=200, verbose=False):
    """
    Train this neural network using stochastic gradient descent.
    Inputs:
    - X: A numpy array of shape (N, D) giving training data.
    - y: A numpy array of shape (N,) giving training labels; y[i] = c means that
      X[i] has label c, where 0 <= c < C.
    - X_val: A numpy array of shape (N_val, D) giving validation data.
    - y_val: A numpy array of shape (N_val,) giving validation labels.
    - learning_rate: Scalar giving learning rate for optimization.
    - learning_rate_decay: Scalar giving factor used to decay the learning rate
      after each epoch.
    - reg: Scalar giving regularization strength.
    - num_iters: Number of steps to take when optimizing.
    - batch_size: Number of training examples to use per step.
    - verbose: boolean; if true print progress during optimization.
    """
    self.hyper_params = {}
    self.hyper_params['learning_rate'] = learning_rate
    self.hyper_params['reg'] = reg
    self.hyper_params['batch_size'] = batch_size
    self.hyper_params['hidden_size'] = self.params['W1'].shape[1]
    self.hyper_params['num_iter'] = num_iters
    num_train = X.shape[0]
    iterations_per_epoch = max(num_train / batch_size, 1)
    # Use SGD to optimize the parameters in self.model
    loss_history = []
    train_acc_history = []
    val_acc_history = []
    for it in range(num_iters):
        X_batch = None
        y_batch = None
        #########################################################################
        # TODO: Create a random minibatch of training data and labels, storing  #
        # them in X_batch and y_batch respectively.                             #
        #########################################################################
        batch_inx = np.random.choice(num_train, batch_size)
        X_batch = X[batch_inx,:]
        y_batch = y[batch_inx]
        #########################################################################
        #                            END OF YOUR CODE                           #
        #########################################################################
        # Compute loss and gradients using the current minibatch
        loss, grads = self.loss(X_batch, Y=y_batch, reg=reg)
        loss_history.append(loss)
        #########################################################################
        # TODO: Use the gradients in the grads dictionary to update the         #
        # parameters of the network (stored in the dictionary self.params)      #
        # using stochastic gradient descent. You'll need to use the gradients   #
        # stored in the grads dictionary defined above.                         #
        #########################################################################
        self.params['W1'] -= learning_rate * grads['W1']
        self.params['b1'] -= learning_rate * grads['b1']
        self.params['W2'] -= learning_rate * grads['W2']
        self.params['b2'] -= learning_rate * grads['b2']
        #########################################################################
        #                            END OF YOUR CODE                           #
        #########################################################################
        if verbose and it % 100 == 0:
            print ('iteration %d / %d: loss %f' % (it, num_iters, loss))
        # Every epoch, check train and val accuracy and decay learning rate.
        if it % iterations_per_epoch == 0:
            # Check accuracy
            train_acc = (self.predict(X_batch) == y_batch).mean()
            val_acc = (self.predict(X_val) == y_val).mean()
            train_acc_history.append(train_acc)
            val_acc_history.append(val_acc)
            # Decay learning rate
            learning_rate *= learning_rate_decay
    return {
        'loss_history': loss_history,
        'train_acc_history': train_acc_history,
        'val_acc_history': val_acc_history,
    }
Training may take a while; after some time you should see results like the following.

2.3 The predict function
Prediction is similar to before: pass the data through the loss function (with no labels, so it returns the scores) and take the class with the largest score.
def predict(self, X):
    y_pred = None
    scores = self.loss(X)
    y_pred = np.argmax(scores, axis=1)
    return y_pred
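As a quick usage example (assuming the X_test and Y_test arrays prepared in section 2.4 below), accuracy is just the fraction of matching predictions:
test_acc = (net.predict(X_test) == Y_test).mean()
print('Test accuracy:', test_acc)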
The training results are shown below.

2.4 Visualizing the results
After training we can visualize what happened: plot the training loss over time, together with the train and validation accuracies recorded during training.
The test code is as follows:
#step1: slice the data
#The full dataset is large, so we carve out smaller training, validation, test and dev subsets
num_training = 49000   # number of training examples
num_validation = 1000  # number of validation examples
num_test = 1000        # number of test examples
num_dev = 500
CIFAR10_Data = './'
X_train,Y_train,X_test,Y_test = load_CIFAR10(CIFAR10_Data)   # load the data with the CS231N helper
#Take a slice of the training set as the validation set
mask = range(num_training,num_training + num_validation)
X_val = X_train[mask]
Y_val = Y_train[mask]
#Keep the first num_training examples as the training set
mask = range(num_training)
X_train = X_train[mask]
Y_train = Y_train[mask]
#The training set is large, so take a small random subset as the development set
mask = np.random.choice(num_training,num_dev,replace = False)
X_dev = X_train[mask]
Y_dev = Y_train[mask]
#The test set is also large, so shrink it
mask = range(num_test)
X_test = X_test[mask]
Y_test = Y_test[mask]
#step2: preprocess the data
#Reshape each image into a 1-D vector so every dataset becomes a 2-D array
X_train = np.reshape(X_train,(X_train.shape[0],-1))
X_val = np.reshape(X_val,(X_val.shape[0],-1))
X_test = np.reshape(X_test,(X_test.shape[0],-1))
X_dev = np.reshape(X_dev,(X_dev.shape[0],-1))
print('Training data shape', X_train.shape)
print('Validation data shape',X_val.shape)
print('Test data shape',X_test.shape)
print('Dev data shape',X_dev.shape)
#step3: train the network
input_size = 32*32*3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size,hidden_size,num_classes)
#train
sta = net.train(X_train,Y_train,X_val,Y_val,num_iters=1000,batch_size=200,learning_rate=4e-4,learning_rate_decay=0.95,reg=0.7,verbose=True)
#step4: evaluate on the validation set
val = (net.predict(X_val) == Y_val).mean()
print(val)
#step5: visualize the results
plt.subplot(2,1,1)
plt.plot(sta['loss_history'])
plt.ylabel('loss')
plt.xlabel('Iteration')
plt.title('Loss_History')
plt.subplot(2,1,2)
plt.plot(sta['train_acc_history'],label = 'train')
plt.plot(sta['val_acc_history'],label = 'val')
plt.xlabel('epoch')
plt.ylabel('Classification accuracy')
plt.legend()
plt.show()


