Dive into Deep Learning (動手學深度學習) 14 - PyTorch Dropout: Implementation and Principles


To combat overfitting in deep learning, dropout is commonly used. Dropout has many variants; in this article, dropout refers specifically to inverted dropout.

Method

Recall the multilayer perceptron of Figure 3.3, which describes an MLP with a single hidden layer, 4 inputs and 5 hidden units. The hidden unit \(h_{i}\) (\(i = 1,2,3,4,5\)) is computed as
\(h_{i} = \varphi(x_{1}w_{1i}+x_{2}w_{2i}+x_{3}w_{3i}+x_{4}w_{4i}+b_{i})\)
Here \(\varphi\) is the activation function, \(x_{1},x_{2},x_{3},x_{4}\) are the inputs, the weight parameters of hidden unit \(i\) are \(w_{1i},w_{2i},w_{3i},w_{4i}\), and the bias parameter is \(b_{i}\). When dropout is applied to this hidden layer, each hidden unit has some probability of being dropped. Let the drop probability be \(p\): with probability \(p\) the unit \(h_{i}\) is set to zero, and with probability \(1-p\) it is divided by \(1-p\) (stretched). The drop probability is the hyperparameter of dropout. Concretely, let the random variable \(\xi_{i}\) take the value 0 with probability \(p\) and the value 1 with probability \(1-p\). When using dropout, we compute the new hidden unit \(h_{i}^{'}\) as
\(h_{i}^{'} = \frac{\xi_{i}}{1-p}h_{i}\)
Since \(E(\xi_{i}) = 1-p\), it follows that \(E(h_{i}^{'}) = \frac{E(\xi_{i})}{1-p}h_{i} = h_{i}\).
Dropout therefore does not change the expected value of its input.

As shown in Figure 3.5, h2 and h5 are zeroed out, so the output computation no longer depends on h2 and h5; during backpropagation, the gradients of the weights associated with these two hidden units are all 0. Since which hidden units are dropped during training is random, i.e. any of h1, h2, ..., h5 may be zeroed, the output layer cannot rely excessively on any single one of h1, ..., h5. This acts as regularization during training and can be used to combat overfitting. At test time, in order to obtain more deterministic results, dropout is generally not used.

Implementation from scratch

%matplotlib inline
import torch
import torch.nn as nn
import numpy as np
import sys
sys.path.append('..')
import d2lzh_pytorch as d2l

def dropout(X, drop_prob):
    X = X.float()
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    if keep_prob == 0:
        # everything is dropped: the whole tensor becomes zeros
        return torch.zeros_like(X)
    # torch.rand(*sizes, out=None) -> Tensor
    # returns a tensor of random numbers drawn uniformly from (0, 1);
    # comparing against keep_prob yields a 0/1 mask that keeps each element
    # with probability keep_prob
    mask = (torch.rand(X.shape) < keep_prob).float()
    return mask * X / keep_prob
X = torch.arange(16).view(2,8)
dropout(X,0)
tensor([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11., 12., 13., 14., 15.]])
dropout(X,0.5)
tensor([[ 0.,  0.,  0.,  0.,  0., 10.,  0., 14.],
        [16., 18.,  0.,  0.,  0., 26., 28., 30.]])
dropout(X,1)
tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.]])
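
A quick sanity check (my own addition, not part of the original notebook): since every kept element is rescaled by 1/(1 - drop_prob), the empirical mean after dropout should stay close to the input mean for a large tensor.
X_big = torch.ones(100000)
print(X_big.mean().item())                 # 1.0
print(dropout(X_big, 0.5).mean().item())   # roughly 1.0, up to sampling noise
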
Define the model parameters
num_inputs,num_outputs, num_hidden1,num_hidden2 = 784,10,256,256
W1 = torch.tensor(np.random.normal(0,0.01,size=(num_inputs,num_hidden1)),dtype =torch.float32,requires_grad=True )
b1 = torch.zeros(num_hidden1,requires_grad=True)

W2 = torch.tensor(np.random.normal(0,0.01,size=(num_hidden1,num_hidden2)),dtype =torch.float32,requires_grad=True )
b2 = torch.zeros(num_hidden2,requires_grad=True)


W3 = torch.tensor(np.random.normal(0,0.01,size=(num_hidden2,num_outputs)),dtype =torch.float32,requires_grad=True )
b3 = torch.zeros(num_outputs,requires_grad=True)


params = [W1,b1,W2,b2,W3,b3]

Network
drop_prob1,drop_prob2  = 0.2,0.5
def net(X,is_training=True):
    X = X.view(-1,num_inputs)
    H1 = (torch.matmul(X,W1)+b1).relu()
    if is_training:
        H1 = dropout(H1,drop_prob1)
        
    H2 = (torch.matmul(H1,W2)+b2).relu()
    if is_training:
        H2 = dropout(H2,drop_prob2)
    return torch.matmul(H2,W3)+b3
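
As a minimal shape check (my own addition), a dummy batch of two flattened Fashion-MNIST-sized inputs should produce logits of shape (2, 10):
X_dummy = torch.rand(2, num_inputs)
print(net(X_dummy).shape)   # expected: torch.Size([2, 10])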

Evaluation function
def evaluate_accuracy(data_iter,net):
    acc_sum ,n = 0.0,0
    for X,y in data_iter:
        if isinstance(net,torch.nn.Module):  # model built with the concise torch.nn API
            net.eval()  # evaluation mode, which disables Dropout
            acc_sum += (net(X).argmax(dim=1)==y).float().sum().item()
            net.train()  # switch back to training mode
        else:  # our own model defined from scratch
            if ('is_training' in net.__code__.co_varnames):  # if the net function takes an is_training argument
                # set is_training to False
                acc_sum += (net(X,is_training=False).argmax(dim=1)==y).float().sum().item()
            else:
                acc_sum += (net(X).argmax(dim=1)==y).float().sum().item()
        n += y.shape[0]
    return acc_sum/n
    
Optimization method
def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad
        # no need to divide the gradient by batch_size here: the loss is
        # PyTorch's cross-entropy loss, which already averages over the batch
Define the loss function
loss = torch.nn.CrossEntropyLoss()
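A small check of that claim (my own sketch, not from the original post): with its default reduction='mean', CrossEntropyLoss already averages the per-sample losses over the batch, which is why sgd() above does not divide the gradient by batch_size again.
logits = torch.randn(4, 10)
labels = torch.tensor([1, 0, 3, 9])
per_sample = torch.nn.CrossEntropyLoss(reduction='none')(logits, labels)
print(torch.allclose(loss(logits, labels), per_sample.mean()))   # True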
Data loading, training, and evaluation
num_epochs, lr, batch_size = 15, 0.3, 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()

            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()

            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # used when an optimizer is passed in (as in the concise softmax regression section)


            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))


train_ch3(net,train_iter,test_iter,loss,num_epochs,batch_size,params,lr)
epoch 1, loss 0.0049, train acc 0.513, test acc 0.693
epoch 2, loss 0.0024, train acc 0.776, test acc 0.781
epoch 3, loss 0.0020, train acc 0.818, test acc 0.780
epoch 4, loss 0.0018, train acc 0.835, test acc 0.846
epoch 5, loss 0.0017, train acc 0.846, test acc 0.843
epoch 6, loss 0.0016, train acc 0.855, test acc 0.843
epoch 7, loss 0.0015, train acc 0.861, test acc 0.843
epoch 8, loss 0.0015, train acc 0.863, test acc 0.855
epoch 9, loss 0.0014, train acc 0.870, test acc 0.861
epoch 10, loss 0.0014, train acc 0.872, test acc 0.845
epoch 11, loss 0.0013, train acc 0.874, test acc 0.853
epoch 12, loss 0.0013, train acc 0.878, test acc 0.848
epoch 13, loss 0.0013, train acc 0.880, test acc 0.859
epoch 14, loss 0.0013, train acc 0.882, test acc 0.858
epoch 15, loss 0.0012, train acc 0.885, test acc 0.863

Concise implementation with PyTorch

net = nn.Sequential(
    d2l.FlattenLayer(),
    nn.Linear(num_inputs, num_hidden1),
    nn.ReLU(),
    nn.Dropout(drop_prob1),
    nn.Linear(num_hidden1, num_hidden2),
    nn.ReLU(),
    nn.Dropout(drop_prob2),
    nn.Linear(num_hidden2, num_outputs)
)
for param in net.parameters():
    nn.init.normal_(param,mean=0,std=0.01)
optimizer = torch.optim.SGD(net.parameters(),lr=0.3)
train_ch3(net,train_iter,test_iter,loss,num_epochs,batch_size,None,None,optimizer)
epoch 1, loss 0.0048, train acc 0.525, test acc 0.725
epoch 2, loss 0.0024, train acc 0.779, test acc 0.787
epoch 3, loss 0.0020, train acc 0.818, test acc 0.771
epoch 4, loss 0.0018, train acc 0.836, test acc 0.834
epoch 5, loss 0.0017, train acc 0.847, test acc 0.848
epoch 6, loss 0.0016, train acc 0.855, test acc 0.855
epoch 7, loss 0.0015, train acc 0.859, test acc 0.850
epoch 8, loss 0.0014, train acc 0.863, test acc 0.853
epoch 9, loss 0.0014, train acc 0.868, test acc 0.848
epoch 10, loss 0.0014, train acc 0.872, test acc 0.837
epoch 11, loss 0.0013, train acc 0.876, test acc 0.849
epoch 12, loss 0.0013, train acc 0.879, test acc 0.872
epoch 13, loss 0.0013, train acc 0.880, test acc 0.847
epoch 14, loss 0.0013, train acc 0.883, test acc 0.862
epoch 15, loss 0.0012, train acc 0.886, test acc 0.865
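
To make the test-time behaviour described earlier concrete, here is a small demonstration (my own addition, using the net defined just above): in eval mode nn.Dropout is a no-op, so two forward passes on the same input are identical, while in train mode the random masks generally make them differ.
x = torch.rand(1, num_inputs)
net.eval()
print(torch.equal(net(x), net(x)))   # True: dropout is disabled in eval mode
net.train()
print(torch.equal(net(x), net(x)))   # usually False: different random masks
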
Summary
  • Dropout can be used to counter overfitting.
  • Dropout should only be applied while training the model.

