(pytorch-深度學習系列)pytorch避免過擬合-dropout丟棄法的實現-學習筆記


pytorch避免過擬合-dropout丟棄法的實現

對於一個單隱藏層的多層感知機,其中輸入個數為4,隱藏單元個數為5,且隱藏單元\(h_i\)\(i=1, \ldots, 5\))的計算表達式為:

\[h_i = \phi\left(x_1 w_{1i} + x_2 w_{2i} + x_3 w_{3i} + x_4 w_{4i} + b_i\right) \]

這里\(\phi\)是激活函數,\(x_1, \ldots, x_4\)是輸入,隱藏單元\(i\)的權重參數為\(w_{1i}, \ldots, w_{4i}\),偏差參數為\(b_i\)。當對該隱藏層使用丟棄法時,該層的隱藏單元將有一定概率被丟棄掉。設丟棄概率為\(p\),那么有\(p\)的概率\(h_i\)會被清零,有\(1-p\)的概率\(h_i\)會除以\(1-p\)做拉伸。丟棄概率是丟棄法的超參數。具體來說,設隨機變量\(\xi_i\)為0和1的概率分別為\(p\)\(1-p\)。使用丟棄法時我們計算新的隱藏單元\(h_i'\)

\[h_i' = \frac{\xi_i}{1-p} h_i \]

(這個公式就表示 \(h_i'\)\(1-p\)的概率為\(h_i\)

由於\(E(\xi_i) = 1-p\),因此

\[E(h_i') = \frac{E(\xi_i)}{1-p}h_i = h_i \]

即丟棄法不改變其輸入的期望值。我們對隱藏層使用丟棄法,一種可能的結果是\(h_2\)\(h_5\)被清零。這時輸出值的計算不再依賴\(h_2\)\(h_5\),在反向傳播時,與這兩個隱藏單元相關的權重的梯度均為0。由於在訓練中隱藏層神經元的丟棄是隨機的,即\(h_1, \ldots, h_5\)都有可能被清零,輸出層的計算無法過度依賴\(h_1, \ldots, h_5\)中的任一個,從而在訓練模型時起到正則化的作用,並可以用來應對過擬合。在測試模型時,我們為了拿到更加確定性的結果,一般不使用丟棄法。

開始實現drop丟棄法避免過擬合

定義dropout函數:

%matplotlib inline
import torch
import torch.nn as nn
import numpy as np

def dropout(X, drop_prob):
    X = X.float()
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    # 這種情況下把全部元素都丟棄
    if keep_prob == 0:
        return torch.zeros_like(X)
    mask = (torch.rand(X.shape) < keep_prob).float()

    return mask * X / keep_prob

定義模型參數:

num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 784, 10, 256, 256

W1 = torch.tensor(np.random.normal(0, 0.01, size=(num_inputs, num_hiddens1)), dtype=torch.float, requires_grad=True)
b1 = torch.zeros(num_hiddens1, requires_grad=True)
W2 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens1, num_hiddens2)), dtype=torch.float, requires_grad=True)
b2 = torch.zeros(num_hiddens2, requires_grad=True)
W3 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens2, num_outputs)), dtype=torch.float, requires_grad=True)
b3 = torch.zeros(num_outputs, requires_grad=True)

params = [W1, b1, W2, b2, W3, b3]

定義模型將全連接層和激活函數ReLU串起來,並對每個激活函數的輸出使用丟棄法。分別設置各個層的丟棄概率。通常的建議是把靠近輸入層的丟棄概率設得小一點。在這個實驗中,我們把第一個隱藏層的丟棄概率設為0.2,把第二個隱藏層的丟棄概率設為0.5。我們可以通過參數is_training來判斷運行模式為訓練還是測試,並只在訓練模式下使用丟棄法。

drop_prob1, drop_prob2 = 0.2, 0.5

def net(X, is_training=True):
    X = X.view(-1, num_inputs)
    H1 = (torch.matmul(X, W1) + b1).relu()
    if is_training:  # 只在訓練模型時使用丟棄法
        H1 = dropout(H1, drop_prob1)  # 在第一層全連接后添加丟棄層
    H2 = (torch.matmul(H1, W2) + b2).relu()
    if is_training:
        H2 = dropout(H2, drop_prob2)  # 在第二層全連接后添加丟棄層
    return torch.matmul(H2, W3) + b3

def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        if isinstance(net, torch.nn.Module):
            net.eval() # 評估模式, 這會關閉dropout
            acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            net.train() # 改回訓練模式
        else: # 自定義的模型
            if('is_training' in net.__code__.co_varnames): # 如果有is_training這個參數
                # 將is_training設置成False
                acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item() 
            else:
                acc_sum += (net(X).argmax(dim=1) == y).float().sum().item() 
        n += y.shape[0]
    return acc_sum / n

訓練和測試模型:

num_epochs, lr, batch_size = 5, 100.0, 256
loss = torch.nn.CrossEntropyLoss()

def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    """Download the fashion mnist dataset and then load into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    
    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    if sys.platform.startswith('win'):
        num_workers = 0  # 0表示不用額外的進程來加速讀取數據
    else:
        num_workers = 4
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    return train_iter, test_iter

def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            
            # 梯度清零
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            
            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # “softmax回歸的簡潔實現”一節將用到
            
            
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
train_iter, test_iter = load_data_fashion_mnist(batch_size)
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM