Task 5: Implementing L1 and L2 Regularization and Dropout in PyTorch


1. Understanding How Dropout Works

  Deep neural networks have a very large number of parameters, which makes them prone to overfitting and expensive to train. To address this, Hinton proposed Dropout in the 2012 paper "Improving neural networks by preventing co-adaptation of feature detectors" and showed experimentally that it is an effective remedy for overfitting.

Dropout means that during training, units of the network are temporarily dropped with a certain probability, which is equivalent to sampling a thinner sub-network from the original network at every step.

  In practice, each unit's activation is multiplied by a random 0/1 mask drawn from a Bernoulli distribution, so some neurons are silenced while the rest are kept. For a network with N neurons this corresponds to as many as 2^N possible sub-networks.
See the paper for the full derivation; the implementations below show how the idea works in practice.
When to use it:
1. Dropout is mainly useful when the training data is limited and the model easily overfits; a minimal PyTorch usage sketch follows below.
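
As a quick illustration (my own sketch, not from the original post), PyTorch's built-in torch.nn.Dropout behaves exactly as described: in training mode it zeroes each activation with probability p and rescales the survivors by 1/(1-p); in evaluation mode it is a no-op.

import torch

drop = torch.nn.Dropout(p=0.5)   # each element is zeroed with probability 0.5
x = torch.ones(1, 8)

drop.train()      # training mode: random mask, survivors scaled by 1/(1-p) = 2.0
print(drop(x))    # roughly half the entries are 0, the rest are 2.0

drop.eval()       # evaluation mode: dropout is disabled
print(drop(x))    # identical to x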

 

Both L1 and L2 regularization minimize the structural risk, i.e. the data loss plus a penalty on the parameters, where:
L1 yields sparse parameters (many weights become exactly 0), which effectively performs feature selection;
L2 spreads the weight values out and shrinks them towards 0 without making them exactly 0, which gives a smoother model. The two objectives are written out below.
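
For reference, in my own notation (L(w) is the data loss and λ the regularization strength; neither symbol appears in the original post):

J_{L1}(w) = L(w) + \lambda \sum_i |w_i|

J_{L2}(w) = L(w) + \lambda \sum_i w_i^2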


2. Implementing Regularization in Code (L1, L2, Dropout)

 

L1 Norm

 

  The L1 norm is the sum of the absolute values of the elements of the parameter matrix W. It differs from the L0 norm in that minimizing the L0 norm is an NP-hard problem, whereas the L1 norm is the tightest convex approximation of the L0 norm and is much easier to optimize. L1-regularized regression is commonly known as the LASSO.

for epoch in range(EPOCHS):
    # recompute the L1 penalty every step, since the parameters change after each update
    regularization_loss = 0
    for param in model.parameters():
        regularization_loss += torch.sum(torch.abs(param))

    y_pred = model(x_train)
    classify_loss = criterion(y_pred, y_train.float().view(-1, 1))
    loss = classify_loss + 0.001 * regularization_loss  # add the L1 penalty to the data loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

L2 Norm

  The L2 norm is the sum of the squares of the elements of the parameter matrix W. Unlike L1, it does not set parameters to exactly 0; instead it pushes most of them close to 0. L1 pursues sparsity and therefore discards some features entirely (their weights become 0), whereas L2 only shrinks the weights, so every feature is retained. L2-regularized regression is known as Ridge.

 

criterion = torch.nn.BCELoss()  # loss function
# weight_decay adds an L2 penalty on the parameters; it must be positive (e.g. 0.01) for L2 regularization to take effect
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0, dampening=0, weight_decay=0.01)
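
For plain SGD, weight_decay is equivalent to adding an L2 term to the loss. If you prefer to make the penalty explicit, a minimal sketch (assuming the same model, criterion, x_train and y_train as in the L1 example above):

l2_lambda = 0.01                      # assumed penalty strength
l2_penalty = 0
for param in model.parameters():
    l2_penalty += torch.sum(param ** 2)

y_pred = model(x_train)
loss = criterion(y_pred, y_train.float().view(-1, 1)) + l2_lambda * l2_penalty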

3. A NumPy Implementation of Dropout

import numpy as np

# toy XOR-like dataset
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T

alpha, hidden_dim, dropout_percent, do_dropout = (0.5, 4, 0.2, True)

# weights initialized uniformly in [-1, 1]
synapse_0 = 2 * np.random.random((3, hidden_dim)) - 1
synapse_1 = 2 * np.random.random((hidden_dim, 1)) - 1

for j in range(60000):
    # forward pass with a sigmoid hidden layer
    layer_1 = 1 / (1 + np.exp(-np.dot(X, synapse_0)))
    if do_dropout:
        # inverted dropout: zero each hidden unit with probability dropout_percent
        # and rescale the survivors by 1/(1-p) so the expected activation is unchanged
        layer_1 *= np.random.binomial([np.ones((len(X), hidden_dim))], 1 - dropout_percent)[0] * (1.0 / (1 - dropout_percent))
    layer_2 = 1 / (1 + np.exp(-np.dot(layer_1, synapse_1)))

    # backpropagate the error through the sigmoids and update the weights
    layer_2_delta = (layer_2 - y) * (layer_2 * (1 - layer_2))
    layer_1_delta = layer_2_delta.dot(synapse_1.T) * (layer_1 * (1 - layer_1))
    synapse_1 -= alpha * layer_1.T.dot(layer_2_delta)
    synapse_0 -= alpha * X.T.dot(layer_1_delta)
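
Because the surviving activations are already rescaled by 1/(1 - dropout_percent) during training (inverted dropout), the forward pass at prediction time simply omits the mask and needs no extra scaling. A small sketch reusing the weights trained above:

# test-time forward pass: no dropout mask, no rescaling
layer_1 = 1 / (1 + np.exp(-np.dot(X, synapse_0)))
layer_2 = 1 / (1 + np.exp(-np.dot(layer_1, synapse_1)))
print(np.round(layer_2, 3))   # predicted probabilities for the four inputs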

4. Complete Code

 

import torch
from torch import nn
from torch.autograd import Variable
import torch.nn.functional as F
import torch.nn.init as init
import math
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import numpy as np
import pandas as pd
%matplotlib inline

# load the data (8 feature columns, binary label in the last column)
data = pd.read_csv(r'C:\Users\betty\Desktop\pytorch學習\data.txt')
x, y = data.iloc[:, :8], data.iloc[:, -1]

# hold out 20% of the data as the test set, train on the remaining 80%
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

x_train = Variable(torch.from_numpy(np.array(x_train)).float())
y_train = Variable(torch.from_numpy(np.array(y_train).reshape(-1, 1)).float())

x_test = Variable(torch.from_numpy(np.array(x_test)).float())
y_test = Variable(torch.from_numpy(np.array(y_test).reshape(-1, 1)).float())

print(x_train.data.shape)
print(y_train.data.shape)

print(x_test.data.shape)
print(y_test.data.shape)
33 
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(8, 200)
        self.l2 = torch.nn.Linear(200, 50)
        self.l3 = torch.nn.Linear(50, 1)

    def forward(self, x):
        out1 = F.relu(self.l1(x))
        out2 = F.dropout(out1, p=0.5, training=self.training)  # dropout only active in training mode
        out3 = F.relu(self.l2(out2))
        out4 = F.dropout(out3, p=0.5, training=self.training)
        y_pred = torch.sigmoid(self.l3(out4))
        return y_pred

model = Model()

criterion = torch.nn.BCELoss()
# weight_decay=0.1 adds an L2 penalty on all parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.1)
Loss = []
for epoch in range(2000):
    y_pred = model(x_train)
    loss = criterion(y_pred, y_train)
    if epoch % 400 == 0:
        print("epoch =", epoch, "loss", loss.item())
        Loss.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
64 
# model evaluation: threshold the sigmoid output at 0.5
def label_flag(data):
    for i in range(len(data)):
        if data[i] > 0.5:
            data[i] = 1.0
        else:
            data[i] = 0.0
    return data

model.eval()  # switch to evaluation mode so dropout is disabled
with torch.no_grad():
    y_pred = label_flag(model(x_train))
print(classification_report(y_train.detach().numpy(), y_pred.detach().numpy()))

# test set
with torch.no_grad():
    y_test_pred = label_flag(model(x_test))
print(classification_report(y_test.detach().numpy(), y_test_pred.detach().numpy()))
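
Calling model.eval() before evaluation matters here: with F.dropout the training flag must be passed explicitly, and with dropout left on the metrics would be computed on randomly thinned networks. The Loss values recorded every 400 epochs are not used further in the original post; a minimal sketch for plotting them (assuming matplotlib is available, as the %matplotlib inline magic suggests):

import matplotlib.pyplot as plt

plt.plot(range(0, 2000, 400), Loss, marker='o')  # one loss value was stored every 400 epochs
plt.xlabel('epoch')
plt.ylabel('training loss')
plt.show()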

 

Dataset download link: https://pan.baidu.com/s/1LrJktjVQ1OM9mYt_cuE-FQ
Extraction code: hatv

 

 

Original post (in Chinese): https://blog.csdn.net/wehung/article/details/89283583

