A Beginner's Road: Machine Learning, a Personal Understanding of the BP Neural Network and Its Python Implementation

Reposted article. 2018-08-25 18:02. Tags: Machine Learning / A Beginner's Road / Neural Networks / Python

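Key terms:

Input layer, hidden layer, output layer.

In theory, given enough hidden layers (or hidden units) and a large enough training set, a neural network can approximate any continuous function. When there are many hidden layers, we are in deep learning territory.

There is no fixed rule for how many hidden layers is best. Try different settings and adjust them according to the measured error and accuracy.

Cross-validation: split the samples into K parts, use one part as the test set and the rest as the training set. Repeat K times so that each part serves as the test set once, then average the results.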
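The K-fold procedure described above can be sketched with NumPy alone. This is a minimal illustration, not code from the original tutorial: the helper names `k_fold_indices`, `k_fold_accuracy` and `majority_class` are made up for this example, and the "model" is a placeholder that always predicts the most common training label.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is in the test set exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)       # shuffle the sample indices once
    folds = np.array_split(order, k)         # k nearly equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

def k_fold_accuracy(X, y, k, fit_predict):
    """Average accuracy over the k folds. fit_predict(X_tr, y_tr, X_te) returns labels."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(np.mean(pred == y[test_idx]))
    return float(np.mean(scores))

def majority_class(X_tr, y_tr, X_te):
    # placeholder model: always predict the most common training label
    values, counts = np.unique(y_tr, return_counts=True)
    return np.full(len(X_te), values[np.argmax(counts)])

X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
print(k_fold_accuracy(X_demo, y_demo, k=5, fit_predict=majority_class))
```

Any real model can be plugged in through `fit_predict`; comparing the averaged score for different hidden-layer sizes is one way to do the experimental tuning mentioned above.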

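Steps of BP

1. Initialize the weights and biases with random values in $[-1, 1]$ or $[-0.5, 0.5]$. Each neuron has one bias.

2. For each feature vector, compute the forward pass $I_j = \sum_i w_{ij} O_i + \theta_j$ and $O_j = 1/(1 + e^{-I_j})$ (the activation function). Then propagate the error backwards:

For the output layer: $Err_j = O_j (1 - O_j)(T_j - O_j)$, where $T_j$ is the true value and $O_j$ is the predicted value.

For a hidden layer: $Err_j = O_j (1 - O_j) \sum_k Err_k w_{jk}$

Weight update: $\Delta w_{ij} = (l)\, Err_j O_i$, where $(l)$ is the learning rate, usually between 0 and 1.

Bias update: $\Delta \theta_j = (l)\, Err_j$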

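3. Termination conditions (any one of the following):

The weight updates fall below some threshold

The prediction error rate falls below some threshold

A preset number of iterations is reached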
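The three steps above can be traced on a tiny network. The following is a sketch under assumptions, not the tutorial's code: a 2-3-1 network with the logistic activation, a single training sample, and made-up names (`bp_step`, `tol`, `max_iters`). It applies the forward formulas, the two error formulas and the two update rules literally, and stops on either an error threshold or an iteration limit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, t, W1, b1, W2, b2, lr=0.5):
    """One forward and backward pass for a single sample."""
    # forward: I_j = sum_i w_ij O_i + theta_j, O_j = sigmoid(I_j)
    o_hidden = sigmoid(x @ W1 + b1)
    o_out = sigmoid(o_hidden @ W2 + b2)
    # output layer: Err_j = O_j (1 - O_j) (T_j - O_j)
    err_out = o_out * (1 - o_out) * (t - o_out)
    # hidden layer: Err_j = O_j (1 - O_j) sum_k Err_k w_jk
    err_hidden = o_hidden * (1 - o_hidden) * (W2 @ err_out)
    # updates: delta w_ij = lr * Err_j * O_i, delta theta_j = lr * Err_j
    W2 = W2 + lr * np.outer(o_hidden, err_out)
    b2 = b2 + lr * err_out
    W1 = W1 + lr * np.outer(x, err_hidden)
    b1 = b1 + lr * err_hidden
    return W1, b1, W2, b2, o_out

rng = np.random.default_rng(0)
W1, b1 = rng.uniform(-0.5, 0.5, (2, 3)), rng.uniform(-0.5, 0.5, 3)   # step 1: random init
W2, b2 = rng.uniform(-0.5, 0.5, (3, 1)), rng.uniform(-0.5, 0.5, 1)
x, t = np.array([1.0, 0.0]), np.array([1.0])

max_iters, tol = 5000, 0.05
errors = []
for step in range(max_iters):                 # stop after a preset number of iterations ...
    W1, b1, W2, b2, out = bp_step(x, t, W1, b1, W2, b2)
    errors.append(float(np.abs(t - out).max()))
    if errors[-1] < tol:                      # ... or once the error is below a threshold
        break
print(step, errors[0], errors[-1])
```

Here the biases are kept as separate vectors `b1` and `b2` to mirror the formulas; the class later in this post folds them into the weight matrices instead by appending a constant 1 to the input.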

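The activation function is usually one of the following two:

1. Hyperbolic tangent: $\tanh x = (e^x - e^{-x})/(e^x + e^{-x})$, whose first derivative is $1 - \tanh^2 x$

2. Logistic function: $f(x) = 1/(1 + e^{-x})$, whose first derivative is $f(x)(1 - f(x))$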
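Both derivative formulas can be checked numerically. The sketch below compares them with a central finite difference; `numeric_derivative` and the step size `h` are choices made for this illustration only.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def numeric_derivative(f, x, h=1e-5):
    # central finite difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = np.linspace(-3, 3, 7)
tanh_analytic = 1.0 - np.tanh(x) ** 2
logistic_analytic = logistic(x) * (1.0 - logistic(x))

tanh_ok = np.allclose(numeric_derivative(np.tanh, x), tanh_analytic, atol=1e-8)
logistic_ok = np.allclose(numeric_derivative(logistic, x), logistic_analytic, atol=1e-8)
print(tanh_ok, logistic_ok)
```

Note that both derivatives can be written using only the function's output ($1 - a^2$ for $a = \tanh x$, and $a(1 - a)$ for $a = f(x)$), which is convenient in backpropagation because the outputs are already stored from the forward pass.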

Here is the code:

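 1 import numpy as np
 2 
 3 def tanh(x):
 4     return np.tanh(x)
 5 
 6 def tanh_derivative(a):
 7     return 1.0-a*a  # derivative of tanh in terms of its output a=tanh(x); fit() passes activations, not raw inputs
 8 
 9 def logistic(x):
10     return 1/(1+np.exp(-x))  # the minus sign is required; the failing 'logistic' run logged later in this post used np.exp(x)
11 
12 def logistic_derivative(a):
13     return a*(1-a)  # derivative of the logistic function in terms of its output a=logistic(x)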
14 
15 
16 class NeuralNetwork:         # object-oriented implementation
17     def __init__(self,layers,activation='tanh'):# constructor
18         # layers: a list with the number of neurons in each layer; the length of the list is the number of layers
19         # activation: which activation function to use, 'logistic' or 'tanh' (default 'tanh')
20         if activation=='logistic':
21             self.activation=logistic
22             self.activation_deriv=logistic_derivative
23         elif activation=='tanh':
24             self.activation=tanh
25             self.activation_deriv=tanh_derivative
26         self.weight=[]
27         for i in range(1,len(layers)-1):  # note: the shapes only chain correctly for a 3-layer list such as [2,2,1]
28             #print(layers[i - 1] + 1, layers[i] + 1)
29             #print(layers[i] + 1, layers[i + 1] + 1)
30             self.weight.append((2 * np.random.random((layers[i - 1] +1, layers[i] +1)) - 1) * 0.25)
31             self.weight.append((2 * np.random.random((layers[i] +1, layers[i + 1])) - 1) * 0.25)
32 
33         #print("weight:", self.weight)
34 
35     def fit(self,X,Y,learning_rate=0.2,epochs=10000):# X: dataset, Y: target labels, learning rate, number of iterations; each iteration trains on one randomly drawn sample
36         X=np.atleast_2d(X)# make sure the dataset is at least 2-D
37         temp=np.ones([X.shape[0],X.shape[1]+1]) # shape gives (rows, columns); the extra column of ones is the bias input
38         temp[:,0:-1]=X
39         X=temp
40         #print("X_:",X)
41         Y=np.array(Y)
42         #print("Y_:", Y)
43         for k in range(epochs):
44             i=np.random.randint(X.shape[0])
45             a=[X[i]]# pick one sample at random
46             #print("a_:", a)
47             for l in range(len(self.weight)):
48                 a.append(self.activation(np.dot(a[l],self.weight[l])))# forward pass: matrix product, then the activation function
49                 #print("l,a[l],weight[l]:",l, a[l],self.weight[l])
50                 #print('a__',a)
51                 #print("error:", l, Y[i], a[-1])
52             error =Y[i]-a[-1]
53             deltas =[error * self.activation_deriv(a[-1])]
54 
55             for l in range(len(a)-2,0,-1):
56                 deltas.append(deltas[-1].dot(self.weight[l].T)*self.activation_deriv(a[l]))  # error term (delta) of each hidden layer, computed backwards
57             deltas.reverse()# put the deltas back in layer order
58             for i in range(len(self.weight)):
59                 layer = np.atleast_2d([a[i]])
60                 delta = np.atleast_2d([deltas[i]])
61                 self.weight[i]+=learning_rate*layer.T.dot(delta)  # .T is the matrix transpose
62 
63     def prdict(self,x):
64         x=np.array(x)
65         temp = np.ones(x.shape[0]+1)
66         temp[0:-1]=x
67         a=temp
68         for l in range(0,len(self.weight)):
69             a=self.activation(np.dot(a,self.weight[l]))
70         return a
71

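This defines a class for the BP neural network.

There was one part I did not understand while writing it, so I looked into it afterwards. It concerns lines 30, 31 and 37 to 39:

30 self.weight.append((2 * np.random.random((layers[i - 1] +1, layers[i] +1)) - 1) * 0.25)

31 self.weight.append((2 * np.random.random((layers[i] +1, layers[i + 1])) - 1) * 0.25)

37 temp=np.ones([X.shape[0],X.shape[1]+1]) # shape gives (rows, columns); the extra column of ones is the bias input

38 temp[:,0:-1]=X

39 X=temp

Why append a column of ones to the feature vectors? The constant 1 acts as the bias input: the weights attached to it play the role of the bias $\theta_j$, which is also why the weight matrices get one extra row (the +1 in lines 30 and 31). Without it, a feature vector that is all zeros multiplied by any weight matrix is still zero, so the network could never fit a sample whose features are all 0 but whose label is not, and training would fail.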
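The effect of the extra column is easy to see on the all-zero input. In this sketch the weight matrices `W_no_bias` and `W_bias` are random stand-ins chosen for illustration, not weights from the class above.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
# append a column of ones, the same result as the temp[:, 0:-1] = X idiom
X_with_bias = np.hstack([X, np.ones((X.shape[0], 1))])

rng = np.random.default_rng(0)
W_no_bias = rng.uniform(-1, 1, (2, 3))   # 2 inputs -> 3 hidden units, no bias
W_bias = rng.uniform(-1, 1, (3, 3))      # extra last row: weights of the constant-1 input

no_bias_out = X[0] @ W_no_bias           # input [0, 0]: always zero, whatever the weights
bias_out = X_with_bias[0] @ W_bias       # input [0, 0, 1]: picks out the last row of W_bias
print(no_bias_out)
print(bias_out)
```

For the input `[0, 0, 1]` the product is exactly the last row of the weight matrix, so that row behaves as a learnable bias vector.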

Now we can write code to train it:

 1 from main import NeuralNetwork# import the class defined above (saved as main.py)
 2 import numpy as np
 3 
 4 nn=NeuralNetwork([2,2,1],'tanh')
 5 
 6 X = np.array([[0,0],[0,1],[1,0],[1,1]])
 7 print("X:",X)
 8 Y = np.array([1,0,0,1])    # XNOR of the two inputs (1 when they are equal), not OR
 9 print("Y:",Y)
10 nn.fit(X,Y)
11 #print("nn:",nn)
12 for i in [[0,0],[0,1],[1,0],[1,1]]:
13     print(i,nn.prdict(i))

The output is:

X: [[0 0]
[0 1]
[1 0]
[1 1]]
Y: [1 0 0 1]
[0, 0] [0.99875886]
[0, 1] [0.00025754]
[1, 0] [1.91186633e-05]
[1, 1] [0.99868908]

Then I wrote a handwritten digit recognition program:

 1 import numpy as np
 2 from sklearn.datasets import load_digits
 3 from sklearn.metrics import confusion_matrix,classification_report # metrics for evaluating the result
 4 from sklearn.preprocessing import LabelBinarizer                   # one-hot encodes the labels 0..9 (1 for the matching class, 0 elsewhere)
 5 from main import NeuralNetwork
 6 from sklearn.model_selection import train_test_split               # splits the data into training and test sets
 7 
 8 digits=load_digits()
 9 X=digits.data
10 Y=digits.target
11 X-=X.min()
12 X/=X.max()
13 
14 nn=NeuralNetwork([64,100,10])
15 X_train,X_test,Y_train,Y_test=train_test_split(X,Y)
16 labels_train = LabelBinarizer().fit_transform(Y_train)
17 labels_test = LabelBinarizer().fit_transform(Y_test)
18 print("start fitting")
19 nn.fit(X_train,labels_train,epochs=3000)
20 prdictions=[]
21 for i in range(X_test.shape[0]):
22     #print(X_test[i])
23     o=nn.prdict(X_test[i])
24     prdictions.append(np.argmax(o))  # the digit whose output value is largest
25 #print(Y_test,prdictions)
26 print(confusion_matrix(Y_test,prdictions))
27 print(classification_report(Y_test,prdictions))

Output:

start fitting
[[32  0  2  4  1  0  0  0  1  0]
 [ 1 17 31  2  0  0  0  0  2  0]
 [ 0  0 55  0  0  0  0  0  0  0]
 [ 0  0  8 42  0  0  0  0  0  0]
 [ 0  1  1  0 32  0  0  0  2  0]
 [ 1  7  9 21  1  1  0  0  1  0]
 [ 0  0  6  0 12  0 18  0  8  0]
 [ 1  2 36  3  1  0  0  8  0  0]
 [ 0  0 14  2  0  0  0  0 18  1]
 [ 9  2  4 11  0  0  0  1  6 12]]
             precision    recall  f1-score   support

          0       0.73      0.80      0.76        40
          1       0.59      0.32      0.41        53
          2       0.33      1.00      0.50        55
          3       0.49      0.84      0.62        50
          4       0.68      0.89      0.77        36
          5       1.00      0.02      0.05        41
          6       1.00      0.41      0.58        44
          7       0.89      0.16      0.27        51
          8       0.47      0.51      0.49        35
          9       0.92      0.27      0.41        45

avg / total       0.70      0.52      0.48       450
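The precision and recall columns follow directly from the confusion matrix: rows are the true digits and columns are the predicted digits. As a small check (the variable names are chosen for this illustration), the matrix printed above can be typed in and the per-class values recomputed with NumPy:

```python
import numpy as np

# confusion matrix copied from the tanh run above (rows: true digit, columns: predicted digit)
cm = np.array([
    [32,  0,  2,  4,  1, 0,  0, 0,  1,  0],
    [ 1, 17, 31,  2,  0, 0,  0, 0,  2,  0],
    [ 0,  0, 55,  0,  0, 0,  0, 0,  0,  0],
    [ 0,  0,  8, 42,  0, 0,  0, 0,  0,  0],
    [ 0,  1,  1,  0, 32, 0,  0, 0,  2,  0],
    [ 1,  7,  9, 21,  1, 1,  0, 0,  1,  0],
    [ 0,  0,  6,  0, 12, 0, 18, 0,  8,  0],
    [ 1,  2, 36,  3,  1, 0,  0, 8,  0,  0],
    [ 0,  0, 14,  2,  0, 0,  0, 0, 18,  1],
    [ 9,  2,  4, 11,  0, 0,  0, 1,  6, 12],
])

true_pos = np.diag(cm)                    # correctly classified samples per digit
recall = true_pos / cm.sum(axis=1)        # divide by row sums (samples that truly are that digit)
precision = true_pos / cm.sum(axis=0)     # divide by column sums (samples predicted as that digit)
accuracy = true_pos.sum() / cm.sum()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(round(float(accuracy), 2))
```

For digit 2, for example, all 55 true samples are on the diagonal (recall 1.00), but 166 samples in total were predicted as 2, so precision is 55/166, about 0.33, which matches the report.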

Note line 14: the tutorial I followed uses the "logistic" activation function there. But if I use "logistic", the result is:

start fitting
C:\Users\admin\PycharmProjects\BP\main.py:10: RuntimeWarning: overflow encountered in exp
  return 1/(1+np.exp(x))
D:\Anaconda3\lib\site-packages\sklearn\metrics\classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
[[51  0  0  0  0  0  0  0  0  0]
 [53  0  0  0  0  0  0  0  0  0]
 [48  0  0  0  0  0  0  0  0  0]
 [52  0  0  0  0  0  0  0  0  0]
 [35  0  0  0  0  0  0  0  0  0]
 [54  0  0  0  0  0  0  0  0  0]
 [42  0  0  0  0  0  0  0  0  0]
 [40  0  0  0  0  0  0  0  0  0]
 [34  0  0  0  0  0  0  0  0  0]
 [41  0  0  0  0  0  0  0  0  0]]
             precision    recall  f1-score   support

          0       0.11      1.00      0.20        51
          1       0.00      0.00      0.00        53
          2       0.00      0.00      0.00        48
          3       0.00      0.00      0.00        52
          4       0.00      0.00      0.00        35
          5       0.00      0.00      0.00        54
          6       0.00      0.00      0.00        42
          7       0.00      0.00      0.00        40
          8       0.00      0.00      0.00        34
          9       0.00      0.00      0.00        41

avg / total       0.01      0.11      0.02       450

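This result is badly wrong. I explained how to read this matrix in the previous post.

I traced the problem to the activation function. With "logistic", every predicted Y is a vector of identical values, [0.5, 0.5, ..., 0.5], and np.argmax(o) of such a vector is 0. That explains why every count in the matrix sits in the first column.

After switching the activation function to "tanh", the problem went away. I do not know why.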
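One plausible explanation, offered as a guess that the logged run does not by itself confirm: the RuntimeWarning quotes `return 1/(1+np.exp(x))`, which lacks the minus sign of the logistic function $1/(1 + e^{-x})$. That variant is a decreasing function, while the derivative used in the update assumes an increasing one, so the weight updates would push the error up instead of down until `exp` overflows. Once every hidden activation has collapsed to 0, the output layer receives 0 and returns 0.5 for every class. The sketch below only illustrates these two facts; `logistic_as_logged` and the random `W_out` are names and values invented for the demonstration.

```python
import numpy as np

def logistic_as_logged(x):
    return 1 / (1 + np.exp(x))       # the form quoted in the RuntimeWarning

def logistic(x):
    return 1 / (1 + np.exp(-x))      # the standard logistic function

x = np.array([-2.0, 0.0, 2.0])
print(logistic_as_logged(x))         # values fall as x grows
print(logistic(x))                   # values rise as x grows

# If all hidden activations are 0, the output layer computes logistic(0) for every class.
hidden = np.zeros(101)               # 100 hidden units plus the extra slot, as in [64,100,10]
W_out = np.random.default_rng(0).uniform(-1, 1, (101, 10))   # arbitrary stand-in weights
o = logistic(hidden @ W_out)
print(o, np.argmax(o))               # ten equal values; argmax returns the first index
```

If this reading is right, the fix is the sign inside `np.exp`, not the choice between "logistic" and "tanh".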

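Choosing an activation function is evidently a skill of its own.

As of today I have roughly covered classification in supervised learning. It is shallow, only an introduction. Next I will study regression, and once I have finished the introduction to machine learning I will go deeper.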

