A Practice Derivation of the Training Process for the Simplest BP Neural Network [Personal Exercise / Math Derivation]


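Preface

  BP neural networks are already covered thoroughly in many references, so this post does not repeat that material. Instead, it works through the training process of a "simplest possible" neural network by hand, so that we can understand how training works by practicing it together.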

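I. BP Neural Networks

1.1 Introduction

  The BP network (Back-Propagation Network), proposed in 1986, is a multilayer feedforward network trained with the error back-propagation algorithm. It is one of the most widely used neural network models, with applications in function approximation, pattern recognition and classification, data compression, and time series prediction.
  A typical BP network has three kinds of layers: an input layer, a hidden layer, and an output layer. Adjacent layers are fully connected, and there are no connections within a layer. There may be more than one hidden layer.
Fig 1 Digital image and its corresponding matrix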


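1.2 Training (Learning) Process

  Each iteration updates the model once using one batch of data and is called "one training step". It consists of a forward pass and a backward pass.
The procedure can be summarized as follows:

  1. Prepare the samples (data and labels) and define the neural network (structure, initial parameters, activation function, and so on)
  2. Feed in the samples and compute the output of every node in the forward pass
  3. Compute the loss function
  4. Compute the partial derivative of the loss function with respect to each weight, and use a suitable optimization method to update the weights in the backward pass
  5. Repeat steps 2 to 4 until a stopping condition is met
   The derivation below uses the mean squared error (MSE) as the loss function, the sigmoid function as the activation function, and gradient descent as the weight-update method.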
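The two building blocks named above can be sketched in a few lines of Python. This is a minimal sketch, not the author's implementation: the function names `sigmoid` and `loss` are illustrative, and the loss is written as a sum of half squared errors because that is the form differentiated in section 2.3.2.

```python
import math

def sigmoid(z):
    """Logistic activation: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def loss(targets, outputs):
    """Sum over the outputs of 1/2 * (y - a)^2 (assumed loss form, see lead-in)."""
    return sum(0.5 * (y - a) ** 2 for y, a in zip(targets, outputs))

print(sigmoid(0.0))                    # 0.5
print(loss([1.0, 0.0], [0.5, 0.5]))    # 0.25
```

The factor 1/2 only rescales the gradient; it is kept here so that the derivative of each term is simply -(y - a).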

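II. Worked Derivation Exercise

2.1 Setup

Fig 2 The neural network, initial parameters, and sample data used in this exercise


  1. The first layer is the input layer, with two neurons i1, i2 and a bias b1
  2. The second layer is the hidden layer, with two neurons h1, h2 and a bias b2
  3. The third layer is the output layer: o1, o2
  4. The wi marked on each edge is the weight of the connection between layers
  5. The activation function is the sigmoid function
  6. We write z for the weighted input sum of a neuron and a for its output (for example, zh1 and ah1 for hidden neuron h1)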

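2.2 First Forward Pass [Own Derivation]

  From the information above, one iteration can also be drawn as a "ring" diagram:
Fig 3 "Ring" diagram of the quantities in the BP neural network


  We run one forward pass (since many iterations are needed, we label the first forward pass t=0) and obtain the following values:

Fig 4 "Ring" diagram with the initial values (with the functional relationships)

  This gives a loss of MSE=0.298371109. Suppose this exceeds the loss we are willing to accept; we then need to update each weight (wi,t=0) to obtain the initial weights for t=1.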
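The forward pass can be reproduced with a short script. The numbers in Fig 2 are not available in the text, so the inputs, targets, weights, and biases below are assumptions: they are the values of a widely circulated 2-2-2 example, chosen because they reproduce the loss quoted above. The wiring (h1 fed by w1, w2; h2 by w3, w4; o1 by w5, w6; o2 by w7, w8) follows the derivation in section 2.3.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# ASSUMED values (Fig 2 is not reproduced in the text).
i1, i2 = 0.05, 0.10          # inputs
y1, y2 = 0.01, 0.99          # targets
b1, b2 = 0.35, 0.60          # biases
w = {1: 0.15, 2: 0.20, 3: 0.25, 4: 0.30,
     5: 0.40, 6: 0.45, 7: 0.50, 8: 0.55}

def forward(w):
    """One forward pass; returns hidden outputs, final outputs, and the loss."""
    zh1 = w[1] * i1 + w[2] * i2 + b1
    zh2 = w[3] * i1 + w[4] * i2 + b1
    ah1, ah2 = sigmoid(zh1), sigmoid(zh2)
    zo1 = w[5] * ah1 + w[6] * ah2 + b2
    zo2 = w[7] * ah1 + w[8] * ah2 + b2
    ao1, ao2 = sigmoid(zo1), sigmoid(zo2)
    mse = 0.5 * (y1 - ao1) ** 2 + 0.5 * (y2 - ao2) ** 2
    return ah1, ah2, ao1, ao2, mse

ah1, ah2, ao1, ao2, mse = forward(w)
print(f"ah1={ah1:.9f} ah2={ah2:.9f}")
print(f"ao1={ao1:.9f} ao2={ao2:.9f}")
print(f"MSE={mse:.9f}")      # matches the 0.298371109 quoted above
```

If the values in Fig 2 differ from these assumptions, only the constants at the top need to change.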

2.3 Deriving ∂/∂wi [Own Derivation]

2.3.1 Total differential of the mean squared error loss

\(dMSE=\frac{\partial MSE}{\partial ao_{1}}dao_{1}+\frac{\partial MSE}{\partial ao_{2}}dao_{2}\)
\({\color{white}{dMSE}}=\frac{\partial MSE}{\partial ao_{1}}\frac{\partial ao_{1}}{\partial zo_{1}} dzo_{1} +\frac{\partial MSE}{\partial ao_{2}}\frac{\partial ao_{2}}{\partial zo_{2}} dzo_{2}\)
\({\color{white}{dMSE}}=\frac{\partial MSE}{\partial ao_{1}}\frac{\partial ao_{1}}{\partial zo_{1}}\left ( \frac{\partial zo_{1}}{\partial ah_{1}}dah_{1} +\frac{\partial zo_{1}}{\partial ah_{2}}dah_{2} + \frac{\partial zo_{1}}{\partial \omega _{5}}d\omega _{5} +\frac{\partial zo_{1}}{\partial \omega _{6}}d\omega _{6} \right )\)
\({\color{white}{dMSE=}}+\frac{\partial MSE}{\partial ao_{2}}\frac{\partial ao_{2}}{\partial zo_{2}}\left ( \frac{\partial zo_{2}}{\partial ah_{1}}dah_{1} +\frac{\partial zo_{2}}{\partial ah_{2}}dah_{2} + \frac{\partial zo_{2}}{\partial \omega _{7}}d\omega _{7} +\frac{\partial zo_{2}}{\partial \omega _{8}}d\omega _{8} \right )\)
\({\color{white}{dMSE}}=\frac{\partial MSE}{\partial ao_{1}}\frac{\partial ao_{1}}{\partial zo_{1}}\left ( \frac{\partial zo_{1}}{\partial ah_{1}}\frac{\partial ah_{1}}{\partial zh_{1}}dzh_{1} +\frac{\partial zo_{1}}{\partial ah_{2}}\frac{\partial ah_{2}}{\partial zh_{2}}dzh_{2} + \frac{\partial zo_{1}}{\partial \omega _{5}}d\omega _{5} +\frac{\partial zo_{1}}{\partial \omega _{6}}d\omega _{6} \right )\)
\({\color{white}{dMSE=}}+\frac{\partial MSE}{\partial ao_{2}}\frac{\partial ao_{2}}{\partial zo_{2}}\left ( \frac{\partial zo_{2}}{\partial ah_{1}}\frac{\partial ah_{1}}{\partial zh_{1}}dzh_{1} +\frac{\partial zo_{2}}{\partial ah_{2}}\frac{\partial ah_{2}}{\partial zh_{2}}dzh_{2} + \frac{\partial zo_{2}}{\partial \omega _{7}}d\omega _{7} +\frac{\partial zo_{2}}{\partial \omega _{8}}d\omega _{8} \right )\)
\({\color{white}{dMSE}}=\frac{\partial MSE}{\partial ao_{1}}\frac{\partial ao_{1}}{\partial zo_{1}}\left [ \frac{\partial zo_{1}}{\partial ah_{1}}\frac{\partial ah_{1}}{\partial zh_{1}}\left ( \frac{\partial zh_{1}}{\partial \omega _{1}}d\omega _{1}+\frac{\partial zh_{1}}{\partial \omega _{2}}d\omega _{2} \right ) +\frac{\partial zo_{1}}{\partial ah_{2}}\frac{\partial ah_{2}}{\partial zh_{2}}\left ( \frac{\partial zh_{2}}{\partial \omega _{3}}d\omega _{3}+\frac{\partial zh_{2}}{\partial \omega _{4}}d\omega _{4} \right ) + \frac{\partial zo_{1}}{\partial \omega _{5}}d\omega _{5} +\frac{\partial zo_{1}}{\partial \omega _{6}}d\omega _{6} \right ]\)
\({\color{white}{dMSE=}}+\frac{\partial MSE}{\partial ao_{2}}\frac{\partial ao_{2}}{\partial zo_{2}}\left [ \frac{\partial zo_{2}}{\partial ah_{1}}\frac{\partial ah_{1}}{\partial zh_{1}}\left ( \frac{\partial zh_{1}}{\partial \omega _{1}}d\omega _{1}+\frac{\partial zh_{1}}{\partial \omega _{2}}d\omega _{2} \right ) +\frac{\partial zo_{2}}{\partial ah_{2}}\frac{\partial ah_{2}}{\partial zh_{2}}\left ( \frac{\partial zh_{2}}{\partial \omega _{3}}d\omega _{3}+\frac{\partial zh_{2}}{\partial \omega _{4}}d\omega _{4} \right ) + \frac{\partial zo_{2}}{\partial \omega _{7}}d\omega _{7} +\frac{\partial zo_{2}}{\partial \omega _{8}}d\omega _{8} \right ]\)

2.3.2 Substituting the values and variable names of this training example

\(dMSE=\frac{\partial \frac{1}{2}\left ( y_{1}-ao_{1} \right )^2}{\partial ao_{1}}dao_{1}+\frac{\partial \frac{1}{2}\left ( y_{2}-ao_{2} \right )^2}{\partial ao_{2}}dao_{2}\)
\({\color{white}{dMSE}}=-\left( y_{1}-ao_{1} \right )\frac{\partial ao_{1}}{\partial zo_{1}}dzo_{1}-\left( y_{2}-ao_{2} \right )\frac{\partial ao_{2}}{\partial zo_{2}}dzo_{2}\)
\({\color{white}{dMSE}}=-\left( y_{1}-ao_{1} \right )ao_{1}\left( 1-ao_{1} \right )\left ( \frac{\partial zo_{1}}{\partial ah_{1}}dah_{1}+\frac{\partial zo_{1}}{\partial ah_{2}}dah_{2}+\frac{\partial zo_{1}}{\partial \omega _{5}}d\omega _{5}+\frac{\partial zo_{1}}{\partial \omega _{6}}d\omega _{6} \right )\)
\({\color{white}{dMSE=}}-\left( y_{2}-ao_{2} \right )ao_{2}\left( 1-ao_{2} \right )\left ( \frac{\partial zo_{2}}{\partial ah_{1}}dah_{1}+\frac{\partial zo_{2}}{\partial ah_{2}}dah_{2}+\frac{\partial zo_{2}}{\partial \omega _{7}}d\omega _{7}+\frac{\partial zo_{2}}{\partial \omega _{8}}d\omega _{8} \right )\)
\({\color{white}{dMSE}}=-\left( y_{1}-ao_{1} \right )ao_{1}\left( 1-ao_{1} \right )\left ( \omega_{5}\frac{\partial ah_{1}}{\partial zh_{1}}dzh_{1}+\omega_{6}\frac{\partial ah_{2}}{\partial zh_{2}}dzh_{2}+ah_{1}d\omega _{5}+ah_{2}d\omega _{6} \right )\)
\({\color{white}{dMSE=}}-\left( y_{2}-ao_{2} \right )ao_{2}\left( 1-ao_{2} \right )\left ( \omega_{7}\frac{\partial ah_{1}}{\partial zh_{1}}dzh_{1}+\omega_{8}\frac{\partial ah_{2}}{\partial zh_{2}}dzh_{2}+ah_{1}d\omega _{7}+ah_{2}d\omega _{8} \right )\)
\({\color{white}{dMSE}}=-\left ( y_{1}-ao_{1} \right )ao_{1}\left( 1-ao_{1} \right )\left [ \omega_{5}\cdot ah_{1}\left ( 1- ah_{1} \right )\left (\frac{\partial zh_{1}}{\partial \omega_{1}}d\omega_{1}+\frac{\partial zh_{1}}{\partial \omega_{2}}d\omega_{2} \right ) +\omega_{6}\cdot ah_{2}\left ( 1- ah_{2} \right )\left (\frac{\partial zh_{2}}{\partial \omega_{3}}d\omega_{3}+\frac{\partial zh_{2}}{\partial \omega_{4}}d\omega_{4} \right )+ah_{1}d\omega _{5}+ah_{2}d\omega _{6} \right ]\)
\({\color{white}{dMSE=}}-\left ( y_{2}-ao_{2} \right )ao_{2}\left( 1-ao_{2} \right )\left [ \omega_{7}\cdot ah_{1}\left ( 1- ah_{1} \right )\left (\frac{\partial zh_{1}}{\partial \omega_{1}}d\omega_{1}+\frac{\partial zh_{1}}{\partial \omega_{2}}d\omega_{2} \right ) +\omega_{8}\cdot ah_{2}\left ( 1- ah_{2} \right )\left (\frac{\partial zh_{2}}{\partial \omega_{3}}d\omega_{3}+\frac{\partial zh_{2}}{\partial \omega_{4}}d\omega_{4} \right )+ah_{1}d\omega _{7}+ah_{2}d\omega _{8} \right ]\)
\({\color{white}{dMSE}}=-\left ( y_{1}-ao_{1} \right )ao_{1}\left( 1-ao_{1} \right )\left[ \omega_{5} \cdot ah_{1}\left ( 1-ah_{1} \right )\left ( {\color{green}{i_{1}d\omega_{1}}}+{\color{green}{i_{2}d\omega_{2}}} \right ) +\omega_{6} \cdot ah_{2}\left ( 1-ah_{2} \right )\left ( {\color{green}{i_{1}d\omega_{3}}}+{\color{green}{i_{2}d\omega_{4}}} \right ) +{\color{blue}{ah_{1}d\omega_{5}}}+{\color{blue}{ah_{2}d\omega_{6}}}\right ]\)
\({\color{white}{dMSE=}}-\left ( y_{2}-ao_{2} \right )ao_{2}\left( 1-ao_{2} \right )\left[ \omega_{7} \cdot ah_{1}\left ( 1-ah_{1} \right )\left ( {\color{green}{i_{1}d\omega_{1}}}+{\color{green}{i_{2}d\omega_{2}}} \right ) +\omega_{8} \cdot ah_{2}\left ( 1-ah_{2} \right )\left ( {\color{green}{i_{1}d\omega_{3}}}+{\color{green}{i_{2}d\omega_{4}}} \right ) +{\color{blue}{ah_{1}d\omega_{7}}}+{\color{blue}{ah_{2}d\omega_{8}}}\right ]\)

2.3.3 The resulting expressions for \(\partial MSE/\partial \omega_{i}\)

\(\frac {\partial MSE}{\partial \omega_{1}}=-\left [ \left( y_{1}-ao_{1} \right )ao_{1}\left ( 1-ao_{1} \right )\cdot \omega_{5}+\left( y_{2}-ao_{2} \right )ao_{2}\left ( 1-ao_{2} \right )\cdot \omega_{7}\right ]\cdot ah_{1}\left ( 1-ah_{1} \right ) \cdot i_{1}\)
\(\frac {\partial MSE}{\partial \omega_{2}}=-\left [ \left( y_{1}-ao_{1} \right )ao_{1}\left ( 1-ao_{1} \right )\cdot \omega_{5}+\left( y_{2}-ao_{2} \right )ao_{2}\left ( 1-ao_{2} \right )\cdot \omega_{7} \right ]\cdot ah_{1}\left ( 1-ah_{1} \right )\cdot i_{2}\)
\(\frac {\partial MSE}{\partial \omega_{3}}=-\left [ \left( y_{1}-ao_{1} \right )ao_{1}\left ( 1-ao_{1} \right )\cdot \omega_{6}+\left( y_{2}-ao_{2} \right )ao_{2}\left ( 1-ao_{2} \right )\cdot \omega_{8} \right ]\cdot ah_{2}\left ( 1-ah_{2} \right )\cdot i_{1}\)
\(\frac {\partial MSE}{\partial \omega_{4}}=-\left [ \left( y_{1}-ao_{1} \right )ao_{1}\left ( 1-ao_{1} \right )\cdot \omega_{6}+\left( y_{2}-ao_{2} \right )ao_{2}\left ( 1-ao_{2} \right )\cdot \omega_{8} \right ]\cdot ah_{2}\left ( 1-ah_{2} \right )\cdot i_{2}\)
\(\frac {\partial MSE}{\partial \omega_{5}}=-\left( y_{1}-ao_{1} \right )ao_{1}\left ( 1-ao_{1} \right )\cdot ah_{1}\)
\(\frac {\partial MSE}{\partial \omega_{6}}=-\left( y_{1}-ao_{1} \right )ao_{1}\left ( 1-ao_{1} \right )\cdot ah_{2}\)
\(\frac {\partial MSE}{\partial \omega_{7}}=-\left( y_{2}-ao_{2} \right )ao_{2}\left ( 1-ao_{2} \right )\cdot ah_{1}\)
\(\frac {\partial MSE}{\partial \omega_{8}}=-\left( y_{2}-ao_{2} \right )ao_{2}\left ( 1-ao_{2} \right )\cdot ah_{2}\)
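One way to gain confidence in the eight expressions above is to code them directly and compare against a finite-difference estimate of the same derivatives. The sketch below does that. The sample, biases, and initial weights are assumptions (the values of Fig 2 are not reproduced in the text), and the helper names `forward`, `gradients`, and `numeric_gradients` are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# ASSUMED sample and parameters (Fig 2 is not reproduced in the text).
I = (0.05, 0.10)                                        # i1, i2
Y = (0.01, 0.99)                                        # y1, y2
B = (0.35, 0.60)                                        # b1, b2
W0 = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]   # w1..w8

def forward(w):
    ah1 = sigmoid(w[0] * I[0] + w[1] * I[1] + B[0])
    ah2 = sigmoid(w[2] * I[0] + w[3] * I[1] + B[0])
    ao1 = sigmoid(w[4] * ah1 + w[5] * ah2 + B[1])
    ao2 = sigmoid(w[6] * ah1 + w[7] * ah2 + B[1])
    mse = 0.5 * (Y[0] - ao1) ** 2 + 0.5 * (Y[1] - ao2) ** 2
    return ah1, ah2, ao1, ao2, mse

def gradients(w):
    """The eight partial derivatives, coded term by term from section 2.3.3."""
    ah1, ah2, ao1, ao2, _ = forward(w)
    d1 = -(Y[0] - ao1) * ao1 * (1 - ao1)    # factor shared by w5, w6
    d2 = -(Y[1] - ao2) * ao2 * (1 - ao2)    # factor shared by w7, w8
    g1 = (d1 * w[4] + d2 * w[6]) * ah1 * (1 - ah1)
    g2 = (d1 * w[5] + d2 * w[7]) * ah2 * (1 - ah2)
    return [g1 * I[0], g1 * I[1], g2 * I[0], g2 * I[1],
            d1 * ah1, d1 * ah2, d2 * ah1, d2 * ah2]

def numeric_gradients(w, eps=1e-6):
    """Central finite differences of the loss, one weight at a time."""
    grads = []
    for k in range(len(w)):
        up, dn = list(w), list(w)
        up[k] += eps
        dn[k] -= eps
        grads.append((forward(up)[4] - forward(dn)[4]) / (2 * eps))
    return grads

analytic = gradients(W0)
numeric = numeric_gradients(W0)
for k, (a, n) in enumerate(zip(analytic, numeric), start=1):
    print(f"w{k}: analytic={a:.9f}  numeric={n:.9f}")
```

The two columns should agree to many decimal places; a mismatch in one row would point at a sign or index error in the corresponding expression.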

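  You can of course write this in matrix form if you prefer:
(P.S. The Markdown editor crashed under such a "huge" matrix expression, so I converted it to an SVG image and pasted that instead. Sorry about that.)

[Image: matrix form of the partial derivatives]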

[Update 2022.06.06]
  Grouping the expressions for \( \frac{\partial MSE}{\partial \omega _{i}} \) and factoring them into matrices:

\( \begin{bmatrix} \frac{\partial MSE}{\partial \omega _{1}} & \frac{\partial MSE}{\partial \omega _{2}} \\ \frac{\partial MSE}{\partial \omega _{3}} & \frac{\partial MSE}{\partial \omega _{4}}\end{bmatrix} = \begin{bmatrix}ah_{1}\left ( 1-ah_{1} \right ) & \\ &ah_{2}\left ( 1-ah_{2} \right )\end{bmatrix} \begin{bmatrix} \omega _{5} & \omega _{6} \\ \omega _{7} &\omega _{8}\end{bmatrix}^{T} \begin{bmatrix}-\left( y_{1}-ao_{1} \right )ao_{1}\left ( 1-ao_{1} \right )\\-\left( y_{2}-ao_{2} \right )ao_{2}\left ( 1-ao_{2} \right )\end{bmatrix} \begin{bmatrix}i_{1} \\i_{2}\end{bmatrix}^{T} \)

\( \begin{bmatrix} \frac{\partial MSE}{\partial \omega _{5}} & \frac{\partial MSE}{\partial \omega _{6}} \\ \frac{\partial MSE}{\partial \omega _{7}} & \frac{\partial MSE}{\partial \omega _{8}}\end{bmatrix} = \begin{bmatrix}-\left( y_{1}-ao_{1} \right )ao_{1}\left ( 1-ao_{1} \right )\\-\left( y_{2}-ao_{2} \right )ao_{2}\left ( 1-ao_{2} \right )\end{bmatrix} \begin{bmatrix} ah_{1} \\ ah_{2}\end{bmatrix}^{T} \)

  Compared with the earlier matrix factorization, this matrix form is easier to implement in code.
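As a rough illustration of that point, the two matrix products can be coded with plain Python lists. This sketch reads each product as a 2x2 array laid out row by row, so the first result holds the derivatives for w1..w4 and the second for w5..w8. The function name `weight_gradients` is hypothetical, and the numbers in the usage lines are assumed example values, not taken from the figures.

```python
def outer(col, row):
    """Outer product of a column vector and a row vector, as a nested list."""
    return [[c * r for r in row] for c in col]

def weight_gradients(i, ah, ao, y, W_out):
    """W_out = [[w5, w6], [w7, w8]].

    Returns (G_hidden, G_out):
      G_hidden = [[dMSE/dw1, dMSE/dw2], [dMSE/dw3, dMSE/dw4]]
      G_out    = [[dMSE/dw5, dMSE/dw6], [dMSE/dw7, dMSE/dw8]]
    """
    # Output-layer column vector: -(y - ao) * ao * (1 - ao)
    delta_o = [-(y[k] - ao[k]) * ao[k] * (1 - ao[k]) for k in range(2)]
    # W_out^T times delta_o
    back = [sum(W_out[k][j] * delta_o[k] for k in range(2)) for j in range(2)]
    # Multiply by the diagonal matrix diag(ah * (1 - ah))
    delta_h = [ah[j] * (1 - ah[j]) * back[j] for j in range(2)]
    return outer(delta_h, i), outer(delta_o, ah)

# ASSUMED example values (rounded forward-pass results for assumed parameters).
i = (0.05, 0.10)
ah = (0.593269992, 0.596884378)
ao = (0.75136507, 0.772928465)
y = (0.01, 0.99)
W_out = [[0.40, 0.45], [0.50, 0.55]]

G_hidden, G_out = weight_gradients(i, ah, ao, y, W_out)
print(G_hidden)
print(G_out)
```

With an array library the same computation collapses to two outer products and one matrix-vector product, which is why this grouping is convenient.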

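2.4 Updating wi by Gradient Descent Using ∂/∂wi [Own Derivation]

  Following the gradient descent rule \({\color{purple}{\omega_{i,t+1}=\omega_{i,t}+\alpha_{t}\left [ -\nabla Loss(\omega_{i,t}) \right ]}}\), we set the learning rate α=0.5, compute wi,t+1, and then run the next forward pass. (This is easy to set up in Excel; the tables below are excerpts of the iteration data.)
  As the excerpts show, after 10001 iterations MSE(t=10001)=3.51019E-05 is small enough, so we can stop iterating and consider this round of training complete.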

[Images: two excerpts of the iteration data table, with the rows in between omitted]
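The same iteration can be run as a short script instead of a spreadsheet. This is a minimal sketch under stated assumptions: the sample, biases, and initial weights are assumed values (Fig 2 is not reproduced in the text), the biases are held fixed because the derivation only updates w1..w8, and `train` is an illustrative name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# ASSUMED sample and parameters (Fig 2 is not reproduced in the text).
I = (0.05, 0.10)
Y = (0.01, 0.99)
B = (0.35, 0.60)        # biases stay fixed; only w1..w8 are updated
W0 = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]

def forward(w):
    ah1 = sigmoid(w[0] * I[0] + w[1] * I[1] + B[0])
    ah2 = sigmoid(w[2] * I[0] + w[3] * I[1] + B[0])
    ao1 = sigmoid(w[4] * ah1 + w[5] * ah2 + B[1])
    ao2 = sigmoid(w[6] * ah1 + w[7] * ah2 + B[1])
    mse = 0.5 * (Y[0] - ao1) ** 2 + 0.5 * (Y[1] - ao2) ** 2
    return ah1, ah2, ao1, ao2, mse

def gradients(w):
    ah1, ah2, ao1, ao2, _ = forward(w)
    d1 = -(Y[0] - ao1) * ao1 * (1 - ao1)
    d2 = -(Y[1] - ao2) * ao2 * (1 - ao2)
    g1 = (d1 * w[4] + d2 * w[6]) * ah1 * (1 - ah1)
    g2 = (d1 * w[5] + d2 * w[7]) * ah2 * (1 - ah2)
    return [g1 * I[0], g1 * I[1], g2 * I[0], g2 * I[1],
            d1 * ah1, d1 * ah2, d2 * ah1, d2 * ah2]

def train(w, steps, alpha=0.5):
    """Apply `steps` updates w <- w - alpha * grad; return weights and final loss."""
    w = list(w)
    for _ in range(steps):
        g = gradients(w)
        w = [wk - alpha * gk for wk, gk in zip(w, g)]
    return w, forward(w)[4]

w_one, loss_one = train(W0, 1)
w_final, loss_final = train(W0, 10000)
print(f"after 1 update: w1={w_one[0]:.9f} w5={w_one[4]:.9f} loss={loss_one:.9f}")
print(f"after 10000 updates: loss={loss_final:.3e}")
```

Under these assumptions the final loss should land in the same order of magnitude as the figure quoted above; an exact match depends on the true values in Fig 2 and on how the iterations are counted.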

Tips on differentiating the sigmoid function (for beginners)

\(\because {\color{blue}{\frac{1}{1+e^{-x}}}}=\frac{1}{1+\frac{1}{e^{x}}}=\frac{e^{x}}{e^{x}+1}={\color{blue}{1-\frac{1}{1+e^{x}}}}\)
\(\therefore \mathrm{d} \left( {\color{red}{\frac{1}{1+e^{-x}}}} \right )= \mathrm{d} \left( 1-\frac{1}{e^{x}+1} \right )\)
\({\color{white}{\therefore \mathrm{d} \left( \frac{1}{1+e^{-x}} \right )}}= (-1)\times (-1)(e^{x}+1)^{-2}d(e^{x}+1)\)
\({\color{white}{\therefore \mathrm{d} \left( \frac{1}{1+e^{-x}} \right )}}= \frac{e^{x}}{(e^{x}+1)^2} \mathrm{d}x=\left[ \frac{1}{e^x+1}-\frac{1}{(e^x+1)^2} \right ]\mathrm{d}x\)
\({\color{white}{\therefore \mathrm{d} \left( \frac{1}{1+e^{-x}} \right )}}= \frac{1}{1+e^{x}}\left [ 1-\frac{1}{e^x+1} \right ] \mathrm{d}x\)
\({\color{white}{\therefore \mathrm{d} \left( \frac{1}{1+e^{-x}} \right )}}= \left ( {\color{red}{1-\frac{1}{1+e^{-x}}}} \right )\left [ {\color{red}{\frac{1}{1+e^{-x}}}} \right ]\mathrm{d}x\)
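The result of this derivation, that the derivative of the sigmoid equals the sigmoid times one minus the sigmoid, can be checked numerically. The sketch below compares the closed form with a central finite difference at a few sample points; the function names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Closed form from the derivation: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def central_difference(f, x, h=1e-6):
    """Numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, sigmoid_prime(x), central_difference(sigmoid, x))
```

At x = 0 the closed form gives exactly 0.25, the maximum slope of the sigmoid.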

