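Contents:
1. Sigmoid function (logistic function)
2. Logistic regression for binary classification
3. Neural networks for binary classification
4. A Python implementation of a neural network for binary classification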
-----------------------------------------------------------------------------------
1. sigmoid unit
For an input sample ${\bf{x}} = (x_1, x_2, ..., x_n)$, a sigmoid unit first computes a linear combination of $x_1, x_2, ..., x_n$:
$z = {{\bf{w}}^T}{\bf{x}} = {w_1}{x_1} + {w_2}{x_2} + ... + {w_n}{x_n}$
The result $z$ is then passed to the sigmoid function:
$\sigma (z) = \frac{1}{{1 + {e^{ - z}}}}$
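The two steps of a sigmoid unit (linear combination, then squashing) can be sketched in a few lines of NumPy. The weights and inputs below are made-up illustrative values, and the helper name `sigmoid_unit` is hypothetical rather than something taken from this article.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_unit(w, x):
    """Compute z = w^T x, then pass z through the sigmoid."""
    z = np.dot(w, x)
    return sigmoid(z)


# Illustrative values only (assumed, not from the article)
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 0.5])

z = np.dot(w, x)             # 0.5*1.0 - 1.0*2.0 + 2.0*0.5 = -0.5
output = sigmoid_unit(w, x)  # a value strictly between 0 and 1
print(z, output)
```

Because $z$ is negative here, the output falls below 0.5.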
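Figure: graph of the sigmoid function.
The sigmoid function has a very useful property: its derivative is easily expressed in terms of its own output, namely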
$\frac{\partial \sigma (z)}{\partial z} = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = \frac{1}{1 + e^{-z}} \cdot \left(1 - \frac{1}{1 + e^{-z}}\right) = \sigma (z)(1 - \sigma (z)) \qquad (1)$
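As a quick sanity check of Eq. (1), the sketch below compares $\sigma(z)(1-\sigma(z))$ with a central finite-difference approximation of the derivative. The grid of $z$ values and the step size `h` are arbitrary choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    """Derivative from Eq. (1): sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


z = np.linspace(-5.0, 5.0, 11)  # arbitrary test points
h = 1e-5                        # finite-difference step (assumed value)

# Central difference approximation of d sigma / dz
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)

max_diff = np.max(np.abs(numeric - sigmoid_grad(z)))
print(max_diff)  # should be very small if Eq. (1) holds
```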
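2. Logistic regression for binary classification
To apply the sigmoid function to binary classification, output the label $y=1$ when $\sigma(z) \ge 0.5$ and the label $y=0$ when $\sigma(z) < 0.5$, and define the following conditional probabilities:
$P\{ Y = 1|{\bf{x}}\} = p({\bf{x}}) = \frac{1}{1 + e^{-{\bf{w}}^T{\bf{x}}}}$
$P\{ Y = 0|{\bf{x}}\} = 1 - p({\bf{x}}) = \frac{e^{-{\bf{w}}^T{\bf{x}}}}{1 + e^{-{\bf{w}}^T{\bf{x}}}}$
The odds of an event are the ratio of the probability that the event occurs to the probability that it does not. If the event occurs with probability $p$, its odds are $\frac{p}{1-p}$, and its log odds, or logit function, is $\mathrm{logit}(p)=\ln\frac{p}{1-p}$. In the logistic regression binary classification model, the log odds of the event $Y=1$ are
$\ln \frac{P\{ Y = 1|{\bf{x}}\}}{P\{ Y = 0|{\bf{x}}\}} = \ln \frac{p({\bf{x}})}{1 - p({\bf{x}})} = \ln (e^{{\bf{w}}^T{\bf{x}}}) = {\bf{w}}^T{\bf{x}}$
This shows that, in the logistic regression binary classification model, the log odds of the output $y=1$ are a linear function of the input ${\bf{x}}$.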
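The linearity of the log odds can be checked numerically: applying the logit to $p({\bf{x}})$ should return ${\bf{w}}^T{\bf{x}}$. The weight vector and sample points in this sketch are invented for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    """Log odds ln(p / (1 - p))."""
    return np.log(p / (1.0 - p))


# Illustrative weights and inputs (assumed, not from the article)
w = np.array([0.3, -0.7])
X = np.array([[1.0, 2.0],
              [-1.5, 0.5],
              [4.0, 1.0]])

z = X @ w          # w^T x for each row of X
p = sigmoid(z)     # P{Y = 1 | x} for each row

# The logit undoes the sigmoid, recovering the linear function w^T x
is_linear = np.allclose(logit(p), z)
print(is_linear)
```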
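In the logistic regression binary classification model, given a data set $T = \{ ({{\bf{x}}_1},{y_1}),({{\bf{x}}_2},{y_2}),...,({{\bf{x}}_n},{y_n})\}$, the model parameters ${\bf{w}}$ can be estimated by maximum likelihood. From here on, $n$ denotes the number of samples, and ${\bf{w}}$ has one component $w_j$ per input feature.
Let:
$\begin{array}{l}
P\{ Y = 1|{\bf{x}}\} = \sigma ({{\bf{w}}^T}{\bf{x}}) \\
P\{ Y = 0|{\bf{x}}\} = 1 - \sigma ({{\bf{w}}^T}{\bf{x}}) \\
\end{array}$
The likelihood function is: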
$\prod\limits_{i = 1}^n {{{[\sigma ({{\bf{w}}^T}{{\bf{x}}_i})]}^{{y_i}}}} {[1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})]^{1 - {y_i}}}$
The log-likelihood function is:
$L({\bf{w}}) = \sum\limits_{i = 1}^n {[{y_i}\log } \sigma ({{\bf{w}}^T}{{\bf{x}}_i}) + (1 - {y_i})\log (1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i}))]$
To maximize $L({\bf{w}})$, take the partial derivative with respect to each $w_j$:
$\frac{{\partial L({\bf{w}})}}{{\partial{w_j}}} = \sum\limits_{i = 1}^n {[\frac{{{y_i}}}{{\sigma ({{\bf{w}}^T}{{\bf{x}}_i})}}} - \frac{{1 - {y_i}}}{{1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})}}]\frac{{\partial \sigma ({{\bf{w}}^T}{{\bf{x}}_i})}}{{\partial ({{\bf{w}}^T}{{\bf{x}}_i})}}\frac{{\partial ({{\bf{w}}^T}{{\bf{x}}_i})}}{{\partial {w_j}}}$
Applying Eq. (1) gives
$\frac{{\partial L({\bf{w}})}}{{\partial{w_j}}} = \sum\limits_{i = 1}^n {[\frac{{{y_i} - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})}}{{\sigma ({{\bf{w}}^T}{{\bf{x}}_i})[1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})]}}} ] \cdot \sigma ({{\bf{w}}^T}{{\bf{x}}_i})[1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})] \cdot {x_{ij}}$
$\frac{{\partial L({\bf{w}})}}{{\partial{w_j}}} = \sum\limits_{i = 1}^n [ {y_i} - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})] \cdot {x_{ij}}$
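Setting $\frac{\partial L({\bf{w}})}{\partial w_j}=0$ for every $j$ yields the estimate of the parameters ${\bf{w}}$. These equations have no closed-form solution, so in practice they are solved iteratively, for example with gradient-based methods.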
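The closed-form gradient $\sum_i [y_i - \sigma({\bf{w}}^T{\bf{x}}_i)]\,x_{ij}$ can be verified against finite differences of the log-likelihood. The sketch below uses randomly generated data; the sample count, feature count, and seed are arbitrary assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def log_likelihood(w, X, y):
    """L(w) = sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]."""
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def log_likelihood_grad(w, X, y):
    """dL/dw_j = sum_i (y_i - p_i) * x_ij, written as X^T (y - p)."""
    return X.T @ (y - sigmoid(X @ w))


# Random illustrative data (assumed, not the article's data set)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
w = rng.normal(size=3)

# Central finite differences, one coordinate of w at a time
h = 1e-6
numeric = np.zeros_like(w)
for j in range(w.size):
    step = np.zeros_like(w)
    step[j] = h
    numeric[j] = (log_likelihood(w + step, X, y)
                  - log_likelihood(w - step, X, y)) / (2.0 * h)

grad_diff = np.max(np.abs(numeric - log_likelihood_grad(w, X, y)))
print(grad_diff)  # should be close to zero
```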
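3. Neural networks for binary classification: the cross-entropy loss function
In a neural network whose activation function is the sigmoid, the cross-entropy loss is a suitable loss function for binary classification. It has the form below, which is the log-likelihood of the previous section multiplied by $-\frac{1}{n}$: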
$C =- \frac{1}{n}\sum\limits_{i = 1}^n {[{y_i}\log } \sigma ({{\bf{w}}^T}{{\bf{x}}_i}) + (1 - {y_i})\log (1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i}))]$
During training, the weights are updated iteratively as follows:
$w_j^{k + 1} = w_j^k - \eta \frac{{\partial C}}{{\partial w_j^k}}$
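With the cross-entropy loss, the gradient is
$\frac{{\partial C}}{{\partial w_j^k}} = \frac{1}{n}\sum\limits_{i = 1}^n [ \sigma ({{\bf{w}}^T}{{\bf{x}}_i}) - {y_i}] \cdot {x_{ij}} = \frac{1}{n}({{\bf{X}}^T}[\sigma ({\bf{X}}{\bf{w}}) - {\bf{y}}] )_j = \frac{1}{n}({{\bf{X}}^T}{\bf{e}})_j$
where ${\bf{X}}$ is the matrix whose $i$-th row is ${\bf{x}}_i^T$, $\sigma$ is applied element-wise, ${\bf{y}}$ is the column vector of sample labels, ${\bf{e}} = \sigma ({\bf{X}}{\bf{w}}) - {\bf{y}}$ is the vector of prediction errors, and the subscript $j$ on the last two expressions denotes the $j$-th component of the vector. The constant factor $\frac{1}{n}$ can be absorbed into the learning rate $\eta$.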
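The update rule and the gradient of the cross-entropy loss can be put together in a short training loop. This is a minimal sketch on synthetic data (two Gaussian blobs with a bias column), not the article's data set; the learning rate, iteration count, and seed are assumed values.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy(w, X, y):
    """C = -(1/n) sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def gradient(w, X, y):
    """dC/dw = (1/n) X^T e, with e = sigma(Xw) - y."""
    e = sigmoid(X @ w) - y
    return X.T @ e / len(y)


# Synthetic data (assumed): two Gaussian blobs, plus a leading bias column of ones
rng = np.random.default_rng(0)
n_per_class = 50
X0 = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, 2))
X1 = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, 2))
X = np.hstack([np.ones((2 * n_per_class, 1)), np.vstack([X0, X1])])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

eta = 0.1          # learning rate (assumed value)
w = np.zeros(3)    # start from zero weights, so every p_i is 0.5
losses = []
for k in range(200):
    losses.append(cross_entropy(w, X, y))
    w = w - eta * gradient(w, X, y)  # w^{k+1} = w^k - eta * dC/dw
losses.append(cross_entropy(w, X, y))

print(losses[0], losses[-1])  # the loss should go down during training
```

Starting from zero weights makes the initial loss equal to $\ln 2$, which gives a convenient reference point for judging progress.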
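4. A Python implementation of a neural network for binary classification
Network structure: a single sigmoid unit
Training data: 500 training samples in total, available at https://pan.baidu.com/s/1qWugzIzdN9qZUnEw4kWcww (extraction code: ncuj)
Loss function: cross-entropy loss
The code is as follows: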
import numpy as np
import matplotlib.pyplot as plt


class Logister():
    def __init__(self, path):
        self.path = path

    def file2matrix(self, delimiter):
        # Read the whole file and split it into lines
        with open(self.path, 'r') as fp:
            rowlist = fp.read().splitlines()
        # Parse every non-empty line into a row of floats
        recordlist = [list(map(float, row.split(delimiter))) for row in rowlist if row.strip()]
        return np.array(recordlist)

    def drawScatterbyLabel(self, dataSet):
        target = dataSet[:, -1]  # labels are stored in the last column
        for label, color in ((0, 'blue'), (1, 'red')):
            mask = target == label
            plt.scatter(dataSet[mask, 0], dataSet[mask, 1], c=color, marker='o')

    def buildMat(self, dataSet):
        # Design matrix: a leading column of ones (bias term) followed by the features
        m, n = dataSet.shape
        dataMat = np.zeros((m, n))
        dataMat[:, 0] = 1
        dataMat[:, 1:] = dataSet[:, :-1]
        return dataMat

    def logistic(self, wTx):
        return 1.0/(1.0 + np.exp(-wTx))

    def classfier(self, testData, weights):
        # Estimate the probability of class 1 and threshold it at 0.5
        prob = self.logistic(testData @ weights).item()
        if prob > 0.5:
            return 1
        else:
            return 0


if __name__ == '__main__':
    logis = Logister('testSet.txt')

    print('1. Load the data')
    inputData = logis.file2matrix('\t')
    target = inputData[:, -1:]  # labels as an (m, 1) column vector
    m, n = inputData.shape
    print('size of input data: {} * {}'.format(m, n))

    print('2. Draw a scatter plot by class')
    logis.drawScatterbyLabel(inputData)

    print('3. Build the design matrix')
    dataMat = logis.buildMat(inputData)

    alpha = 0.1  # learning rate (the 1/n factor of the gradient is absorbed here)
    steps = 600  # total iterations
    weights = np.ones((n, 1))  # initialize weights
    weightlist = []

    print('4. Train the model')
    for k in range(steps):
        output = logis.logistic(dataMat @ weights)
        errors = target - output
        print('iteration: {} error_norm: {}'.format(k, np.linalg.norm(errors)))
        # Gradient descent on the cross-entropy loss; errors = y - output, hence the plus sign
        weights = weights + alpha * dataMat.T @ errors
        weightlist.append(weights)

    print('5. Plot the training process')
    X = np.linspace(-5, 15, 301)
    for idx in range(len(weightlist)):
        if idx % 100 == 0:
            weight = weightlist[idx]
            # Decision boundary: w0 + w1*x1 + w2*x2 = 0
            Y = -(weight[0] + X * weight[1]) / weight[2]
            plt.plot(X, Y)
            plt.annotate('hplane:' + str(idx), xy=(X[0], Y[0]))
    plt.show()

    print('6. Apply the model to test data')
    testdata = np.array([[-0.147324, 2.874846]])  # test sample
    m, n = testdata.shape
    testmat = np.zeros((m, n+1))
    testmat[:, 0] = 1
    testmat[:, 1:] = testdata
    print(logis.classfier(testmat, weights))  # weights are the ones obtained from training above
The model is trained for 600 iterations, and the decision boundary is plotted once every 100 iterations, as shown in the figure below:
[References]
[1] Tom M. Mitchell, Machine Learning, Chapter 4
[2] Zheng Jie (鄭捷), 《機器學習算法原理與編程實踐》, Chapter 5, Section 2
[3] Michael Nielsen, Neural Networks and Deep Learning, Chapter 3