Logistic Regression for Binary Classification: Principles, the Cross-Entropy Loss Function, and a Python NumPy Implementation


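Contents:

1. sigmoid function (logistic function)

2. Logistic regression binary classification model

3. Neural networks for binary classification

4. Python implementation of a neural network for binary classification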

----------------------------------------------------------------------------------- 

1. sigmoid unit 

For an input sample ${\bf{x}} = (x_1,x_2, ..., x_n)$, a sigmoid unit first computes a linear combination of $x_1,x_2, ..., x_n$:

$z = {{\bf{w}}^T}{\bf{x}} = {w_1}{x_1} + {w_2}{x_2} + ... + {w_n}{x_n}$

It then feeds the result $z$ into the sigmoid function:

$\sigma (z) = \frac{1}{{1 + {e^{ - z}}}}$

Graph of the sigmoid function:

A very useful property of the sigmoid function is that its derivative is easily expressed in terms of its output:

$\frac{{\partial \sigma (z)}}{{\partial z}} = \frac{{{e^{ - z}}}}{{{{(1 + {e^{ - z}})}^2}}} = \frac{1}{{1 + {e^{ - z}}}} \cdot \frac{{{e^{ - z}}}}{{1 + {e^{ - z}}}} = \frac{1}{{1 + {e^{ - z}}}} \cdot (1 - \frac{1}{{1 + {e^{ - z}}}}) = \sigma (z)(1 - \sigma (z)) \qquad (1)$
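Eq. (1) can be checked numerically. The sketch below is illustrative only: the helper names `sigmoid_prime` and `central_difference` are assumptions introduced here, and the overflow-safe branching in `sigmoid` is a common implementation choice rather than something the derivation requires.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # z < 0 here, so exp(z) cannot overflow
    return ez / (1.0 + ez)

def sigmoid_prime(z):
    """Derivative from Eq. (1): sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def central_difference(f, z, h=1e-5):
    """Numerical derivative, used only to cross-check Eq. (1)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

# The two columns should agree to many decimal places.
for z in (-3.0, 0.0, 2.5):
    print(z, sigmoid_prime(z), central_difference(sigmoid, z))
```

Because the derivative only needs the output $\sigma(z)$, code that has already computed the output can reuse it instead of evaluating the exponential again.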

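2. Logistic regression binary classification model

Applying the sigmoid function to binary classification, the model outputs the label $y=1$ when $\sigma(z) \ge 0.5$ and the label $y=0$ when $\sigma(z)<0.5$. The following conditional probabilities are defined:

$P\{ Y = 1|\bf{x}\} = p(\bf{x}) = \frac{1}{{1 + {e^{ - {{\bf{w}}^T}\bf{x}}}}}$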

$P\{ Y = 0|\bf{x}\} = 1 - p(\bf{x}) = \frac{{{e^{ - {{\bf{w}}^T}\bf{x}}}}}{{1 + {e^{ - {{\bf{w}}^T}\bf{x}}}}}$
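As a minimal sketch of the decision rule and the two conditional probabilities, the snippet below uses plain Python lists. The function names `predict_proba` and `predict_label` and the weight values are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """Return (P{Y=0|x}, P{Y=1|x}) for weight list w and input list x."""
    z = sum(wj * xj for wj, xj in zip(w, x))   # w^T x
    p1 = sigmoid(z)
    return 1.0 - p1, p1

def predict_label(w, x):
    """Output 1 when P{Y=1|x} >= 0.5, which is equivalent to w^T x >= 0."""
    return 1 if predict_proba(w, x)[1] >= 0.5 else 0

w = [0.5, -1.0]                   # hypothetical weights
for x in ([3.0, 1.0], [1.0, 2.0]):
    print(x, predict_proba(w, x), predict_label(w, x))
```

Since $\sigma(0)=0.5$ and $\sigma$ is increasing, thresholding the probability at 0.5 is the same as checking the sign of ${\bf{w}}^T{\bf{x}}$.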

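The odds of an event is the ratio of the probability that the event occurs to the probability that it does not occur. If the event occurs with probability $p$, its odds are $\frac{p}{1-p}$, and its log odds, or $logit$ function, is $logit(p)=\ln\frac{p}{1-p}$. In the logistic regression binary classification model, the log odds of the event $Y=1$ are

$\ln \frac{{P\{ Y = 1|\bf{x}\} }}{{P\{ Y = 0|\bf{x}\} }} = \ln \frac{{p(\bf{x})}}{{1 - p(\bf{x})}} = \ln ({e^{{{\bf{w}}^T}\bf{x}}}) = {{\bf{w}}^T}\bf{x}$

This shows that, in the logistic regression binary classification model, the log odds of the output $y=1$ are a linear function of the input $\bf{x}$.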
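The identity above says that the logit function undoes the sigmoid. A small numerical illustration, assuming hypothetical weights and inputs:

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

w = [0.5, -1.2, 2.0]      # hypothetical weights
x = [1.0, 0.3, 0.8]       # hypothetical input
z = sum(wj * xj for wj, xj in zip(w, x))   # w^T x
p = sigmoid(z)            # P{Y = 1 | x}

# logit(p) recovers w^T x up to floating-point rounding.
print(z, logit(p))
```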

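In the logistic regression binary classification model, given a data set $T = \{ ({{\bf{x}}_1},{y_1}),({{\bf{x}}_2},{y_2}),...,({{\bf{x}}_n},{y_n})\}$, the model parameters ${{\bf{w}}^T} = ({w_1},{w_2},...,{w_d})$ can be estimated by maximum likelihood. Here $n$ denotes the number of training samples and $d$ the number of features (written $n$ in Section 1).

Let: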

$\begin{array}{l}
P\{ Y = 1|\bf{x}\} = \sigma ({{\bf{w}}^T}{\bf{x}}) \\
P\{ Y = 0|\bf{x}\} = 1 - \sigma ({{\bf{w}}^T}{\bf{x}}) \\
\end{array}$

The likelihood function is:

$\prod\limits_{i = 1}^n {{{[\sigma ({{\bf{w}}^T}{{\bf{x}}_i})]}^{{y_i}}}} {[1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})]^{1 - {y_i}}}$

The log-likelihood function is:

$L({\bf{w}}) = \sum\limits_{i = 1}^n {[{y_i}\log } \sigma ({{\bf{w}}^T}{{\bf{x}}_i}) + (1 - {y_i})\log (1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i}))]$
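To make the relationship between the two functions concrete, the following sketch evaluates both on a tiny data set and confirms that the logarithm of the product equals the sum of logarithms. The data, weights, and function names are made up for illustration; each input carries a leading 1 so that the first weight acts as a bias.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def likelihood(w, X, y):
    """Product over samples of sigma^y * (1 - sigma)^(1 - y)."""
    prod = 1.0
    for xi, yi in zip(X, y):
        p = sigmoid(dot(w, xi))
        prod *= p ** yi * (1.0 - p) ** (1 - yi)
    return prod

def log_likelihood(w, X, y):
    """Sum over samples of y*log(sigma) + (1 - y)*log(1 - sigma)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(dot(w, xi))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]   # hypothetical samples
y = [1, 0, 1]                               # hypothetical labels
w = [0.1, 0.8]                              # hypothetical weights

print(math.log(likelihood(w, X, y)), log_likelihood(w, X, y))
```

Working with the sum rather than the product also avoids numerical underflow when the number of samples is large.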

To maximize $L({\bf{w}})$, differentiate it with respect to each $w_j$:

$\frac{{\partial L({\bf{w}})}}{{\partial{w_j}}} = \sum\limits_{i = 1}^n {[\frac{{{y_i}}}{{\sigma ({{\bf{w}}^T}{{\bf{x}}_i})}}} - \frac{{1 - {y_i}}}{{1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})}}]\frac{{\partial \sigma ({{\bf{w}}^T}{{\bf{x}}_i})}}{{\partial ({{\bf{w}}^T}{{\bf{x}}_i})}}\frac{{\partial ({{\bf{w}}^T}{{\bf{x}}_i})}}{{\partial {w_j}}}$

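Applying Eq. (1), and noting that $\frac{{\partial ({{\bf{w}}^T}{{\bf{x}}_i})}}{{\partial {w_j}}} = {x_{ij}}$ (the $j$-th component of ${\bf{x}}_i$), we have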

$\frac{{\partial L({\bf{w}})}}{{\partial{w_j}}} = \sum\limits_{i = 1}^n {[\frac{{{y_i} - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})}}{{\sigma ({{\bf{w}}^T}{{\bf{x}}_i})[1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})]}}} ] \cdot \sigma ({{\bf{w}}^T}{{\bf{x}}_i})[1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})] \cdot {x_{ij}}$

$\frac{{\partial L({\bf{w}})}}{{\partial{w_j}}} = \sum\limits_{i = 1}^n [ {y_i} - \sigma ({{\bf{w}}^T}{{\bf{x}}_i})] \cdot {x_{ij}}$

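Setting $\frac{{\partial L({\bf{w}})}}{{\partial {w_j}}}=0$ for every $j$ yields the estimate of the parameters ${\bf{w}}$. These equations have no closed-form solution in general, so in practice they are solved iteratively, for example by gradient ascent on $L({\bf{w}})$.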
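The gradient formula derived above can be cross-checked against a finite-difference approximation of $L({\bf{w}})$. This is only a sanity-check sketch; the data, the weights, and the helper `numerical_gradient` are assumptions introduced for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def log_likelihood(w, X, y):
    return sum(yi * math.log(sigmoid(dot(w, xi)))
               + (1 - yi) * math.log(1.0 - sigmoid(dot(w, xi)))
               for xi, yi in zip(X, y))

def gradient(w, X, y):
    """Analytic gradient: dL/dw_j = sum_i (y_i - sigma(w^T x_i)) * x_ij."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = yi - sigmoid(dot(w, xi))
        for j, xij in enumerate(xi):
            grad[j] += err * xij
    return grad

def numerical_gradient(w, X, y, h=1e-6):
    """Central differences on L(w), one coordinate at a time."""
    grad = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += h
        w_minus[j] -= h
        grad.append((log_likelihood(w_plus, X, y)
                     - log_likelihood(w_minus, X, y)) / (2.0 * h))
    return grad

X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]   # hypothetical samples
y = [1, 0, 1]                               # hypothetical labels
w = [0.1, 0.8]                              # hypothetical weights

print(gradient(w, X, y))
print(numerical_gradient(w, X, y))
```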

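3. Neural networks for binary classification and the cross-entropy loss function

In a neural network whose activation function is the sigmoid function, the cross-entropy loss is a suitable loss function for binary classification problems. Its form is as follows (it equals the log-likelihood function of the previous section multiplied by $-\frac{1}{n}$):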

$C =- \frac{1}{n}\sum\limits_{i = 1}^n {[{y_i}\log } \sigma ({{\bf{w}}^T}{{\bf{x}}_i}) + (1 - {y_i})\log (1 - \sigma ({{\bf{w}}^T}{{\bf{x}}_i}))]$

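During training of the neural network, the weights are updated iteratively as follows, where $\eta$ is the learning rate and $k$ is the iteration index: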

$w_j^{k + 1} = w_j^k - \eta \frac{{\partial C}}{{\partial w_j^k}}$

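When the loss function is the cross-entropy loss,

$\frac{{\partial C}}{{\partial w_j^k}} = \frac{1}{n}\sum\limits_{i = 1}^n [ \sigma ({{\bf{w}}^T}{{\bf{x}}_i}) - {y_i}] \cdot {x_{ij}} = \frac{1}{n}({{\bf{X}}^T}[\sigma ({\bf{X}}{\bf{w}}) - {\bf{y}}] )_j= \frac{1}{n}({{\bf{X}}^T}{\bf{e}})_j$

where ${\bf{X}}$ is the matrix whose $i$-th row is ${\bf{x}}_i^T$, $\sigma$ is applied element-wise, ${\bf{y}}$ is the column vector of sample labels, ${\bf{e}} = \sigma ({\bf{X}}{\bf{w}}) - {\bf{y}}$ is the error vector, and the subscript $j$ on the last two expressions denotes the $j$-th component of the vector. The constant factor $\frac{1}{n}$ can be absorbed into the learning rate $\eta$.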
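The vectorized gradient and the update rule translate almost directly into NumPy. The sketch below is a minimal illustration on four made-up samples (with a leading column of ones for the bias); the learning rate, the iteration count, and the function names are assumptions, not values from the article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, X, y):
    """C = -(1/n) * sum_i [y_i log(sigma_i) + (1 - y_i) log(1 - sigma_i)]."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(w, X, y):
    """dC/dw = (1/n) * X^T e, with e = sigma(Xw) - y."""
    e = sigmoid(X @ w) - y
    return X.T @ e / len(y)

# Hypothetical data: each row is one sample, first column is the constant 1.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -2.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w = np.zeros(2)      # initial weights
eta = 0.5            # learning rate (assumed value)
losses = []
for k in range(50):
    losses.append(cross_entropy(w, X, y))
    w = w - eta * gradient(w, X, y)     # w^{k+1} = w^k - eta * dC/dw

print(losses[0], losses[-1], w)
```

With zero initial weights every prediction is 0.5, so the first loss value is $\ln 2$; for a sufficiently small learning rate the loss then decreases at every step.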

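4. Python implementation of a neural network for binary classification

Network structure: a single sigmoid unit

Training data: 500 training samples in total, link https://pan.baidu.com/s/1qWugzIzdN9qZUnEw4kWcww, extraction code: ncuj

Loss function: cross-entropy loss function

The code is as follows: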

import numpy as np
import matplotlib.pyplot as plt


class Logister():
    def __init__(self, path):
        self.path = path

    def file2matrix(self, delimiter):
        with open(self.path, 'r') as fp:
            content = fp.read()          # the whole file as a single string
        rowlist = content.splitlines()   # split into a list of lines
        # Parse each non-empty line: split on the delimiter and convert the fields to float
        recordlist = [list(map(float, row.split(delimiter))) for row in rowlist if row.strip()]
        return np.asmatrix(recordlist)

    def drawScatterbyLabel(self, dataSet):
        m, n = dataSet.shape
        target = np.array(dataSet[:, -1])
        target = target.squeeze()        # flatten the 2-D column into a 1-D array
        for i in range(m):
            if target[i] == 0:
                plt.scatter(dataSet[i, 0], dataSet[i, 1], c='blue', marker='o')
            if target[i] == 1:
                plt.scatter(dataSet[i, 0], dataSet[i, 1], c='red', marker='o')

    def buildMat(self, dataSet):
        m, n = dataSet.shape
        dataMat = np.zeros((m, n))
        dataMat[:, 0] = 1
        dataMat[:, 1:] = dataSet[:, :-1]
        return dataMat

    def logistic(self, wTx):
        return 1.0/(1.0 + np.exp(-wTx))

    def classfier(self, testData, weights):
        prob = self.logistic(sum(testData*weights))   # estimated probability that y = 1
        if prob > 0.5:
            return 1
        else:
            return 0


if __name__ == '__main__':
    logis = Logister('testSet.txt')

    print('1. Load data')
    inputData = logis.file2matrix('\t')
    target = inputData[:, -1]
    m, n = inputData.shape
    print('size of input data: {} * {}'.format(m, n))

    print('2. Draw scatter plot by class')
    logis.drawScatterbyLabel(inputData)

    print('3. Build the coefficient matrix')
    dataMat = logis.buildMat(inputData)

    alpha = 0.1                 # learning rate
    steps = 600                 # total iterations
    weights = np.ones((n, 1))   # initialize weights
    weightlist = []

    print('4. Train the model')
    for k in range(steps):
        output = logis.logistic(dataMat * np.asmatrix(weights))
        errors = target - output
        print('iteration: {}  error_norm: {}'.format(k, np.linalg.norm(errors)))
        weights = weights + alpha*dataMat.T*errors  # gradient descent step on C (the 1/n factor is absorbed into alpha)
        weightlist.append(weights)

    print('5. Plot the training process')
    X = np.linspace(-5, 15, 301)
    weights = np.array(weights)
    length = len(weightlist)
    for idx in range(length):
        if idx % 100 == 0:
            weight = np.array(weightlist[idx])
            Y = -(weight[0] + X * weight[1]) / weight[2]
            plt.plot(X, Y)
            plt.annotate('hplane:' + str(idx), xy=(X[0], Y[0]))
    plt.show()

    print('6. Apply the model to test data')
    testdata = np.asmatrix([-0.147324, 2.874846])           # test sample
    m, n = testdata.shape
    testmat = np.zeros((m, n+1))
    testmat[:, 0] = 1
    testmat[:, 1:] = testdata
    print(logis.classfier(testmat, np.asmatrix(weights)))   # weights obtained from the training above

Training runs for 600 iterations, and the result is plotted every 100 iterations, as shown in the figure below:
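For readers without access to the data file, the same training loop can be tried on synthetic data. The sketch below is an assumption-laden stand-in rather than a reproduction of the script above: it replaces `testSet.txt` with two randomly generated Gaussian clusters, uses plain NumPy arrays instead of matrices, and averages the gradient over the samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for testSet.txt: two Gaussian clusters in 2-D.
# (This is an assumption for illustration, not the article's data.)
n_per_class = 100
class0 = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(n_per_class, 2))
class1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n_per_class, 2))
features = np.vstack([class0, class1])
labels = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

# Prepend a column of ones so that weights[0] plays the role of the bias.
X = np.hstack([np.ones((features.shape[0], 1)), features])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

weights = np.ones(X.shape[1])   # same all-ones initialization as the script
alpha = 0.1                     # learning rate
steps = 600                     # total iterations
for k in range(steps):
    errors = labels - sigmoid(X @ weights)
    weights = weights + alpha * X.T @ errors / len(labels)   # averaged gradient step

predictions = (sigmoid(X @ weights) >= 0.5).astype(int)
accuracy = np.mean(predictions == labels)
print('weights:', weights, 'training accuracy:', accuracy)
```

The learned decision boundary is the line `weights[0] + weights[1]*x1 + weights[2]*x2 = 0`, which is the same expression the plotting code in the script uses.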


