First, be clear that logistic regression is a classification algorithm. It builds on the linear regression model: the Sigmoid function maps the linear model's prediction to a probability, which is then thresholded into a discrete class label, so the model can be used for classification problems.
1 Principle of Logistic Regression
Take binary classification as an example to explain how logistic regression works. From the earlier summary of linear regression, the hypothesis function of linear regression is \(h_{\theta}'(x)\); in logistic regression, the Sigmoid function is applied so that the model output lies in the interval (0, 1).
In general, the Sigmoid function is written as:
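\[ g(z)=\frac{1}{1+e^{-z}} \]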
Plotting the Sigmoid function:
# Sigmoid function
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import expit

def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def h(theta, X):
    # hypothesis: sigmoid of the linear combination X @ theta
    return expit(np.dot(X, theta))

# Check the sigmoid function by plotting it
myx = np.arange(-10, 10, 0.01)
plt.plot(myx, expit(myx))
plt.title('Sigmoid Function')
plt.show()
As the plot shows, \(g(z)\) tends to 1 as \(z\) tends to positive infinity, and tends to 0 as \(z\) tends to negative infinity.
For logistic regression, let \(z=h_{\theta}'(x)\); then \(g(z)=g\left(h_{\theta}'(x)\right)=\frac{1}{1+e^{-h_{\theta}'(x)}}\), where \(h_{\theta}'(x)=x\theta\). The logistic regression model is therefore:
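\[ h_{\theta}(x)=g(x\theta)=\frac{1}{1+e^{-x\theta}} \]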
Combining this with the Sigmoid function makes it clear how the logistic regression model performs classification. The model output \(h_{\theta}(x)\) is interpreted as a probability, namely the probability that the sample belongs to the positive class. When \(h_{\theta}(x)>0.5\), i.e. \(x\theta>0\), predict \(y=1\); when \(h_{\theta}(x)<0.5\), i.e. \(x\theta<0\), predict \(y=0\). This turns the regression output into a classification decision.
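As a minimal sketch of this decision rule (reusing the h(theta, X) helper defined above; the predict helper is introduced here for illustration and is not part of the original listings):

def predict(theta, X):
    # h_theta(x) >= 0.5 exactly when x . theta >= 0, so thresholding the
    # probability at 0.5 is equivalent to checking the sign of the linear score
    return (h(theta, X) >= 0.5).astype(int)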
The matrix form of the logistic regression model is:
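\[ h_{\theta}(X)=\frac{1}{1+e^{-X\theta}} \]
where \(X\) is the \(m\times n\) sample matrix and the Sigmoid function is applied element-wise.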
Thus the probability that \(y\) takes the value 1 is \(h_{\theta}(x)\), and the probability that it takes the value 0 is \(1-h_{\theta}(x)\); written compactly:
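\[ P(y\mid x;\theta)=\left(h_{\theta}(x)\right)^{y}\left(1-h_{\theta}(x)\right)^{1-y},\qquad y\in\{0,1\} \]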
The likelihood function is:
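\[ L(\theta)=\prod_{i=1}^{m}\left(h_{\theta}(x^{(i)})\right)^{y^{(i)}}\left(1-h_{\theta}(x^{(i)})\right)^{1-y^{(i)}} \]
where \(m\) is the number of samples and \((x^{(i)},y^{(i)})\) is the \(i\)-th sample.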
The log-likelihood function is:
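\[ l(\theta)=\log L(\theta)=\sum_{i=1}^{m}\left[y^{(i)}\log h_{\theta}(x^{(i)})+\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right] \]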
The loss function is the negative log-likelihood (the \(1/m\) averaging factor used in some references is omitted here so that it matches the gradient given below):
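\[ J(\theta)=-l(\theta)=-\sum_{i=1}^{m}\left[y^{(i)}\log h_{\theta}(x^{(i)})+\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right] \]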
The matrix form of the loss function is:
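\[ J(\theta)=-\left(Y^{T}\log h_{\theta}(X)+\left(1-Y\right)^{T}\log\left(1-h_{\theta}(X)\right)\right) \]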
The optimization objective of logistic regression is:
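\[ \min_{\theta}\,J(\theta) \]
that is, find the \(\theta\) that minimizes \(J(\theta)\) (equivalently, maximizes the log-likelihood \(l(\theta)\)).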
2 The Logistic Regression Algorithm
Since \(\frac{\partial J(\theta)}{\partial \theta}=X^{T}\left(h_{\theta}(X)-Y\right)\), the iterative update formula for \(\theta\) in gradient descent is:
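\[ \theta:=\theta-\alpha X^{T}\left(h_{\theta}(X)-Y\right) \]
where \(\alpha\) is the learning rate. Maximizing the log-likelihood instead gives the equivalent gradient ascent update \(\theta:=\theta+\alpha X^{T}\left(Y-h_{\theta}(X)\right)\), which is the form used in the code below.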
3 Code Implementation of Logistic Regression
3.1 Training the algorithm: finding the best parameters with gradient ascent
# Logistic regression gradient ascent optimization
from numpy import *

def loadDataSet():
    dataMat = []; labelMat = []
    fr = open('testSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        # prepend a constant 1.0 for the intercept term
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))
    return dataMat, labelMat

def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

def gradAscent(dataMatIn, classLabels):
    dataMatrix = mat(dataMatIn)                 # m x n data matrix
    labelMat = mat(classLabels).transpose()     # m x 1 label column vector
    m, n = shape(dataMatrix)
    alpha = 0.001                               # learning rate
    maxCycles = 500                             # number of iterations
    weights = ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)       # m x 1 vector of predictions
        error = labelMat - h                    # ascent step along X^T (Y - h)
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights
dataArr, labelMat = loadDataSet()
print(gradAscent(dataArr, labelMat))
# Output
[[ 4.12414349]
[ 0.48007329]
[-0.6168482 ]]
3.2 Analyzing the data: plotting the decision boundary
# Plot the data set and the best-fit line found by logistic regression
import matplotlib.pyplot as plt

def plotBestFit(wei):
    weights = wei.getA()                 # convert the NumPy matrix to an ndarray
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataMat)[0]                # number of samples
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1])
            ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1])
            ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=20, c='r', marker='s')
    ax.scatter(xcord2, ycord2, s=20, c='g')
    x = arange(-3.0, 3.0, 0.1)           # range of x values for the boundary line
    # decision boundary: w0 + w1*x1 + w2*x2 = 0, so x2 = (-w0 - w1*x1) / w2
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.title('DataSet')
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()
dataArr, labelMat = loadDataSet()
weights = gradAscent(dataArr, labelMat)
plotBestFit(weights)
The best-fit line found by the gradient ascent algorithm after 500 iterations:
3.3 Training the algorithm: stochastic gradient ascent
The gradient ascent algorithm traverses the entire data set every time it updates the regression coefficients. This is acceptable for a data set of around 100 samples, but with billions of samples and tens of thousands of features the computational cost becomes prohibitive. One improvement is to update the coefficients with only one sample at a time; this is called stochastic gradient ascent.
# Stochastic gradient ascent algorithm
def stocGradAscent0(dataMatrix, classLabels):
    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        # h and error are scalars here, not vectors as in gradAscent
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights
def plotBestFit(weights):
    # same as above, but takes the weights as a plain ndarray (no .getA() needed)
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataMat)[0]                # number of samples
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1])
            ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1])
            ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=20, c='r', marker='s')
    ax.scatter(xcord2, ycord2, s=20, c='g')
    x = arange(-3.0, 3.0, 0.1)           # range of x values for the boundary line
    y = (-weights[0] - weights[1] * x) / weights[2]   # decision boundary line
    ax.plot(x, y)
    plt.title('DataSet')
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()
dataMat, labelMat = loadDataSet()
weights = stocGradAscent0(array(dataMat), labelMat)
plotBestFit(weights)
The classifier misclassifies roughly one third of the samples:
Directly comparing the result of stochastic gradient ascent with that of gradient ascent is unfair, because the latter was obtained only after iterating 500 times over the whole data set. A reliable way to judge an optimization algorithm is to check whether it converges, that is, whether the parameters reach stable values or keep fluctuating. To this end, we make some modifications to the stochastic gradient ascent algorithm of Listing 5-3 so that it runs 200 passes over the whole data set, and then plot how the three regression coefficients change over the iterations.
# Adjusted gradient ascent and stochastic gradient ascent that also record the weight history
def gradAscent(dataMatIn, classLabels):
    dataMatrix = mat(dataMatIn)
    labelMat = mat(classLabels).T
    m, n = shape(dataMatrix)
    alpha = 0.01
    maxCycles = 500
    weights = ones((n, 1))
    weights_array = array([])
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)
        error = labelMat - h
        weights = weights + alpha * dataMatrix.T * error
        weights_array = append(weights_array, weights)   # record one row of weights per iteration
    weights_array = weights_array.reshape(maxCycles, n)
    return weights.getA(), weights_array

def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    m, n = shape(dataMatrix)
    weights = ones(n)
    weights_array = array([])
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            alpha = 4 / (1.0 + j + i) + 0.01                    # shrink alpha at every step
            randIndex = int(random.uniform(0, len(dataIndex)))  # pick a sample index at random
            h = sigmoid(sum(dataMatrix[randIndex] * weights))
            error = classLabels[randIndex] - h
            weights = weights + alpha * error * dataMatrix[randIndex]
            weights_array = append(weights_array, weights, axis=0)   # record the weights after each update
            del(dataIndex[randIndex])
    weights_array = weights_array.reshape(numIter * m, n)
    return weights, weights_array
# Plot the regression coefficients versus the number of iterations
plt.rcParams['font.sans-serif'] = ['SimHei']   # only needed when plot labels contain CJK characters
plt.rcParams['axes.unicode_minus'] = False

def plotWeights(weights_array1, weights_array2):
    # split the canvas into three rows and two columns
    fig, axs = plt.subplots(nrows=3, ncols=2, sharex=False, sharey=False, figsize=(20, 10))
    x1 = arange(0, len(weights_array1), 1)
    # w0 versus number of iterations (gradient ascent)
    axs[0][0].plot(x1, weights_array1[:, 0])
    axs0_title_text = axs[0][0].set_title('Gradient ascent: regression coefficients vs. iterations')
    axs0_ylabel_text = axs[0][0].set_ylabel('W0')
    plt.setp(axs0_title_text, size=20, weight='bold', color='black')
    plt.setp(axs0_ylabel_text, size=20, weight='bold', color='black')
    # w1 versus number of iterations
    axs[1][0].plot(x1, weights_array1[:, 1])
    axs1_ylabel_text = axs[1][0].set_ylabel('W1')
    plt.setp(axs1_ylabel_text, size=20, weight='bold', color='black')
    # w2 versus number of iterations
    axs[2][0].plot(x1, weights_array1[:, 2])
    axs2_xlabel_text = axs[2][0].set_xlabel('Number of iterations')
    axs2_ylabel_text = axs[2][0].set_ylabel('W2')
    plt.setp(axs2_xlabel_text, size=20, weight='bold', color='black')
    plt.setp(axs2_ylabel_text, size=20, weight='bold', color='black')
    x2 = arange(0, len(weights_array2), 1)
    # w0 versus number of iterations (stochastic gradient ascent)
    axs[0][1].plot(x2, weights_array2[:, 0])
    axs0_title_text = axs[0][1].set_title('Stochastic gradient ascent: regression coefficients vs. iterations')
    axs0_ylabel_text = axs[0][1].set_ylabel('W0')
    plt.setp(axs0_title_text, size=20, weight='bold', color='black')
    plt.setp(axs0_ylabel_text, size=20, weight='bold', color='black')
    # w1 versus number of iterations
    axs[1][1].plot(x2, weights_array2[:, 1])
    axs1_ylabel_text = axs[1][1].set_ylabel('W1')
    plt.setp(axs1_ylabel_text, size=20, weight='bold', color='black')
    # w2 versus number of iterations
    axs[2][1].plot(x2, weights_array2[:, 2])
    axs2_xlabel_text = axs[2][1].set_xlabel('Number of iterations')
    axs2_ylabel_text = axs[2][1].set_ylabel('W2')
    plt.setp(axs2_xlabel_text, size=20, weight='bold', color='black')
    plt.setp(axs2_ylabel_text, size=20, weight='bold', color='black')
    plt.show()
# Plot the coefficient histories
dataMat, labelMat = loadDataSet()
weights1, weights_array1 = gradAscent(dataMat, labelMat)
weights2, weights_array2 = stocGradAscent1(array(dataMat), labelMat)
plotWeights(weights_array1, weights_array2)
# Improved stochastic gradient ascent algorithm
import random

def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            alpha = 4 / (1.0 + j + i) + 0.01                    # shrink alpha at every step
            randIndex = int(random.uniform(0, len(dataIndex)))  # pick samples at random to reduce periodic fluctuations
            h = sigmoid(sum(dataMatrix[randIndex] * weights))   # h and error are scalars here; in gradAscent they are vectors
            error = classLabels[randIndex] - h
            weights = weights + alpha * error * dataMatrix[randIndex]
            del(dataIndex[randIndex])
    return weights
dataMat, labelMat = loadDataSet()
weights = stocGradAscent1(array(dataMat), labelMat)
plotBestFit(weights)