Softmax Regression and Python Code


  Softmax regression is the generalization of logistic regression to multi-class problems. It is mainly used for multi-class classification in which any two classes are linearly separable.

  Suppose there are $k$ classes, with a parameter vector $\theta_j$ for each class $j$. For a sample $X_i$, the probability that it belongs to class $j$ is:

\[P(y_i = j \mid X_i; \theta) = \frac{e^{\theta_j X_i}}{\sum\limits_{l=1}^{k} e^{\theta_l X_i}}\]
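As a concrete illustration, these probabilities are just exponentiated scores normalized to sum to 1. A minimal sketch with made-up numbers (the `theta` matrix and sample `x` below are hypothetical):

import numpy as np

# hypothetical parameters: k = 3 classes, 2 features (one row per theta_j)
theta = np.array([[ 0.2, -0.5],
                  [ 1.0,  0.3],
                  [-0.4,  0.8]])
x = np.array([1.5, -0.7])  # one sample X_i

scores = theta @ x                             # theta_j X_i for each class j
probs = np.exp(scores) / np.exp(scores).sum()  # the formula above
print(probs, probs.sum())                      # the k probabilities sum to 1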

  Compared with the logistic regression loss, the Softmax loss introduces the indicator function $I\{\cdot\}$, which equals 1 when its argument is true and 0 otherwise:

$J(\theta) = -\frac{1}{m}\left[\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{k} I\{y_i = j\}\log\frac{e^{\theta_j X_i}}{\sum\limits_{l=1}^{k} e^{\theta_l X_i}}\right]$
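Because $I\{y_i = j\}$ is 1 only for the true class, each sample contributes exactly one term: the negative log of the probability assigned to its correct label. A minimal sketch (the `probs` matrix and label vector `y` below are made up):

import numpy as np

# hypothetical softmax outputs for m = 3 samples over k = 3 classes (rows sum to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
y = np.array([0, 1, 2])  # true class of each sample

# the indicator I{y_i = j} selects exactly one probability per row
J = -np.mean(np.log(probs[np.arange(len(y)), y]))
print(J)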

   For each sample, the indicator selects the log-probability of that sample's true class, so the loss is the average negative log-likelihood over the training set. It can be minimized by gradient descent; the gradient is derived as follows:

$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum\limits_{i=1}^{m} \nabla_{\theta_j}\left[\sum\limits_{l=1}^{k} I\{y_i = l\}\log\frac{e^{\theta_l X_i}}{\sum\limits_{t=1}^{k} e^{\theta_t X_i}}\right]$

Since exactly one indicator is nonzero for each sample, the bracketed sum simplifies to $\theta_{y_i} X_i - \log\sum\limits_{t=1}^{k} e^{\theta_t X_i}$, whose gradient with respect to $\theta_j$ is

$\nabla_{\theta_j}\left[\theta_{y_i} X_i - \log\sum\limits_{t=1}^{k} e^{\theta_t X_i}\right] = I\{y_i = j\}\, X_i - \frac{e^{\theta_j X_i}}{\sum\limits_{t=1}^{k} e^{\theta_t X_i}}\, X_i$

so that

$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum\limits_{i=1}^{m}\left[\left(I\{y_i = j\} - P(y_i = j \mid X_i; \theta)\right) X_i\right]$
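One way to sanity-check this closed form is to compare it against a finite-difference estimate of the loss. A sketch under the same notation (all data below is randomly generated for illustration):

import numpy as np

def softmax(scores):
     e = np.exp(scores - scores.max(axis = 1, keepdims = True))  # stabilized exponentials
     return e / e.sum(axis = 1, keepdims = True)

def loss(theta, X, y):
     p = softmax(X @ theta)  # (m, k) class probabilities
     return -np.mean(np.log(p[np.arange(len(y)), y]))

rng = np.random.default_rng(0)
X = rng.normal(size = (5, 3))
y = np.array([0, 2, 1, 1, 0])
theta = rng.normal(size = (3, 3))  # columns are the theta_j

# closed-form gradient: -(1/m) X^T (I - P)
P = softmax(X @ theta)
I = np.zeros_like(P)
I[np.arange(len(y)), y] = 1
grad = -(X.T @ (I - P)) / len(y)

# finite-difference check on a single entry; the two numbers should agree
eps = 1e-6
t = theta.copy()
t[0, 0] += eps
print(grad[0, 0], (loss(t, X, y) - loss(theta, X, y)) / eps)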

  For each class $j$, the code below computes the gradient with respect to $\theta_j$ and updates the weights accordingly:

# -*- coding: utf-8 -*-
"""
Created on Sun Jan 28 15:32:44 2018

@author: zhang
"""

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

def load_data():
     digits = load_digits()
     data = digits.data
     label = digits.target
     return np.mat(data), label

def gradient_descent(train_x, train_y, k, maxCycle, alpha):
# k is the number of classes
     numSamples, numFeatures = np.shape(train_x)
     weights = np.mat(np.ones((numFeatures, k)))
     
     for i in range(maxCycle):
          scores = train_x * weights
          scores = scores - scores.max(axis = 1)  # subtract each row's maximum so np.exp cannot overflow
          value = np.exp(scores)
          rowsum = value.sum(axis = 1)   # row-wise sum
          rowsum = rowsum.repeat(k, axis = 1)  # tile the sums across the k columns
          err = - value / rowsum  # -P(y_i = j | X_i) for every sample and class
          for j in range(numSamples):     
               err[j, train_y[j]] += 1  # add the indicator I{y_i = j}; err is now I - P
          weights = weights + (alpha / numSamples) * (train_x.T * err)  # step along the negative gradient
     return weights
    
     

def test_model(test_x, test_y, weights):
     results = test_x * weights
     predict_y = results.argmax(axis = 1)  # predicted class = column with the largest score
     count = 0
     for i in range(np.shape(test_y)[0]):
          if predict_y[i, 0] == test_y[i]:
               count += 1   
     return count / len(test_y), predict_y 


if __name__ == "__main__":

     data, label = load_data()
#     data = preprocessing.minmax_scale(data, axis = 0)
#     (min-max scaling lowered the accuracy here)
     train_x, test_x, train_y, test_y = train_test_split(data, label, test_size = 0.25, random_state=33)
     k = len(np.unique(label))
     
     weights = gradient_descent(train_x, train_y, k, 800, 0.01)
     accuracy, predict_y = test_model(test_x, test_y, weights)
     print("Accuracy:", accuracy)
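For reference, the result can be checked against scikit-learn's built-in logistic regression, which with the default lbfgs solver fits the same multinomial (softmax) model in recent scikit-learn versions; a sketch (max_iter is an arbitrary choice):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

digits = load_digits()
train_x, test_x, train_y, test_y = train_test_split(
     digits.data, digits.target, test_size = 0.25, random_state = 33)

clf = LogisticRegression(max_iter = 1000)  # multinomial softmax under the default lbfgs solver
clf.fit(train_x, train_y)
print("sklearn accuracy:", clf.score(test_x, test_y))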

 

