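Softmax Regression generalizes logistic regression to multi-class classification. It is a linear classifier, so it is best suited to problems in which any two classes are linearly separable.
Suppose there are $k$ classes and each class $j$ has a parameter vector ${\theta}_j$. For a sample with feature vector $X$, the probability that it belongs to class $j$ is: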
\[P({{y}_{i}}=j|X,{{\theta }_{j}})=\frac{{{e}^{{{\theta }_{j}}X}}}{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}\]
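The probability formula can be illustrated with a few lines of Python. This is only a minimal sketch using the standard library; the function name `softmax_probabilities`, the list-of-lists layout of the parameters, and the numbers are assumptions made for the example. Subtracting the largest score before exponentiating is a common numerical safeguard that leaves the result unchanged.

```python
import math


def softmax_probabilities(thetas, x):
    """Return the probability of every class j for one sample x.

    thetas: list of k parameter vectors, one per class (hypothetical layout)
    x:      feature vector of a single sample
    """
    # Score theta_j . x for each class
    scores = [sum(t * v for t, v in zip(theta, x)) for theta in thetas]
    # Shifting by the largest score keeps the ratios the same but avoids overflow
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up parameters: k = 3 classes, 2 features
thetas = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
x = [2.0, 1.0]
probs = softmax_probabilities(thetas, x)
print(probs)
```

The class with the largest score $\theta_j X$ receives the largest probability, and the probabilities of all $k$ classes sum to 1.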
Compared with the loss function of logistic regression, the Softmax loss introduces an indicator function. The loss function is:
$J\left( \theta \right)=-\frac{1}{m}\left[ \sum\limits_{i=1}^{m}{\sum\limits_{j=1}^{k}{I\left\{ {{y}_{i}}=j \right\}\log \frac{{{e}^{{{\theta }_{j}}X}}}{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}}} \right]$
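As a rough illustration of how $J(\theta)$ is evaluated, the sketch below computes it for a tiny made-up data set. The function name `softmax_loss` and the data are assumptions for the example, and classes are numbered from 0 rather than from 1 as in the formula. The indicator function shows up as simply picking the score of the true class.

```python
import math


def softmax_loss(thetas, samples, labels):
    """Average negative log-likelihood J(theta) over m samples."""
    total = 0.0
    for x, y in zip(samples, labels):
        scores = [sum(t * v for t, v in zip(theta, x)) for theta in thetas]
        top = max(scores)
        # Logarithm of the denominator, computed in a numerically stable way
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        # The indicator keeps only the term of the true class y
        total += scores[y] - log_norm
    return -total / len(samples)


samples = [[2.0, 1.0], [0.5, 3.0]]   # made-up feature vectors
labels = [0, 1]                      # classes are numbered from 0 here

# With all-zero parameters every class has probability 1/k
zero_thetas = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
loss_zero = softmax_loss(zero_thetas, samples, labels)

# Parameters that favour the correct classes give a smaller loss
better_thetas = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
loss_better = softmax_loss(better_thetas, samples, labels)
print(loss_zero, loss_better)
```

With all parameters at zero the loss equals $\log k$, which is a convenient sanity check for an implementation.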
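For each sample, the indicator function selects the class the sample actually belongs to, so only the log-probability of that class contributes to the loss. The loss can be minimized with gradient descent; the gradient is derived as follows: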
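${{\nabla }_{{{\theta }_{j}}}}J(\theta )=-\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ {{\nabla }_{{{\theta }_{j}}}}\sum\limits_{c=1}^{k}{I\{{{y}_{i}}=c\}\log \frac{{{e}^{{{\theta }_{c}}X}}}{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}} \right]}$
$ =-\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ I\{{{y}_{i}}=j\}\frac{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}{{{e}^{{{\theta }_{j}}X}}}\cdot \frac{{{e}^{{{\theta }_{j}}X}}\cdot X\cdot \sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}-{{e}^{{{\theta }_{j}}X}}\cdot {{e}^{{{\theta }_{j}}X}}\cdot X}{{{\left( \sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}} \right)}^{2}}}-\sum\limits_{c\ne j}{I\{{{y}_{i}}=c\}\frac{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}{{{e}^{{{\theta }_{c}}X}}}\cdot \frac{{{e}^{{{\theta }_{c}}X}}\cdot {{e}^{{{\theta }_{j}}X}}\cdot X}{{{\left( \sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}} \right)}^{2}}}} \right]}$
$ =-\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ I\{{{y}_{i}}=j\}\frac{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}-{{e}^{{{\theta }_{j}}X}}}{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}-\sum\limits_{c\ne j}{I\{{{y}_{i}}=c\}\frac{{{e}^{{{\theta }_{j}}X}}}{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}} \right]}\cdot X $
$ =-\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ I\{{{y}_{i}}=j\}-\left( \sum\limits_{c=1}^{k}{I\{{{y}_{i}}=c\}} \right)\frac{{{e}^{{{\theta }_{j}}X}}}{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}} \right]}\cdot X $
$=-\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ (I\{{{y}_{i}}=j\}-P({{y}_{i}}=j|X,{{\theta }_{j}}))\cdot X \right]} $
Here $c$ indexes the classes in the outer sum. The class $c=j$ contributes through the numerator and the denominator, while every class $c\ne j$ contributes only through the denominator. The last step uses $\sum\limits_{c=1}^{k}{I\{{{y}_{i}}=c\}}=1$, since each sample belongs to exactly one class.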
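One way to gain confidence in the final gradient expression is to compare it with a finite-difference approximation of the loss. The sketch below does this on a tiny made-up data set using only the standard library; all function names, the parameter values, and the step size `eps` are assumptions chosen for illustration, and classes are numbered from 0.

```python
import math


def probabilities(thetas, x):
    """Softmax probabilities of all classes for one sample."""
    scores = [sum(t * v for t, v in zip(theta, x)) for theta in thetas]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def loss(thetas, samples, labels):
    """Average negative log-probability of the true classes."""
    return -sum(math.log(probabilities(thetas, x)[y])
                for x, y in zip(samples, labels)) / len(samples)


def analytic_gradient(thetas, samples, labels):
    """Gradient from the closed-form expression, one vector per class."""
    m = len(samples)
    grad = [[0.0] * len(thetas[0]) for _ in thetas]
    for x, y in zip(samples, labels):
        p = probabilities(thetas, x)
        for j in range(len(thetas)):
            # Indicator of the true class minus the predicted probability
            coeff = (1.0 if y == j else 0.0) - p[j]
            for d, v in enumerate(x):
                grad[j][d] -= coeff * v / m
    return grad


def numeric_gradient(thetas, samples, labels, eps=1e-6):
    """Central finite differences of the loss, one entry at a time."""
    grad = [[0.0] * len(thetas[0]) for _ in thetas]
    for j in range(len(thetas)):
        for d in range(len(thetas[0])):
            plus = [row[:] for row in thetas]
            minus = [row[:] for row in thetas]
            plus[j][d] += eps
            minus[j][d] -= eps
            grad[j][d] = (loss(plus, samples, labels)
                          - loss(minus, samples, labels)) / (2 * eps)
    return grad


# Made-up parameters and data: 3 classes, 2 features, 3 samples
thetas = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]
samples = [[2.0, 1.0], [0.5, 3.0], [-1.0, 0.5]]
labels = [0, 1, 2]

analytic = analytic_gradient(thetas, samples, labels)
numeric = numeric_gradient(thetas, samples, labels)
max_diff = max(abs(a - n)
               for row_a, row_n in zip(analytic, numeric)
               for a, n in zip(row_a, row_n))
print(max_diff)
```

If the closed-form gradient is right, the two results should agree up to the small error of the finite-difference approximation.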
For each class, compute the gradient with respect to its ${\theta}_j$ and update the parameters. The Python code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Sun Jan 28 15:32:44 2018

@author: zhang
"""
import numpy as np
from sklearn.datasets import load_digits
# sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
from sklearn import preprocessing


def load_data():
    digits = load_digits()
    # Plain arrays are used instead of np.mat, which NumPy 2.0 no longer provides
    return digits.data, digits.target


def gradient_descent(train_x, train_y, k, maxCycle, alpha):
    # k is the number of classes
    numSamples, numFeatures = np.shape(train_x)
    weights = np.ones((numFeatures, k))
    for i in range(maxCycle):
        scores = train_x @ weights
        # Subtracting the row maximum leaves the probabilities unchanged
        # but avoids overflow in np.exp
        value = np.exp(scores - scores.max(axis=1, keepdims=True))
        rowsum = value.sum(axis=1, keepdims=True)  # sum over classes for each sample
        err = -value / rowsum  # minus the probability of each sample belonging to each class
        err[np.arange(numSamples), train_y] += 1  # add the indicator I{y_i = j}
        # err.T-weighted features give the negative gradient, so this is a descent step
        weights = weights + (alpha / numSamples) * (train_x.T @ err)
    return weights


def test_model(test_x, test_y, weights):
    results = test_x @ weights
    predict_y = results.argmax(axis=1)  # class with the highest score
    accuracy = np.mean(predict_y == test_y)
    return accuracy, predict_y


if __name__ == "__main__":
    data, label = load_data()
    # data = preprocessing.minmax_scale(data, axis=0)  # accuracy dropped after this scaling
    train_x, test_x, train_y, test_y = train_test_split(
        data, label, test_size=0.25, random_state=33)
    k = len(np.unique(label))
    weights = gradient_descent(train_x, train_y, k, 800, 0.01)
    accuracy, predict_y = test_model(test_x, test_y, weights)
    print("Accuracy:", accuracy)
