Linear regression is the most typical regression problem: the target value is assumed to depend linearly on all of the features. Linear regression is similar to logistic regression; the difference is that logistic regression applies a logistic function on top of the linear model, mapping the output from the real line into the interval (0, 1). By setting a threshold, this turns the regression value into a 0/1 decision, i.e. binary classification.
The linear regression model is $Y=XW$, where $Y$ is an $n \times 1$ vector, $X$ is an $n \times m$ matrix, and $W$ is an $m \times 1$ coefficient vector. Linear regression uses the squared loss because it embodies the least-squares idea: if the residuals are assumed to follow a normal distribution, minimizing the squared loss is exactly the maximum likelihood estimate, as sketched below.
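A short sketch of that equivalence, assuming i.i.d. Gaussian residuals $\varepsilon_i \sim N(0,\sigma^2)$ (this assumption is stated here rather than derived): the likelihood of the data is
\[y_i=X_iW+\varepsilon_i,\qquad L(W)=\prod\limits_{i=1}^{n}{\frac{1}{\sqrt{2\pi }\sigma }\exp \left( -\frac{{{\left( y_i-X_iW \right)}^{2}}}{2{{\sigma }^{2}}} \right)}\]
\[\log L(W)=-\frac{n}{2}\log \left( 2\pi {{\sigma }^{2}} \right)-\frac{1}{2{{\sigma }^{2}}}\sum\limits_{i=1}^{n}{{{\left( y_i-X_iW \right)}^{2}}}\]
so maximizing $\log L(W)$ over $W$ is the same as minimizing the squared loss $\sum_{i=1}^{n}(y_i-X_iW)^2$.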
The linear regression loss function is ${{l}_{W}}=\sum\limits_{i=1}^{n}{{{\left( {{y}_{i}}-X_{i}W \right)}^{2}}}$, where $X_i$ is a $1 \times m$ vector and $W$ is an $m \times 1$ vector.
Linear regression can be solved in several ways, such as the direct least-squares (closed-form) method, gradient descent, and Newton's method.
1. Least Squares
The direct least-squares method uses matrix algebra to obtain a closed-form solution for the coefficient vector W. The derivation is as follows.
The prediction function is $Y=XW$, and the loss function can be written as ${{\left( Y-XW \right)}^{T}}(Y-XW)$.
Differentiating with respect to $W$ gives $\frac{d}{dW}{{\left( Y-XW \right)}^{T}}(Y-XW)=-2{{X}^{T}}(Y-XW)$; the derivation requires some matrix calculus.
Setting the derivative to 0 yields $W={{\left( {{X}^{T}}X \right)}^{-1}}{{X}^{T}}Y$ (assuming ${{X}^{T}}X$ is invertible).
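As a minimal NumPy sketch of this closed-form solution (the function and variable names below are illustrative, not from the original script), the normal equations are usually solved with np.linalg.solve rather than forming the inverse explicitly:

import numpy as np

def least_squares_closed_form(X, y):
    # Solve (X^T X) W = X^T y; avoids explicitly inverting X^T X
    return np.linalg.solve(X.T @ X, X.T @ y)

# Illustrative usage on synthetic data
X = np.random.rand(100, 5)
true_w = np.arange(1.0, 6.0).reshape(-1, 1)
y = X @ true_w
w_hat = least_squares_closed_form(X, y)   # recovers true_w up to numerical error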
2. Gradient Descent
The linear regression loss function is ${{l}_{W}}=\sum\limits_{i=1}^{n}{{{\left( {{y}_{i}}-X_{i}W \right)}^{2}}}$. To minimize the loss, take the partial derivative with respect to the parameter vector $W$:
\[\frac{\partial {{l}_{W}}}{\partial W}=-2\sum\limits_{i=1}^{n}{\left( {{y}_{i}}-{{X}_{i}}W \right)}\,X_{i}^{T}\]
As the expression shows, the gradient with respect to $W$ has the same structure as in logistic regression: for each sample, the residual is multiplied by that sample's feature values and the results are accumulated over all samples; the gradient of the $j$-th parameter applies this operation to the $j$-th feature of every sample. In the code below, the constant factor 2 is absorbed into the learning rate, and the residual is written as prediction minus label, which flips the sign of the update.
Python code for linear regression with direct least squares and gradient descent is shown below:
# -*- coding: utf-8 -*-
"""
Created on Fri Jan 19 13:29:14 2018
@author: zhang
"""
import numpy as np
from sklearn.datasets import load_boston          # available in scikit-learn < 1.2
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split   # formerly sklearn.cross_validation
from sklearn import preprocessing

"""
Multivariate linear regression needs the features to be scaled: when computing the
gradient of coefficient w_j, the difference between each sample's prediction and its
label is multiplied by that sample's j-th feature value and then summed, so features
with very different magnitudes can keep the coefficients from converging.
"""

# Solve for the weight vector directly with the least-squares closed form
def least_square(train_x, train_y):
    """ input: training data (samples * features) and labels """
    weights = (train_x.T * train_x).I * train_x.T * train_y
    return weights

# Batch gradient descent
def gradient_descent(train_x, train_y, maxCycle, alpha):
    numSamples, numFeatures = np.shape(train_x)
    weights = np.zeros((numFeatures, 1))
    for i in range(maxCycle):
        h = train_x * weights
        err = h - train_y
        weights = weights - (alpha * err.T * train_x).T
    return weights

# Stochastic gradient descent: update the weights one sample at a time
def stochastic_gradient_descent(train_x, train_y, maxCycle, alpha):
    numSamples, numFeatures = np.shape(train_x)
    weights = np.zeros((numFeatures, 1))
    for i in range(maxCycle):
        for j in range(numSamples):
            h = train_x[j, :] * weights
            err = h - train_y[j, 0]
            weights = weights - (alpha * err.T * train_x[j, :]).T
    return weights

def load_data():
    boston = load_boston()
    data = boston.data
    label = boston.target
    return data, label

def show_results(predict_y, test_y):
    # matplotlib needs arrays rather than matrices
    plt.scatter(np.array(test_y), np.array(predict_y), marker='x', s=30, c='red')
    plt.plot(np.arange(0, 50), np.arange(0, 50))
    plt.xlabel("original_label")
    plt.ylabel("predict_label")
    plt.title("LinearRegression")
    plt.show()

if __name__ == "__main__":
    data, label = load_data()
    data = preprocessing.normalize(data.T).T   # scale each feature
    train_x, test_x, train_y, test_y = train_test_split(data, label, train_size=0.75, random_state=33)
    train_x = np.mat(train_x)
    test_x = np.mat(test_x)
    train_y = np.mat(train_y).T   # a (n,) array becomes a row vector as a matrix, so transpose it
    test_y = np.mat(test_y).T

#    weights = least_square(train_x, train_y)
#    predict_y = test_x * weights
#    show_results(predict_y, test_y)

    weights = gradient_descent(train_x, train_y, 1000, 0.01)
    predict_y = test_x * weights
    show_results(predict_y, test_y)

#    weights = stochastic_gradient_descent(train_x, train_y, 100, 0.01)
#    predict_y = test_x * weights
#    show_results(predict_y, test_y)
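As an optional sanity check that is not part of the original script, the same split can be fitted with scikit-learn's built-in LinearRegression and its predictions plotted with the same show_results helper (train_x, train_y and test_x are assumed to be the matrices prepared above):

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(np.asarray(train_x), np.asarray(train_y).ravel())   # scikit-learn expects plain arrays
sk_predict_y = lr.predict(np.asarray(test_x))
show_results(np.mat(sk_predict_y).T, test_y)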
