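Preface

Because this material is already covered well by many other resources, this post keeps the theory brief and concentrates on applying it to a practical problem.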

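1. Linear Regression

The "linear" in linear regression means linear in the parameters; the model does not have to be linear in the sample features. For example, y = a0 + a1*x + a2*x^2 is still a linear model, because it is linear in a0, a1 and a2.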
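To make the "linear in the parameters" point concrete, here is a minimal sketch that fits a quadratic curve with ordinary least squares. The data, the noise level and the values in `true_params` are made up for illustration; only `np.linalg.lstsq` is a real NumPy call.

```python
import numpy as np

# Hypothetical data: a quadratic relationship plus a little Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
true_params = np.array([1.0, -2.0, 0.5])  # illustrative values, not from the post
y = true_params[0] + true_params[1] * x + true_params[2] * x**2
y = y + rng.normal(scale=0.1, size=x.shape)

# The features [1, x, x^2] are nonlinear in x,
# but the model y = X @ params is linear in the parameters.
X = np.column_stack([np.ones_like(x), x, x**2])
params, *_ = np.linalg.lstsq(X, y, rcond=None)
print(params)
```

The recovered `params` should land close to `true_params`, even though the fitted curve is not a straight line.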

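Linear model (matrix form): y = XA + e

where A is the parameter vector, y is the vector of target values, X is the feature matrix, and e is the noise vector.

Linear models are usually solved by least squares, which can be derived from maximum likelihood estimation under a Gaussian noise assumption.

Least squares finds the best-fitting function for the data by minimizing the sum of squared errors.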
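The link between maximum likelihood and least squares can be sketched as follows. This assumes the noise components e_i are independent and Gaussian with zero mean and variance \sigma^2; the symbols n, \sigma, x_i (the i-th row of X) and J are introduced here for illustration.

```latex
\begin{aligned}
L(A) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^{2}}}
        \exp\!\left(-\frac{(y_i - x_i^{\top}A)^{2}}{2\sigma^{2}}\right) \\
\log L(A) &= -\frac{n}{2}\log\!\left(2\pi\sigma^{2}\right)
        - \frac{1}{2\sigma^{2}}\,\lVert y - XA\rVert_2^{2} \\
\arg\max_{A}\ \log L(A) &= \arg\min_{A}\ J(A),
        \qquad J(A) = \lVert y - XA\rVert_2^{2}
\end{aligned}
```

Only the second term of the log-likelihood depends on A, so maximizing the likelihood is the same as minimizing the sum of squared errors.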

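Least squares can be solved in several ways, commonly:

Analytic method: find the extremum of the function by setting its derivative to zero.

Matrix method: the matrix form of the analytic method.

Gradient descent: minimize the loss function by iterating step by step along the negative gradient, which yields the minimized loss and the corresponding model parameters.

There are three common variants: batch gradient descent (BGD), stochastic gradient descent (SGD), and mini-batch gradient descent (MBGD).

Other optimization algorithms, such as Newton's method.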
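The matrix method and batch gradient descent from the list above can be compared on a small synthetic problem. This is a sketch under stated assumptions: the data, `A_true`, the learning rate and the iteration count are all chosen for illustration, and the loss is taken to be the mean squared error.

```python
import numpy as np

# Synthetic data following y = XA + e (all values are illustrative).
rng = np.random.default_rng(42)
n_samples, n_features = 200, 3
X = np.column_stack([np.ones(n_samples), rng.normal(size=(n_samples, n_features))])
A_true = np.array([2.0, 1.0, -3.0, 0.5])
y = X @ A_true + rng.normal(scale=0.1, size=n_samples)

# Matrix method: solve the normal equations X^T X A = X^T y.
A_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the mean squared error.
A_gd = np.zeros(X.shape[1])
learning_rate = 0.1
for _ in range(2000):
    gradient = 2.0 / n_samples * X.T @ (X @ A_gd - y)  # uses every sample per step
    A_gd -= learning_rate * gradient

print(A_closed)
print(A_gd)
```

Both routes should arrive at essentially the same parameters. SGD and MBGD differ from this loop only in computing the gradient on one sample or on a small random subset per step.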

To avoid overfitting, a regularization term is usually added to the linear regression model. There are three kinds:
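The three kinds usually meant are Ridge (L2 penalty), Lasso (L1 penalty) and Elastic Net (a mix of both). Ridge and Lasso are the two used in the practice section of this post; Elastic Net is an assumption about the third. The objectives below are the common textbook forms, with \lambda \ge 0 as the regularization strength and \rho \in [0, 1] as the mixing weight, both symbols introduced here. Libraries often scale the terms differently.

```latex
\begin{aligned}
J_{\text{Ridge}}(A) &= \lVert y - XA\rVert_2^{2} + \lambda\,\lVert A\rVert_2^{2} \\
J_{\text{Lasso}}(A) &= \lVert y - XA\rVert_2^{2} + \lambda\,\lVert A\rVert_1 \\
J_{\text{ElasticNet}}(A) &= \lVert y - XA\rVert_2^{2}
    + \lambda\left(\rho\,\lVert A\rVert_1 + (1-\rho)\,\lVert A\rVert_2^{2}\right)
\end{aligned}
```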

2. Logistic Regression

Sigmoid function: y = 1/(1+exp(-x))

Model:

Assume P(y=1|x;θ) = h_θ(x)
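Written out in the usual textbook form, the model applies the sigmoid function to a linear combination of the features. The compact expression for P(y | x; \theta) with y \in \{0, 1\} is the standard one and is assumed here rather than taken from the post.

```latex
\begin{aligned}
h_\theta(x) &= \frac{1}{1 + e^{-\theta^{\top}x}} \\
P(y = 1 \mid x;\theta) &= h_\theta(x), \qquad
P(y = 0 \mid x;\theta) = 1 - h_\theta(x) \\
P(y \mid x;\theta) &= h_\theta(x)^{\,y}\,\bigl(1 - h_\theta(x)\bigr)^{1-y}
\end{aligned}
```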

Maximum likelihood estimation (loss function)
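A sketch of the standard derivation, assuming m independent training samples (x^{(i)}, y^{(i)}) with y^{(i)} \in \{0, 1\} and h_\theta(x) equal to the sigmoid of \theta^{\top}x. The symbols m, \ell and J are introduced here for illustration.

```latex
\begin{aligned}
\ell(\theta) &= \sum_{i=1}^{m}\Bigl[\, y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr)
    + \bigl(1 - y^{(i)}\bigr)\log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr] \\
J(\theta) &= -\frac{1}{m}\,\ell(\theta)
\end{aligned}
```

Maximizing the log-likelihood \ell(\theta) is the same as minimizing J(\theta), which is commonly called the cross-entropy or log loss.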

The parameters are obtained by gradient descent
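A minimal NumPy sketch of that procedure, assuming the averaged log loss whose gradient with respect to θ is X^T (h - y) / m. The synthetic data, `theta_true`, the learning rate and the number of iterations are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic binary classification data (values are illustrative).
rng = np.random.default_rng(0)
m = 500
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
theta_true = np.array([-0.5, 2.0, -1.0])
y = (rng.random(m) < sigmoid(X @ theta_true)).astype(float)

# Batch gradient descent on the averaged log loss.
theta = np.zeros(X.shape[1])
learning_rate = 0.5
for _ in range(5000):
    gradient = X.T @ (sigmoid(X @ theta) - y) / m
    theta -= learning_rate * gradient

# Training accuracy when predicting class 1 for probabilities of at least 0.5.
accuracy = np.mean((sigmoid(X @ theta) >= 0.5) == (y == 1.0))
print(theta, accuracy)
```

The update has the same shape as the one for linear regression; only the prediction changes from X @ theta to sigmoid(X @ theta).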

Softmax regression (multi-class classification)
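Softmax regression replaces the sigmoid with the softmax function, which turns one score per class into a probability distribution over the classes. The sketch below shows only that function, not a full training loop; the example scores are made up, and subtracting the row maximum is a common numerical-stability step rather than something stated in the post.

```python
import numpy as np

def softmax(scores):
    """Map class scores to probabilities along the last axis."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)

# Two samples, three classes (illustrative scores).
scores = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 1000.0]])
probs = softmax(scores)
predicted_class = probs.argmax(axis=1)
print(probs)
print(predicted_class)
```

With two classes and scores (z, 0), the first softmax output equals the sigmoid of z, so logistic regression is the two-class special case.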

3. Practice

The dataset is Boston, which ships with sklearn; the task is to predict house prices from the features of the houses.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Note: load_boston was removed in scikit-learn 1.2, so this script needs an older version.
dataset = datasets.load_boston()

X = dataset.data
Y = dataset.target

# X_norm = StandardScaler().fit_transform(X)
# Standardizing or not makes no difference to the result here, which can be explained:
# for ordinary least squares, predictions are invariant to rescaling the features.
X_norm = X
x_train, x_test, y_train, y_test = train_test_split(X_norm, Y, train_size=0.7, random_state=0)

# Ordinary linear regression
model1 = LinearRegression()
model1.fit(x_train, y_train)
print(model1)
print(model1.coef_, model1.intercept_)

# Lasso, with alpha chosen by 5-fold cross-validated grid search
# model2 = Ridge()
model2 = Lasso()
alpha_can = np.logspace(-3, 2, 10)
lasso_model = GridSearchCV(model2, param_grid={'alpha': alpha_can}, cv=5)
lasso_model.fit(x_train, y_train)
print('Hyperparameters:\n', lasso_model.best_params_)

order = y_test.argsort(axis=0)  # sort the test samples by target value for easier plotting
y_test = y_test[order]
x_test = x_test[order, :]

y_hat1 = model1.predict(x_test)
mse1 = np.average((y_hat1 - np.array(y_test)) ** 2)
print('MSE-LR = ', mse1)

y_hat2 = lasso_model.predict(x_test)
mse2 = np.average((y_hat2 - np.array(y_test)) ** 2)
print('MSE-LASSO = ', mse2)

plt.figure(facecolor='w', figsize=(15, 8))
plt.plot(y_test, 'r-', lw=1, label='True value')
plt.plot(y_hat1, 'g-', lw=1, label='Linear regression estimate')
plt.plot(y_hat2, 'b-', lw=1, label='Lasso estimate')
plt.legend(loc='upper left')
plt.title('Boston house price prediction with linear regression', fontsize=18)
plt.xlabel('Sample index', fontsize=15)
plt.ylabel('House price', fontsize=15)
plt.grid()
plt.show()

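The result is shown in the figure below:

Figure 1

The figure shows that ordinary linear regression and Lasso give essentially the same result, to the point that the two curves are hard to tell apart in the plot. Using Ridge instead gives essentially the same result as well.

Fitting with polynomial features: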

from sklearn.model_selection import train_test_split
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Note: load_boston was removed in scikit-learn 1.2, so this script needs an older version.
dataset = datasets.load_boston()

X = dataset.data
Y = dataset.target

x_train, x_test, y_train, y_test = train_test_split(X, Y, train_size=0.7, random_state=0)
order = y_test.argsort(axis=0)  # sort the test samples by target value for easier plotting
y_test = y_test[order]
x_test = x_test[order, :]

# Fit a Lasso model on polynomial features of degree 1 to 4 and compare test MSE
for degree in np.arange(1, 5):
    model = make_pipeline(PolynomialFeatures(degree), Lasso(alpha=0.1))
    model.fit(x_train, y_train)
    y_hat = model.predict(x_test)
    mse = np.average((y_hat - np.array(y_test)) ** 2)
    print('Polynomial degree: %d' % degree)
    print('MSE = ', mse)
    plt.figure(facecolor='w', figsize=(10, 5))
    plt.plot(y_test, 'r-', lw=1, label='True value')
    plt.plot(y_hat, 'g-', lw=1, label='degree %d' % degree)
    plt.legend(loc='upper left')
    plt.title('Boston house price prediction with linear regression', fontsize=18)
    plt.xlabel('Sample index', fontsize=15)
    plt.ylabel('House price', fontsize=15)
    plt.grid()
plt.show()  # display the four figures, one per degree

Results:
Polynomial degree: 1
MSE =  28.8704970547
Polynomial degree: 2
MSE =  17.7654067367
Polynomial degree: 3
MSE =  22.899129217
Polynomial degree: 4
MSE =  29.5967408848

The results show that a degree-2 polynomial gives the best fit, with the lowest test MSE of the four degrees tried.
