Python: Linear Regression Model

This article is a repost. 2022-02-10 14:48

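We begin our study of regression analysis with linear regression, the earliest and most basic model: it fits the data to a straight line (a hyperplane when there are several features). The dataset used here is boston, one of the datasets that shipped with scikit-learn, which is well suited to demonstrating linear regression. It contains the median house price for areas of Boston, together with factors that may influence house prices, such as the crime rate.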

Loading the data

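from sklearn import datasets
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this call requires an older scikit-learn release.
boston = datasets.load_boston()
import pandas as pd
import warnings  # used to suppress warnings raised by the seaborn plotting library
warnings.filterwarnings("ignore")
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", color_codes=True)
%matplotlib inline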

dfdata = pd.DataFrame(boston.data,columns=boston.feature_names)
dfdata["target"] = boston.target
dfdata.head()

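# jointplot creates its own figure, so no separate plt.figure() call is needed
for f in boston.feature_names:
    # seaborn 0.9 renamed the `size` argument to `height`
    sns.jointplot(x=f, y="target", data=dfdata, kind='reg', height=6)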

Linear regression model

Linear regression is very simple to use in scikit-learn.
First, import the LinearRegression class and create an object:

from sklearn.linear_model import LinearRegression
lr = LinearRegression()

Now pass the independent variables and the dependent variable to the fit method of LinearRegression:
lr.fit(boston.data, boston.target)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
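As a quick sanity check of what fit does, the following is a minimal sketch on synthetic data rather than the boston dataset. The coefficients, intercept, noise level, and variable names are assumptions chosen for illustration; the point is only that the fitted coef_ and intercept_ should land close to the values used to generate the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with known (hypothetical) coefficients and intercept
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 4.0
y = X @ true_coef + true_intercept + rng.normal(scale=0.1, size=200)

# Fit exactly as in the article: create the estimator, then call fit
model = LinearRegression()
model.fit(X, y)

print("estimated coefficients:", model.coef_)
print("estimated intercept:", model.intercept_)
```

With low noise and a few hundred samples, the printed estimates should be close to the generating values, though not identical to them.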

Making predictions

predictions = lr.predict(boston.data)

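A histogram of the residuals, the differences between the actual and the predicted values, gives a visual impression of the prediction quality: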

%matplotlib inline
f, ax = plt.subplots(figsize=(7, 5))
f.tight_layout()
ax.hist(boston.target-predictions,bins=40, label='Residuals Linear', color='b', alpha=.5);
ax.set_title("Histogram of Residuals")
ax.legend(loc='best');
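A histogram is easier to interpret next to a few numbers. The sketch below, again on synthetic data with made-up coefficients, summarizes the residuals with their mean, the root mean squared error, and R² computed by hand. It is an illustration of the idea, not a result for the boston data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic regression problem (all values are illustrative assumptions)
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.7]) + 3.0 + rng.normal(scale=1.0, size=300)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# Root mean squared error of the in-sample predictions
rmse = np.sqrt(np.mean(residuals ** 2))

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"mean residual: {residuals.mean():.2e}")
print(f"RMSE: {rmse:.3f}")
print(f"R^2: {r2:.3f}")
```

Because the model includes an intercept, the in-sample residuals average to zero up to floating-point error, so a histogram centered away from zero would point to a problem in the code rather than in the model.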

lr.coef_

def plotCofBar(x_feature, y_cof):
    """Draw a bar chart of the regression coefficients, one bar per feature."""
    x_value = range(len(x_feature))
    plt.bar(x_value, y_cof, alpha=1, color='r', align="center")
    plt.autoscale(tight=True)
    # rotation must be a number (or 'vertical'), not the string "90"
    plt.xticks(x_value, x_feature, rotation=90)
    plt.xlabel("feature names")
    plt.ylabel("coefficient")
    plt.title("Coefficients of the linear regression")
    plt.show()

plotCofBar(boston.feature_names,lr.coef_)

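How linear regression works
The basic idea of linear regression is to find the set of coefficients $\beta$ that satisfies $y = X\beta$, where $X$ is the data matrix of the independent variables. A set of coefficients that satisfies the equation exactly is hard to find, so an error term is usually added to represent imprecision or measurement error. The equation then becomes $y = X\beta + \epsilon$, where $\epsilon$ is assumed to be a zero-mean, normally distributed random variable that is independent of $X$. In geometric terms, the error term is orthogonal (perpendicular) to $X$. One can show that $E(X\epsilon) = 0$.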
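The orthogonality statement concerns the unobserved error term, but it has a sample analogue that is easy to check: the residuals of a least-squares fit are orthogonal to every column of $X$. The following is a minimal sketch, assuming synthetic data and a fit done with np.linalg.lstsq; the data and coefficient values are invented for illustration.

```python
import numpy as np

# Synthetic design matrix with an intercept column (illustrative values)
rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.5, size=n)

# Least-squares estimate of the coefficients
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# Inner product of each column of X with the residual vector
orthogonality = X.T @ residuals
print(orthogonality)
```

Each printed entry should be zero up to floating-point error. The entry for the column of ones is the sum of the residuals, which is why they average to zero when an intercept is fitted.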

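To find the coefficients $\beta$, we minimize the error term, which turns into the problem of minimizing the residual sum of squares:

$$\min_{\beta} \; \lVert y - X\beta \rVert_2^2$$

This problem can be solved analytically, and when $X^T X$ is invertible the solution is:

$$\hat{\beta} = (X^T X)^{-1} X^T y$$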
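The closed-form least-squares solution, $\hat{\beta} = (X^T X)^{-1} X^T y$, can be checked numerically. The sketch below assumes synthetic data and prepends a column of ones so that the first coefficient plays the role of the intercept. It solves the normal equations $(X^T X)\beta = X^T y$ and compares the result with NumPy's least-squares routine. All variable names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 100, 3
X_raw = rng.normal(size=(n_samples, n_features))

# Prepend a column of ones so the first coefficient acts as the intercept
X = np.column_stack([np.ones(n_samples), X_raw])
beta_true = np.array([4.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n_samples)

# Solve (X^T X) beta = X^T y instead of forming the inverse explicitly
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equations:", beta_normal)
print("lstsq:           ", beta_lstsq)
print("agree:", np.allclose(beta_normal, beta_lstsq))
```

Solving the linear system is generally preferable to computing the inverse, and for ill-conditioned $X$ a solver based on QR or SVD, such as lstsq, tends to be more stable than the normal equations.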

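LinearRegression can normalize (scale) the input data automatically when normalize=True is passed. In the scikit-learn versions that supported it, this option subtracted each feature's mean and divided by its L2 norm before fitting; the parameter was deprecated in version 1.0 and removed in 1.2.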

lr2 = LinearRegression(normalize=True)
lr2.fit(boston.data, boston.target)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=True)

predictions2 = lr2.predict(boston.data)
%matplotlib inline
from matplotlib import pyplot as plt
f, ax = plt.subplots(figsize=(7, 5))
f.tight_layout()
ax.hist(boston.target-predictions2,bins=40, label='Residuals Linear', color='b', alpha=.5);
ax.set_title("Histogram of Residuals")
ax.legend(loc='best');
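Recent scikit-learn releases no longer accept the normalize argument, so a comparable experiment there needs explicit preprocessing. The following is a minimal sketch, assuming synthetic data and substituting StandardScaler in a pipeline. StandardScaler divides by the standard deviation rather than the L2 norm, so it is not identical to the old option. The sketch illustrates that rescaling the features changes the coefficients but not the predictions of ordinary least squares, which suggests the two residual histograms in this article should essentially coincide.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative values)
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3)) * np.array([1.0, 100.0, 0.01])
y = X @ np.array([3.0, 0.02, 50.0]) + 1.0 + rng.normal(scale=0.5, size=150)

# Plain fit versus a fit on standardized features
plain = LinearRegression().fit(X, y)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

pred_plain = plain.predict(X)
pred_scaled = scaled.predict(X)
scaled_coef = scaled.named_steps["linearregression"].coef_

print("same predictions:", np.allclose(pred_plain, pred_scaled))
print("coefficients without scaling:", plain.coef_)
print("coefficients with scaling:   ", scaled_coef)
```

Scaling does matter for regularized models such as ridge or lasso, where the penalty depends on the size of the coefficients. For plain least squares it mainly makes the coefficients comparable across features.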
