Linear Regression Curves and Diagnosing Overfitting


import matplotlib.pyplot as plt
import mglearn  # datasets and plotting helpers that accompany the book's examples
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

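# wave dataset
# The wave dataset has only one feature, so the model is
#   y = w[0] * x[0] + b
# w[0] is the slope and b is the intercept (the offset along the y-axis);
# scikit-learn stores them as coef_[0] and intercept_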
mglearn.plots.plot_linear_regression_wave()
plt.show()

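# Extended Boston Housing dataset
# 506 samples and 104 features: the 13 original features plus
# all 91 degree-2 products (squares and pairwise interactions)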
X, y = mglearn.datasets.load_extended_boston()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
lr = LinearRegression().fit(X_train, y_train)
print("Training set score: {:.2f}".format(lr.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lr.score(X_test, y_test)))

Output of the wave example:

w[0]: 0.393906 b: -0.031804
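The two printed numbers are the least-squares slope and intercept. With a single feature they have a closed form. The slope is the covariance of x and y divided by the variance of x, and the line passes through the point of means. The NumPy-only sketch below is an illustration, not mglearn's or scikit-learn's code; the helper name `fit_line` and the toy data are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit of y = w[0] * x[0] + b for a single feature.

    Returns (w0, b): the values scikit-learn's LinearRegression exposes
    as coef_[0] and intercept_ (hypothetical helper, for illustration).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    w0 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through (mean of x, mean of y)
    b = y_mean - w0 * x_mean
    return w0, b


# Hypothetical toy data lying exactly on y = 2x + 1
w0, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print("w[0]: %f b: %f" % (w0, b))  # w[0]: 2.000000 b: 1.000000
```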

Source code of mglearn's plot_linear_regression_wave:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Relative imports: this function lives inside the mglearn package
from .datasets import make_wave
from .plot_helpers import cm2


def plot_linear_regression_wave():
    X, y = make_wave(n_samples=60)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # 100 evenly spaced x values on which to draw the fitted line
    line = np.linspace(-3, 3, 100).reshape(-1, 1)

    lr = LinearRegression().fit(X_train, y_train)
    print("w[0]: %f b: %f" % (lr.coef_[0], lr.intercept_))

    plt.figure(figsize=(8, 8))
    plt.plot(line, lr.predict(line))
    plt.plot(X, y, 'o', c=cm2(0))
    # Move the axes to the centre of the figure and hide the top/right spines
    ax = plt.gca()
    ax.spines['left'].set_position('center')
    ax.spines['right'].set_color('none')
    ax.spines['bottom'].set_position('center')
    ax.spines['top'].set_color('none')
    ax.set_ylim(-3, 3)
    #ax.set_xlabel("Feature")
    #ax.set_ylabel("Target")
    ax.legend(["model", "training data"], loc="best")
    ax.grid(True)
    ax.set_aspect('equal')

 

Output of the Boston example:


Training set score: 0.95
Test set score: 0.61
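For a scikit-learn regressor, `score` returns the coefficient of determination R².

- R² = 1 for perfect predictions.
- R² = 0 for a model that always predicts the mean of the targets.
- On unseen data R² can even be negative.

The minimal NumPy sketch below writes out the formula. The helper `r_squared` is hypothetical and is not a scikit-learn function.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
print(r_squared(y_true, [3.0, 5.0, 7.0, 9.0]))  # 1.0: perfect predictions
print(r_squared(y_true, [6.0, 6.0, 6.0, 6.0]))  # 0.0: always predicting the mean
print(r_squared(y_true, [4.0, 5.0, 6.0, 9.0]))  # 0.9
```

Read this way, 0.95 versus 0.61 means the model explains far less of the variance on data it has not seen.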

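The large gap between the training score (0.95) and the test score (0.61) is a clear sign of overfitting. The extended Boston data has 104 features but only 379 training samples (the default 75/25 split of 506 samples). Ordinary least squares has no parameter for limiting model complexity, so it fits noise in the training set. The remedy is regularization, an explicit constraint that keeps the coefficients small. Ridge regression is the most common way to apply it.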
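This effect can be reproduced on synthetic data. The NumPy-only sketch below uses made-up data, not the Boston set: 60 training samples and 50 features, of which only the first carries signal. It mimics the "many features, few samples" situation described above. Ordinary least squares is solved with `np.linalg.lstsq`. The helper names `r_squared`, `ols_fit` and `ols_predict` are hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, the same quantity LinearRegression.score reports."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)


def ols_fit(X, y):
    """Ordinary least squares with an intercept (a column of ones is appended)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def ols_predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


rng = np.random.default_rng(0)
n_train, n_test, n_features = 60, 200, 50
# Only feature 0 matters; the other 49 columns are pure noise
X = rng.normal(size=(n_train + n_test, n_features))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n_train + n_test)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

coef = ols_fit(X_train, y_train)
train_score = r_squared(y_train, ols_predict(coef, X_train))
test_score = r_squared(y_test, ols_predict(coef, X_test))
print("Training set score: {:.2f}".format(train_score))
print("Test set score: {:.2f}".format(test_score))
```

With 51 parameters and 60 samples, the fit can bend toward the noise in the training targets, so the training score ends up well above the test score.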

Ridge regression uses L2 regularization: it adds the sum of squared coefficients to the least-squares objective.
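scikit-learn's `Ridge` minimizes ||y - Xw||² + alpha·||w||² and leaves the intercept unpenalized. For that objective there is a closed-form solution on centred data: w = (XcᵀXc + alpha·I)⁻¹ Xcᵀyc. The NumPy sketch below implements that formula on synthetic data. It is a hypothetical helper (`ridge_fit`), not scikit-learn's implementation. It shows how a larger `alpha` shrinks the coefficient vector.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: minimise ||y - Xw - b||^2 + alpha * ||w||^2.

    The intercept b is not penalised, so X and y are centred first.
    """
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc rather than forming an explicit inverse
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - X_mean @ w
    return w, b


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 50))              # synthetic: 60 samples, 50 features
y = 2.0 * X[:, 0] + rng.normal(size=60)    # only feature 0 is informative

for alpha in [0.0, 1.0, 10.0, 100.0]:
    w, b = ridge_fit(X, y, alpha)
    print("alpha={:>6}: ||w||_2 = {:.3f}".format(alpha, np.linalg.norm(w)))
```

At `alpha=0` this reduces to ordinary least squares. As `alpha` grows, the L2 norm of the coefficients shrinks steadily, but coefficients do not become exactly zero.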

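Lasso regression uses L1 regularization: it penalizes the sum of absolute coefficient values, which can drive some coefficients exactly to zero. Another option is ElasticNet regression, which combines the L1 and L2 penalties.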
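The sketch below uses the objective that scikit-learn documents for `ElasticNet`:

(1/(2n))·||y - Xw||² + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 - l1_ratio)·||w||²

`Lasso` is the special case `l1_ratio = 1`. The sketch minimizes this objective with cyclic coordinate descent in plain NumPy. scikit-learn also fits these models by coordinate descent, but this is not its code. The names `soft_threshold` and `elastic_net_cd` and the synthetic data are hypothetical. The soft-thresholding step is what produces exact zeros.

```python
import numpy as np


def soft_threshold(rho, lam):
    """S(rho, lam) = sign(rho) * max(|rho| - lam, 0): the source of exact zeros."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def elastic_net_cd(X, y, alpha, l1_ratio=1.0, n_iter=200):
    """Cyclic coordinate descent for
        1/(2n) ||y - Xw - b||^2 + alpha*l1_ratio*||w||_1
                                + 0.5*alpha*(1 - l1_ratio)*||w||_2^2
    l1_ratio=1.0 gives the Lasso (pure L1); 0 < l1_ratio < 1 mixes in L2.
    """
    n, p = X.shape
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean      # centring handles the unpenalised intercept
    z = (Xc ** 2).sum(axis=0) / n        # per-feature curvature ||X_j||^2 / n
    w = np.zeros(p)
    residual = yc.copy()                 # always equals yc - Xc @ w
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j
            rho = Xc[:, j] @ residual / n + z[j] * w[j]
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (z[j] + alpha * (1 - l1_ratio))
            residual -= Xc[:, j] * (new_wj - w[j])
            w[j] = new_wj
    b = y_mean - X_mean @ w
    return w, b


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                       # synthetic data
y = 3.0 * X[:, 0] + 0.5 * rng.normal(size=200)       # only feature 0 is informative

w_lasso, _ = elastic_net_cd(X, y, alpha=0.1, l1_ratio=1.0)
w_enet, _ = elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5)
print("Lasso coefficients:     ", np.round(w_lasso, 3))
print("ElasticNet coefficients:", np.round(w_enet, 3))
print("Features used by Lasso:", np.sum(w_lasso != 0))
```

Unlike the ridge solution, many of the noise coefficients here come out exactly 0. This is the practical difference between the L1 and L2 penalties.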

 
       

