Lasso Regression

Reposted article, 2020-05-08 11:06. Category: scikit-learn study notes

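Lasso is a linear model that estimates sparse coefficients. It is useful in some settings because it tends to prefer solutions with fewer nonzero parameters, which effectively reduces the number of variables the solution depends on. For this reason, Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of nonzero weights.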

Mathematically, it consists of a linear model with an added regularization term based on an l1 prior. The objective function to minimize is:

\min_{w} \frac{1}{2 n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \|w\|_1

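The Lasso estimator therefore solves a least-squares minimization with the added penalty α||w||_1, where α is a constant and ||w||_1 is the l1-norm of the coefficient vector w.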
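To make the two terms of the objective concrete, the following is a minimal sketch that evaluates it with NumPy. The helper name `lasso_objective` is hypothetical (it is not part of scikit-learn), and the intercept is ignored, as in the formula above.

```python
import numpy as np


def lasso_objective(X, y, w, alpha):
    """Evaluate 1 / (2 * n_samples) * ||Xw - y||_2^2 + alpha * ||w||_1 (no intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n_samples = X.shape[0]
    residual = X @ w - y
    data_fit = residual @ residual / (2 * n_samples)  # least-squares term
    penalty = alpha * np.abs(w).sum()                 # l1 penalty term
    return data_fit + penalty


X_toy = [[0, 0], [1, 1]]
y_toy = [0, 1]

# w = 0: the penalty vanishes, only the data-fit term remains
obj_zero = lasso_objective(X_toy, y_toy, w=[0.0, 0.0], alpha=0.1)
# w = [1, 0]: the fit is exact, only the penalty remains
obj_fit = lasso_objective(X_toy, y_toy, w=[1.0, 0.0], alpha=0.1)
print(obj_zero, obj_fit)
```

A larger alpha makes every nonzero coefficient more expensive, which is what pushes the minimizer toward sparse solutions.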

from sklearn.linear_model import Lasso

lasso = Lasso()
lasso.fit([[0, 0], [1, 1]], [0,1])
print("coef: {}".format(lasso.coef_))
print(lasso.predict([[1, 1]]))

coef: [0. 0.]
[0.5]

from sklearn.linear_model import Lasso

lasso01 = Lasso(alpha=0.1)
lasso01.fit([[0, 0], [1, 1]], [0,1])
print("coef: {}".format(lasso01.coef_))
print(lasso01.predict([[1, 1]]))

coef: [0.6 0. ]
[0.8]
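The two outputs above (all-zero coefficients with the default alpha=1.0, and [0.6, 0] with alpha=0.1) can be reproduced by hand with cyclic coordinate descent and soft-thresholding. The code below is a simplified sketch under the assumption that the data are centered to handle the intercept. The names `soft_threshold` and `lasso_cd` are hypothetical, and this is not scikit-learn's actual implementation.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward 0 by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for the Lasso objective with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering accounts for the intercept
    w = np.zeros(n_features)
    for _ in range(n_iter):
        for j in range(n_features):
            norm_sq = Xc[:, j] @ Xc[:, j]
            if norm_sq == 0.0:
                continue  # constant feature: its coefficient stays at 0
            # residual with feature j's current contribution removed
            r_j = yc - Xc @ w + Xc[:, j] * w[j]
            rho = Xc[:, j] @ r_j
            w[j] = soft_threshold(rho, alpha * n_samples) / norm_sq
    intercept = y_mean - x_mean @ w
    return w, intercept


for a in (1.0, 0.1):
    w_hat, b_hat = lasso_cd([[0, 0], [1, 1]], [0, 1], alpha=a)
    print("alpha={}: coef={}, prediction at [1, 1]={}".format(a, w_hat, b_hat + w_hat.sum()))
```

With alpha=1.0 the threshold exceeds the correlation between each feature and the target, so both coefficients are set to zero and the prediction falls back to the mean of y. Because the two features are identical, the first one visited absorbs the whole effect when alpha=0.1.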

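The next example estimates Lasso and Elastic-Net regression models on a synthetic sparse signal corrupted by additive noise. The estimated coefficients are then compared with the true sparse coefficients.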

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
# Jupyter/IPython magic for interactive plots; remove this line when running as a plain script
%matplotlib notebook

# Generate some sparse data
np.random.seed(42)
n_samples, n_features = 50, 200
X = np.random.randn(n_samples, n_features)  # randn(...) draws from a standard normal distribution
coef = 3 * np.random.randn(n_features)  # one coefficient per feature
inds = np.arange(n_features)
np.random.shuffle(inds)
coef[inds[10:]] = 0  # sparsify: randomly set 190 of the 200 coefficients to 0
y = np.dot(X, coef)  # linear model: y = X @ w (matrix-vector product)

# Add noise: zero-mean Gaussian noise with standard deviation 0.01
y += 0.01 * np.random.normal(size=n_samples)

# Split the data into a training set and a test set
n_samples = X.shape[0]
X_train, y_train = X[: n_samples // 2], y[: n_samples // 2]
X_test, y_test = X[n_samples // 2: ], y[n_samples // 2: ]

# Train the Lasso model
from sklearn.linear_model import Lasso

alpha = 0.1
lasso = Lasso(alpha=alpha)

y_pred_lasso = lasso.fit(X_train, y_train).predict(X_test)
r2_score_lasso = r2_score(y_test, y_pred_lasso)
print(lasso)
print("r^2 on test data:\n{:.2f}".format(r2_score_lasso))

# Train the ElasticNet model
from sklearn.linear_model import ElasticNet

enet = ElasticNet(alpha=alpha, l1_ratio=0.7)
y_pred_enet = enet.fit(X_train, y_train).predict(X_test)
r2_score_enet = r2_score(y_test, y_pred_enet)
print(enet)
print("r^2 on test data:\n{:.2f}".format(r2_score_enet))

# Plot the estimated coefficients against the true ones
plt.plot(enet.coef_, color='lightgreen', linewidth=2, label='Elastic net coefficients')
plt.plot(lasso.coef_, color='gold', linewidth=2, label='Lasso coefficients')
plt.plot(coef, '--', color='navy', label='original coefficient')
plt.legend(loc='best')
plt.title("Lasso r^2: {:.2f}, ElasticNet r^2: {:.2f}".format(r2_score_lasso, r2_score_enet))

Automatically created module for IPython interactive environment
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
r^2 on test data:
0.39
ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.7,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)
r^2 on test data:
0.24


Text(0.5,1,'Lasso r^2: 0.39, ElasticNet r^2: 0.24')

Setting the regularization parameter

The alpha parameter controls the degree of sparsity of the estimated coefficients.
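As an illustration of this effect, the sketch below fits Lasso with three values of alpha on a small synthetic problem and counts the nonzero coefficients. The data, the variable names and the alpha values are assumptions chosen only for this demonstration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first 3 of 20 features actually influence y
rng = np.random.RandomState(0)
X_demo = rng.randn(50, 20)
true_coef = np.zeros(20)
true_coef[:3] = [3.0, -2.0, 1.5]
y_demo = X_demo @ true_coef + 0.1 * rng.randn(50)

nonzero_counts = {}
for alpha in [0.001, 0.1, 10.0]:
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))
    print("alpha={:<6} nonzero coefficients: {}".format(alpha, nonzero_counts[alpha]))
```

A very small alpha keeps most coefficients, while a sufficiently large alpha drives all of them to zero.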

Using cross-validation

scikit-learn exposes objects that set the Lasso alpha parameter by cross-validation: LassoCV and LassoLarsCV. LassoLarsCV is based on the least angle regression (Lars) algorithm.

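For high-dimensional datasets with many collinear features, LassoCV is most often the preferred model. However, LassoLarsCV has the advantage of exploring more relevant values of alpha, and when the number of samples is very small compared to the number of features, it is often faster than LassoCV.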
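A minimal usage sketch of both estimators follows. The synthetic data and the choice of cv=5 are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LassoLarsCV

# Synthetic data: 2 informative features out of 15
rng = np.random.RandomState(0)
X_cv = rng.randn(60, 15)
y_cv = 2.0 * X_cv[:, 0] - 1.0 * X_cv[:, 1] + 0.5 * rng.randn(60)

lasso_cv = LassoCV(cv=5).fit(X_cv, y_cv)     # coordinate descent on a grid of alphas
lars_cv = LassoLarsCV(cv=5).fit(X_cv, y_cv)  # Lars path, alphas taken from the path itself

print("LassoCV     alpha_:", lasso_cv.alpha_, "grid size:", len(lasso_cv.alphas_))
print("LassoLarsCV alpha_:", lars_cv.alpha_, "candidates:", len(lars_cv.cv_alphas_))
```

After fitting, `alpha_` holds the selected value and the estimator can be used like a plain Lasso model through `coef_` and `predict`.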

Model selection based on information criteria

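As an alternative, the estimator LassoLarsIC proposes to use the Akaike information criterion (AIC) and the Bayes information criterion (BIC). Finding the optimal value of alpha with an information criterion is computationally cheaper, because the regularization path is computed only once instead of k+1 times as with k-fold cross-validation. However, such criteria need a proper estimate of the degrees of freedom of the solution, are derived for large samples (asymptotic results), and assume the model is correct (that is, the data are actually generated by this model). They also tend to break down when the problem is badly conditioned (more features than samples).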
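Both criteria have the textbook form -2 log L + K d, with K = 2 for AIC and K = log(n_samples) for BIC. The sketch below assumes Gaussian noise with a known variance and uses the number of nonzero coefficients as the degrees of freedom d, which is a common choice for the Lasso. The function name and its arguments are hypothetical, and this is not necessarily the exact computation performed inside LassoLarsIC.

```python
import math


def information_criterion(rss, n_samples, n_nonzero, noise_variance, criterion="aic"):
    """Return -2 log L + K * d for a Gaussian linear model with known noise variance.

    rss:            residual sum of squares of the fitted model
    n_nonzero:      number of nonzero coefficients, used as degrees of freedom d
    noise_variance: assumed known variance of the Gaussian noise
    """
    if criterion == "aic":
        k = 2.0
    elif criterion == "bic":
        k = math.log(n_samples)
    else:
        raise ValueError("criterion must be 'aic' or 'bic'")
    neg2_log_likelihood = (
        n_samples * math.log(2 * math.pi * noise_variance) + rss / noise_variance
    )
    return neg2_log_likelihood + k * n_nonzero


aic = information_criterion(rss=120.0, n_samples=100, n_nonzero=5, noise_variance=1.2)
bic = information_criterion(rss=120.0, n_samples=100, n_nonzero=5, noise_variance=1.2,
                            criterion="bic")
print("AIC: {:.2f}, BIC: {:.2f}".format(aic, bic))
```

Since log(n_samples) exceeds 2 once n_samples is 8 or more, BIC penalizes each additional nonzero coefficient more heavily than AIC and tends to select sparser models.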

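For cross-validation, the Lasso path is computed with 20 folds using two algorithms: coordinate descent (implemented by the LassoCV class) and Lars, least angle regression (implemented by the LassoLarsCV class). Both algorithms give roughly the same results. They differ in execution speed and in their sources of numerical error.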

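Lars computes a path solution only at each kink in the path. As a result, it is very efficient when there are only a few kinks, which is the case when there are few features or samples. It can also compute the full path without setting any meta-parameter. In contrast, coordinate descent computes the path points on a pre-specified grid (the default grid is used here). It is therefore more efficient when the number of grid points is smaller than the number of kinks in the path. In terms of numerical error, Lars accumulates more error for heavily correlated variables, while coordinate descent only samples the path on a grid.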
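The difference between the two path computations can be seen directly with the `lars_path` and `lasso_path` functions from `sklearn.linear_model`. The following sketch uses assumed synthetic data and the default grid of `lasso_path`.

```python
import numpy as np
from sklearn.linear_model import lars_path, lasso_path

# Small synthetic problem: few features, so the Lars path has few kinks
rng = np.random.RandomState(0)
X_path = rng.randn(40, 8)
y_path = X_path[:, 0] - 2.0 * X_path[:, 1] + 0.1 * rng.randn(40)

# Lars (Lasso variant): one solution per kink of the piecewise-linear path
alphas_lars, _, coefs_lars = lars_path(X_path, y_path, method="lasso")

# Coordinate descent: one solution per point of a pre-specified alpha grid
alphas_cd, coefs_cd, _ = lasso_path(X_path, y_path)

print("Lars path points (kinks):     ", len(alphas_lars))
print("Coordinate descent grid points:", len(alphas_cd))
```

Each column of `coefs_lars` and `coefs_cd` is the coefficient vector at the corresponding alpha, so the two paths can be plotted against their own alpha values for comparison.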

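Note how the optimal value of alpha varies from fold to fold. This illustrates why nested cross-validation is necessary when evaluating the performance of a method whose parameter is chosen by cross-validation: the chosen parameter may not be optimal for unseen data.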
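A nested cross-validation can be sketched as follows: an inner LassoCV selects alpha on each outer training split, and the outer loop scores the whole selection procedure on held-out data. The synthetic data and the fold counts are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.RandomState(0)
X_nested = rng.randn(90, 10)
y_nested = 1.5 * X_nested[:, 0] - X_nested[:, 1] + 0.5 * rng.randn(90)

outer_cv = KFold(n_splits=3, shuffle=True, random_state=0)

# Inspect the alpha chosen by the inner cross-validation on each outer training split
fold_alphas = []
for train_idx, _ in outer_cv.split(X_nested):
    inner_model = LassoCV(cv=5).fit(X_nested[train_idx], y_nested[train_idx])
    fold_alphas.append(inner_model.alpha_)
print("alpha selected per outer fold:", fold_alphas)

# Outer loop: R^2 of "LassoCV including its alpha selection" on data it never saw
outer_scores = cross_val_score(LassoCV(cv=5), X_nested, y_nested, cv=outer_cv)
print("outer-fold R^2 scores:", outer_scores)
```

The outer scores estimate how well the complete procedure generalizes, which a single cross-validated error from the inner loop alone tends to overstate.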

print(__doc__)

import time

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LassoCV, LassoLarsCV, LassoLarsIC
from sklearn import datasets

diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

rng = np.random.RandomState(42)
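X = np.c_[X, rng.randn(X.shape[0], 14)]  # add some uninformative (pure noise) features

# Normalize the data (unit l2 norm per column) so the results are comparable
X /= np.sqrt(np.sum(X ** 2, axis=0))

# LassoLarsIC: least angle regression (Lars) with the BIC/AIC criterion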

model_bic = LassoLarsIC(criterion='bic')
t1 = time.time()
model_bic.fit(X, y)
t_bic = time.time() - t1
alpha_bic = model_bic.alpha_

model_aic = LassoLarsIC(criterion='aic')
model_aic.fit(X, y)
alpha_aic = model_aic.alpha_

def plot_ic_criterion(model, name, color):
    alpha_ = model.alpha_
    alphas_ = model.alphas_
    criterion_ = model.criterion_
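    # Note: alphas_ can contain 0 (the unregularized end of the path); log10 then emits
    # a "divide by zero" RuntimeWarning and yields an infinite x value for that point.
    plt.plot(-np.log10(alphas_), criterion_, '--', color=color, linewidth=3, label='{} criterion'.format(name))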
    plt.axvline(-np.log10(alpha_), color=color, linewidth=3, label='alpha: {} estimate'.format(name))
    plt.xlabel('-log(alpha)')
    plt.ylabel('criterion')

plt.figure()
plot_ic_criterion(model_aic, 'AIC', 'b')
plot_ic_criterion(model_bic, 'BIC', 'r')
plt.legend()
plt.title('Information-criterion for model selection (training time %.3fs)' % t_bic)

# LassoCV: coordinate descent

# Compute the regularization path
print("Computing regularization path using the coordinate descent lasso...")
t1 = time.time()
model = LassoCV(cv=20).fit(X, y)
t_lasso_cv = time.time() - t1

# Display the results
m_log_alphas = -np.log10(model.alphas_)

plt.figure()
ymin, ymax = 2300, 3800
plt.plot(m_log_alphas, model.mse_path_, ':')
plt.plot(m_log_alphas, model.mse_path_.mean(axis=-1), 'k', label='Average across the folds', linewidth=2)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k', label='alpha: CV estimate')

plt.legend()
plt.xlabel('-log(alpha)')
plt.ylabel('Mean square error')
plt.title("Mean square error on each fold: coordinate descent (train time: {:.2f}s)".format(t_lasso_cv))

plt.axis('tight')
plt.ylim(ymin, ymax)

# LassoLarsCV: least angle regression

# Compute the regularization path
print("Computing regularization path using the Lars lasso...")
t1 = time.time()
model = LassoLarsCV(cv=20).fit(X, y)
t_lasso_lars_cv = time.time() - t1

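# Display the results
# Note: cv_alphas_ can contain 0, so log10 may emit a "divide by zero" RuntimeWarning here
m_log_alphas = -np.log10(model.cv_alphas_)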

plt.figure()
plt.plot(m_log_alphas, model.mse_path_, ':')
plt.plot(m_log_alphas, model.mse_path_.mean(axis=1), 'k', label='Average across the folds', linewidth=2)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k', label='alpha CV')
plt.legend()
plt.xlabel('-log(alpha)')
plt.ylabel('Mean square error')
plt.title('Mean square error on each fold: Lars (train time: {:.2f}s)'.format(t_lasso_lars_cv))
plt.axis('tight')
plt.ylim(ymin, ymax)

Automatically created module for IPython interactive environment


Computing regularization path using the coordinate descent lasso...

C:\Users\Administrator\Anaconda3\lib\site-packages\ipykernel_launcher.py:37: RuntimeWarning: divide by zero encountered in log10


Computing regularization path using the Lars lasso...

C:\Users\Administrator\Anaconda3\lib\site-packages\ipykernel_launcher.py:82: RuntimeWarning: divide by zero encountered in log10


(2300, 3800)

Comparison with the regularization parameter of SVM

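The equivalence between alpha and the SVM regularization parameter C is given by alpha = 1 / C or alpha = 1 / (n_samples * C), depending on the estimator and on the exact objective function optimized by the model.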
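The two conversions stated above amount to the following small helper. The function name `alpha_from_C` is hypothetical, and which of the two forms applies has to be checked against the objective function of the specific estimator.

```python
def alpha_from_C(C, n_samples=None):
    """Convert an SVM-style C into an alpha, using one of the two stated relations.

    n_samples=None  -> alpha = 1 / C
    n_samples given -> alpha = 1 / (n_samples * C)
    """
    if C <= 0:
        raise ValueError("C must be strictly positive")
    if n_samples is None:
        return 1.0 / C
    return 1.0 / (n_samples * C)


print(alpha_from_C(10.0))                 # alpha = 1 / C
print(alpha_from_C(0.5, n_samples=100))   # alpha = 1 / (n_samples * C)
```

In both conventions a large C corresponds to a small alpha, that is, to weak regularization.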
