所謂嶺回歸就是在原有⽅程系數計算公式中添加了⼀個擾動項aI,原先⽆法求⼴義逆的情況變成可以求出其⼴義逆，使得問題穩定並得以求解，

其中 a是⾃定義參數， I則是單位矩陣

w=(x.T * x +aI).I * X.T * y

#對於對單位矩陣的⽣成，需要借助NumPy中的eye函數，該函數需要輸⼊對⻆矩陣規模參數
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

def ridgeRegres(dataSet, lam=0.2):
    xMat = np.mat(dataSet.iloc[:, :-1].values)
    yMat = np.mat(dataSet.iloc[:, -1].values).T
    xTx = xMat.T * xMat
    denom = xTx + np.eye(xMat.shape[1])*lam
    ws = denom.I * (xMat.T * yMat)
    return ws

aba = pd.read_table('abalone.txt', header = None)#該數據集源於UCI，記錄了鮑⻥的⽣物屬性，⽬標字段是該⽣物的年齡
aba.head()

#其中最后⼀列為標簽列，同時，由於數據集第⼀列是分類變量，且沒有⽤於承載截距系數的列，因此
#此處將第⼀列的值全部修改為1，然后再進⾏建模
aba.iloc[:, 0] = 1
aba.head()

rws = ridgeRegres(aba)
rws
#返回各列的系數同樣的第一行為截距

matrix([[  3.0178564 ],
        [ -0.02075972],
        [ 11.4007625 ],
        [ 11.02315972],
        [  8.71725704],
        [-19.64613377],
        [ -8.96257395],
        [  9.18180982]])

#封裝R**2
def rSquare(dataSet, regres):#設置參數為 數據集 與 回歸方法
    sse = sseCal(dataSet, regres) 
    y = dataSet.iloc[:, -1].values
    sst = np.power(y - y.mean(), 2).sum()
    return 1 - sse / sst

#封裝最小二乘
def standRegres(dataSet):
    xMat = np.mat(dataSet.iloc[:, :-1].values)#將DataFrame轉換成array 再將array轉換成matrix(矩陣) 因為只有矩陣可以進行計算
    yMat = np.mat(dataSet.iloc[:, -1].values).T
    xTx = xMat.T*xMat
    if np.linalg.det(xTx) == 0:#判斷xTx是否是滿秩矩陣，若不滿秩，則⽆法對其進⾏求逆矩陣的操作
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T*yMat)
    return ws
#這⾥需要注意的是，當使⽤矩陣分解來求解多元線性回歸時，必須添加⼀列全為1的列，⽤於表征線性⽅程截距b。

#將SSE做一個封裝
def sseCal(dataSet, regres):#設置參數為 數據集 與 回歸方法
    n = dataSet.shape[0] 
    y = dataSet.iloc[:, -1].values
    ws = regres(dataSet)
    yhat = dataSet.iloc[:, :-1].values * ws
    yhat = yhat.reshape([n,])
    rss = np.power(yhat - y, 2).sum()
    return rss

rSquare(aba, ridgeRegres)

0.5274036413294183

rSquare(aba, standRegres)

0.5276299399919837

調⽤Scikit—Learn中嶺回歸算法驗證⼿動建模有效性

from sklearn import linear_model
ridge = linear_model.Ridge(alpha=.2)
ridge.fit(aba.iloc[:, :-1], aba.iloc[:, -1])

Ridge(alpha=0.2, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

ridge.coef_#返回各行系數

array([  0.        ,  -0.04157241,  11.39836699,  11.01582006,
         8.71877234, -19.64361088,  -8.9576113 ,   9.1872849 ])

ridge.intercept_#返回截距

3.0265413865908872

繪制嶺跡圖

繪制嶺跡圖基本函數基本思路為從⼩到⼤依次取 a，然后查看各特征列系數的衰減速度，從中查看是否存在公線性，以及的最佳取值

def ridgeTest(dataSet):
    xMat = np.mat(dataSet.iloc[:, :-1].values)
    yMat = np.mat(dataSet.iloc[:, -1].values).T
    yMean = np.mean(yMat, axis = 0)
    yMat = yMat - yMean
    xMeans = np.mean(xMat, axis = 0) 
    xVar = np.var(xMat,axis = 0) 
    xMat = (xMat - xMeans)/xVar
    numTestPts = 30
    wMat = np.zeros((numTestPts,xMat.shape[1]))
    for i in range(numTestPts):
        ws = ridgeRegres(dataSet,np.exp(i-10))
        wMat[i,:]=ws.T
    return wMat

aba = pd.read_table('abalone.txt', header = None)
ridgeWeights = ridgeTest(aba)
plt.plot(ridgeWeights)
plt.xlabel('log(lambda)')
plt.ylabel('weights')

Text(0,0.5,'weights')

把所有回歸系數的嶺跡都繪制在一張圖上，如果這些曲線比較穩定，如上圖所示，利用最小二乘估計會有一定的把握。

在Scikit-Learn中執⾏Lasso算法

from sklearn import linear_model
las = linear_model.Lasso(alpha = 0.01)
las.fit(aba.iloc[:, :-1], aba.iloc[:, -1])

Lasso(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

ridge.coef_#返回各行系數

array([  0.        ,  -0.04157241,  11.39836699,  11.01582006,
         8.71877234, -19.64361088,  -8.9576113 ,   9.1872849 ])

ridge.intercept_#返回截距

3.0265413865908872

	0	1	2	3	4	5	6	7	8
0	1	0.455	0.365	0.095	0.5140	0.2245	0.1010	0.150	15
1	1	0.350	0.265	0.090	0.2255	0.0995	0.0485	0.070	7
2	-1	0.530	0.420	0.135	0.6770	0.2565	0.1415	0.210	9
3	1	0.440	0.365	0.125	0.5160	0.2155	0.1140	0.155	10
4	0	0.330	0.255	0.080	0.2050	0.0895	0.0395	0.055	7

	0	1	2	3	4	5	6	7	8
0	1	0.455	0.365	0.095	0.5140	0.2245	0.1010	0.150	15
1	1	0.350	0.265	0.090	0.2255	0.0995	0.0485	0.070	7
2	1	0.530	0.420	0.135	0.6770	0.2565	0.1415	0.210	9
3	1	0.440	0.365	0.125	0.5160	0.2155	0.1140	0.155	10
4	1	0.330	0.255	0.080	0.2050	0.0895	0.0395	0.055	7

A-嶺回歸的python'實現

所謂嶺回歸就是在原有⽅程系數計算公式中添加了⼀個擾動項aI,原先⽆法求⼴義逆的情況變成可以求出其⼴義逆，使得問題穩定並得以求解，

調⽤Scikit—Learn中嶺回歸算法驗證⼿動建模有效性

繪制嶺跡圖

繪制嶺跡圖基本函數基本思路為從⼩到⼤依次取 a，然后查看各特征列系數的衰減速度，從中查看是否存在公線性，以及的最佳取值

把所有回歸系數的嶺跡都繪制在一張圖上，如果這些曲線比較穩定，如上圖所示，利用最小二乘估計會有一定的把握。

在Scikit-Learn中執⾏Lasso算法

免責聲明！

A-嶺回歸的python'實現

所謂嶺回歸就是在原有⽅程系數計算公式中添加了⼀個擾動項aI,原先⽆法求⼴義逆的情況變成可以求出其⼴義逆，使得問題穩定並得以求解，

調⽤Scikit—Learn中嶺回歸算法驗證⼿動建模有效性

繪制嶺跡圖

繪制嶺跡圖基本函數基本思路為從⼩到⼤依次取 a，然后查看各特征列系數的衰減速度，從中查看是否存在公線性，以及 的最佳取值

把所有回歸系數的嶺跡都繪制在一張圖上，如果這些曲線比較穩定，如上圖所示，利用最小二乘估計會有一定的把握。

在Scikit-Learn中執⾏Lasso算法

免責聲明！

繪制嶺跡圖基本函數基本思路為從⼩到⼤依次取 a，然后查看各特征列系數的衰減速度，從中查看是否存在公線性，以及的最佳取值