1 Matrix Differentiation
A practical approach to differentiating complicated matrix expressions is to work from small to large: from scalar to vector, then to matrix.
x is a column vector, A is a matrix
d(Ax)/dx = A
d(x'A)/dx' = A
d(x'A)/dx = A'
d(x'Ax)/dx = x'(A' + A)
In practice, the following matrix differentiation identities are commonly used. (The four identities above are written in the numerator-layout convention, while the list below uses the denominator layout; this is why the derivative of Ax appears as A above but as A' below.)
Y = A * X --> DY/DX = A'
Y = X * A --> DY/DX = A
Y = A' * X * B --> DY/DX = A * B'
Y = A' * X' * B --> DY/DX = B * A'
1. Derivative of a matrix Y with respect to a scalar x:
Each element is differentiated and the result is transposed; note that an M×N matrix becomes N×M after differentiation.
Y = [y(ij)] --> dY/dx = [dy(ji)/dx]
2. Derivative of a scalar y with respect to a column vector X:
Unlike the case above, the entries here are partial derivatives and no transpose is taken; the derivative with respect to an N×1 vector is again an N×1 vector.
y = f(x1,x2,..,xn) --> dy/dX = (Dy/Dx1,Dy/Dx2,..,Dy/Dxn)'
3. Derivative of a row vector Y' with respect to a column vector X:
Note that the derivative of a 1×M vector with respect to an N×1 vector is an N×M matrix.
Take the partial derivative of each column (entry) of Y' with respect to X, and assemble the resulting columns into a matrix.
Key results:
dX'/dX = I
d(AX)'/dX = A'
4. Derivative of a column vector Y with respect to a row vector X':
Convert it to the derivative of the row vector Y' with respect to the column vector X, then transpose.
Note that the derivative of an M×1 vector with respect to a 1×N vector is an M×N matrix.
dY/dX' = (dY'/dX)'
5. Product rules for vector products with respect to a column vector X:
Note that these differ slightly from the scalar product rule.
d(UV')/dX = (dU/dX)V' + U(dV'/dX)
d(U'V)/dX = (dU'/dX)V + (dV'/dX)U
Key results:
d(X'A)/dX = (dX'/dX)A + (dA/dX)X' = IA + 0X' = A
d(AX)/dX' = (d(X'A')/dX)' = (A')' = A
d(X'AX)/dX = (dX'/dX)AX + (d(AX)'/dX)X = AX + A'X
6. Derivative of a matrix Y with respect to a column vector X:
Differentiate Y with respect to each component of X and stack the results into a hyper-vector.
Note that each element of this hyper-vector is itself a matrix.
7. Product rules for matrix products with respect to a column vector (u is a scalar, U and V are matrices):
d(uV)/dX = (du/dX)V + u(dV/dX)
d(UV)/dX = (dU/dX)V + U(dV/dX)
Key result:
d(X'A)/dX = (dX'/dX)A + X'(dA/dX) = IA + X'0 = A
8. Derivative of a scalar y with respect to a matrix X:
Analogous to the derivative of a scalar with respect to a column vector:
take the partial derivative of y with respect to each element of X, with no transpose.
dy/dX = [ Dy/Dx(ij) ]
Key results:
y = U'XV = ΣΣ u(i)x(ij)v(j), hence dy/dX = [u(i)v(j)] = UV'
If y = U'X'XU, then dy/dX = 2XUU'
If y = (XU-V)'(XU-V), then dy/dX = d(U'X'XU - 2V'XU + V'V)/dX = 2XUU' - 2VU' + 0 = 2(XU-V)U'
9. Derivative of a matrix Y with respect to a matrix X:
Differentiate each element of Y with respect to X, then arrange the results into a hyper-matrix.
10. Derivative of a product
d(fg)/dx = (df'/dx)g + (dg/dx)f'
Result:
d(x'Ax)/dx = (d(x'')/dx)Ax + (d(Ax)/dx)x'' = Ax + A'x   (here '' denotes a double transpose, so x'' = x)
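These identities are easy to sanity-check numerically. Below is a minimal sketch (my own addition, not part of the original notes) that verifies d(x'Ax)/dx = Ax + A'x against a central finite-difference approximation:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

def f(v):
    return v @ A @ v  # the scalar x'Ax

# analytic gradient from the identity above: Ax + A'x
grad_analytic = A @ x + A.T @ x

# central finite differences, one coordinate at a time
eps = 1e-6
grad_numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(4)])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # True

The same recipe works for any of the identities: implement the scalar (or vector) map, difference it numerically, and compare against the closed form.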
2 Linear Models
2.1 Ordinary Least Squares
Implemented by the LinearRegression class. A weakness of least squares is its dependence on correlation among the covariates: under multicollinearity the design matrix becomes nearly singular, so the least-squares estimate is very sensitive, and small fluctuations in the random error can produce large changes in the estimate. When the data come from an observational (non-designed) study, multicollinearity is very likely to occur.

print(__doc__)

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()  # load the dataset

# keep only the third feature, as a single-column design matrix
diabetes_x = diabetes.data[:, np.newaxis]
diabetes_x_temp = diabetes_x[:, :, 2]

diabetes_x_train = diabetes_x_temp[:-20]  # training samples
diabetes_x_test = diabetes_x_temp[-20:]   # test samples
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

regr = linear_model.LinearRegression()
regr.fit(diabetes_x_train, diabetes_y_train)

print('Coefficients:\n', regr.coef_)
print("Mean squared error: %.2f"
      % np.mean((regr.predict(diabetes_x_test) - diabetes_y_test) ** 2))
print("Variance score: %.2f" % regr.score(diabetes_x_test, diabetes_y_test))

plt.scatter(diabetes_x_test, diabetes_y_test, color='black')
plt.plot(diabetes_x_test, regr.predict(diabetes_x_test), color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
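To make the multicollinearity point above concrete, here is a small sketch of my own (synthetic data, noise levels chosen only for illustration): with two nearly identical predictors, tiny perturbations of the response move the individual least-squares coefficients by large amounts, even though their sum stays stable.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.standard_normal(50)
x2 = x1 + 1e-4 * rng.standard_normal(50)  # almost perfectly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2

for _ in range(3):
    y_noisy = y + 0.01 * rng.standard_normal(50)  # tiny response perturbation
    coef = LinearRegression().fit(X, y_noisy).coef_
    print(coef, coef.sum())  # coefficients swing wildly; their sum stays near 2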
2.2 Ridge Regression
Ridge regression is a regularization method: an L2-norm penalty is added to the loss function to control the complexity of the linear model, making it more robust.
from sklearn import linear_model
clf = linear_model.Ridge(alpha=0.5)
clf.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
print(clf.coef_)
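In practice the penalty weight alpha is usually chosen by cross-validation rather than fixed by hand; a minimal sketch using scikit-learn's built-in RidgeCV (the alpha grid here is an arbitrary illustrative choice):

import numpy as np
from sklearn.linear_model import RidgeCV

# searches the grid of candidate alphas with efficient leave-one-out CV
reg = RidgeCV(alphas=np.logspace(-3, 3, 13))
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
print(reg.alpha_)  # the selected penalty weight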
2.3 Lasso
The Lasso differs from the ridge estimator in that its penalty is based on the L1 norm. It can therefore shrink coefficients exactly to zero, which amounts to variable selection, and it is a very popular variable-selection method. Two main algorithms compute the Lasso estimate: the coordinate descent used by the Lasso class shown below, and least-angle regression, introduced later.
clf = linear_model.Lasso(alpha=0.1)
clf.fit([[0, 0], [1, 1]], [0, 1])
print(clf.predict([[1, 1]]))
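The variable-selection effect is easy to see on synthetic data. In this sketch (my own example; the data and alpha are made up for illustration), only the first two of ten features carry signal, and the Lasso drives the coefficients of the irrelevant ones to (essentially) exactly zero:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1]  # only features 0 and 1 matter

clf = Lasso(alpha=0.1).fit(X, y)
print(clf.coef_)                  # coefficients of features 2..9 collapse to 0
print(np.flatnonzero(clf.coef_))  # indices of the selected features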
2.4 Elastic Net
ElasticNet blends the Lasso and ridge regression: its penalty is a weighted combination of the L1 and L2 norms. The script below compares the regularization paths of the Lasso and the Elastic Net and plots them.

print(__doc__)

# Author: Alexandre Gramfort
# License: BSD Style.

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import lasso_path, enet_path
from sklearn import datasets

diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

X /= X.std(axis=0)  # standardize data (easier to set the l1_ratio parameter)

# Compute paths
eps = 5e-3  # the smaller it is, the longer the path

print("Computing regularization path using the lasso...")
alphas_lasso, coefs_lasso, _ = lasso_path(X, y, eps=eps)

print("Computing regularization path using the positive lasso...")
alphas_positive_lasso, coefs_positive_lasso, _ = lasso_path(X, y, eps=eps, positive=True)

print("Computing regularization path using the elastic net...")
alphas_enet, coefs_enet, _ = enet_path(X, y, eps=eps, l1_ratio=0.8)

print("Computing regularization path using the positive elastic net...")
alphas_positive_enet, coefs_positive_enet, _ = enet_path(X, y, eps=eps, l1_ratio=0.8, positive=True)

# Display results
plt.figure(1)
ax = plt.gca()
ax.set_prop_cycle(color=2 * ['b', 'r', 'g', 'c', 'k'])
l1 = plt.plot(-np.log10(alphas_lasso), coefs_lasso.T)
l2 = plt.plot(-np.log10(alphas_enet), coefs_enet.T, linestyle='--')

plt.xlabel('-Log(lambda)')
plt.ylabel('weights')
plt.title('Lasso and Elastic-Net Paths')
plt.legend((l1[-1], l2[-1]), ('Lasso', 'Elastic-Net'), loc='lower left')
plt.axis('tight')

plt.figure(2)
ax = plt.gca()
ax.set_prop_cycle(color=2 * ['b', 'r', 'g', 'c', 'k'])
l1 = plt.plot(-np.log10(alphas_lasso), coefs_lasso.T)
l2 = plt.plot(-np.log10(alphas_positive_lasso), coefs_positive_lasso.T, linestyle='--')

plt.xlabel('-Log(lambda)')
plt.ylabel('weights')
plt.title('Lasso and positive Lasso')
plt.legend((l1[-1], l2[-1]), ('Lasso', 'positive Lasso'), loc='lower left')
plt.axis('tight')

plt.figure(3)
ax = plt.gca()
ax.set_prop_cycle(color=2 * ['b', 'r', 'g', 'c', 'k'])
l1 = plt.plot(-np.log10(alphas_enet), coefs_enet.T)
l2 = plt.plot(-np.log10(alphas_positive_enet), coefs_positive_enet.T, linestyle='--')

plt.xlabel('-Log(lambda)')
plt.ylabel('weights')
plt.title('Elastic-Net and positive Elastic-Net')
plt.legend((l1[-1], l2[-1]), ('Elastic-Net', 'positive Elastic-Net'),
           loc='lower left')
plt.axis('tight')
plt.show()
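For fitting a single model rather than tracing a whole path, the ElasticNet estimator can be used directly, in the same style as Ridge and Lasso above; a minimal sketch (toy data, parameter values purely illustrative):

from sklearn.linear_model import ElasticNet

# l1_ratio interpolates between the penalties: 1.0 is pure Lasso, 0.0 pure ridge
clf = ElasticNet(alpha=0.1, l1_ratio=0.8)
clf.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
print(clf.coef_, clf.intercept_)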
2.5 Logistic Regression
Logistic regression is a linear classifier. The LogisticRegression class implements it, with both L1-norm and L2-norm penalties available. To demonstrate the model, I classify the iris dataset: 150 samples in 3 classes (setosa, versicolor, virginica), 50 per class, each described by 4 attributes: sepal length, sepal width, petal length, petal width. The code is as follows:

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model, model_selection

def load_data():
    iris = datasets.load_iris()
    X_train = iris.data
    Y_train = iris.target
    return model_selection.train_test_split(X_train, Y_train, test_size=0.25,
                                            random_state=0, stratify=Y_train)

def test_LogisticRegression(*data):  # the default multi-class strategy is one-vs-rest
    X_train, X_test, Y_train, Y_test = data
    regr = linear_model.LogisticRegression()
    regr.fit(X_train, Y_train)
    print("Coefficients:%s, intercept %s" % (regr.coef_, regr.intercept_))
    print("Score:%.2f" % regr.score(X_test, Y_test))

def test_LogisticRegression_multinomial(*data):  # use the multinomial strategy instead
    X_train, X_test, Y_train, Y_test = data
    regr = linear_model.LogisticRegression(multi_class='multinomial', solver='lbfgs')
    regr.fit(X_train, Y_train)
    print("Coefficients:%s, intercept %s" % (regr.coef_, regr.intercept_))
    print("Score:%.2f" % regr.score(X_test, Y_test))

def test_LogisticRegression_C(*data):  # C is the reciprocal of the regularization strength
    X_train, X_test, Y_train, Y_test = data
    Cs = np.logspace(-2, 4, num=100)  # logarithmically spaced candidate values
    scores = []
    for C in Cs:
        regr = linear_model.LogisticRegression(C=C)
        regr.fit(X_train, Y_train)
        scores.append(regr.score(X_test, Y_test))
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(Cs, scores)
    ax.set_xlabel(r"C")
    ax.set_ylabel(r"score")
    ax.set_xscale('log')
    ax.set_title("LogisticRegression")
    plt.show()

X_train, X_test, Y_train, Y_test = load_data()
test_LogisticRegression(X_train, X_test, Y_train, Y_test)
test_LogisticRegression_multinomial(X_train, X_test, Y_train, Y_test)
test_LogisticRegression_C(X_train, X_test, Y_train, Y_test)
The output is as follows:
The multinomial strategy improves accuracy over one-vs-rest.
As C increases, prediction accuracy increases as well; once C is large enough, the accuracy plateaus at a high level.
2.6 Linear Discriminant Analysis
The iris data are used again; the code is as follows:

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, discriminant_analysis, model_selection

def load_data():
    iris = datasets.load_iris()
    X_train = iris.data
    Y_train = iris.target
    return model_selection.train_test_split(X_train, Y_train, test_size=0.25,
                                            random_state=0, stratify=Y_train)

def test_LinearDiscriminantAnalysis(*data):
    X_train, X_test, Y_train, Y_test = data
    lda = discriminant_analysis.LinearDiscriminantAnalysis()
    lda.fit(X_train, Y_train)
    print("Coefficients:%s, intercept %s" % (lda.coef_, lda.intercept_))
    print("Score:%.2f" % lda.score(X_test, Y_test))

def plot_LDA(converted_X, Y):
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    colors = 'rgb'
    markers = 'o*s'
    for target, color, marker in zip([0, 1, 2], colors, markers):
        pos = (Y == target).ravel()
        X = converted_X[pos, :]
        ax.scatter(X[:, 0], X[:, 1], X[:, 2], color=color, marker=marker,
                   label="Label %d" % target)
    ax.legend(loc="best")
    fig.suptitle("Iris After LDA")
    plt.show()

X_train, X_test, Y_train, Y_test = load_data()
test_LinearDiscriminantAnalysis(X_train, X_test, Y_train, Y_test)
X = np.vstack((X_train, X_test))
Y = np.hstack((Y_train, Y_test))
lda = discriminant_analysis.LinearDiscriminantAnalysis()
lda.fit(X, Y)
# project the samples onto the three discriminant directions
converted_X = np.dot(X, np.transpose(lda.coef_)) + lda.intercept_
plot_LDA(converted_X, Y)
The result is as follows:
After linear discriminant analysis, the different iris species are well separated from one another, while samples of the same species cluster together.
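Instead of projecting manually through lda.coef_ as above, the discriminant projection is also available directly via the estimator's transform method; a minimal sketch (n_components=2 is an illustrative choice, the maximum for 3 classes):

from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
# project onto the first two discriminant directions
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(iris.data, iris.target)
print(X_2d.shape)  # (150, 2)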
3 Decision Trees
Decision tree generation: grow a tree from the training data, making the tree as large as possible.
Decision tree pruning: prune the grown tree using validation data, under a criterion of minimizing a loss function.
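scikit-learn supports the pruning step through minimal cost-complexity pruning. The sketch below (my own illustration on the iris data, not from the original text) computes the pruning path and picks the pruning strength ccp_alpha that scores best on held-out data:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# candidate pruning strengths along the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# refit one tree per candidate alpha and keep the best on the held-out split
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda clf: clf.score(X_test, y_test),
)
print(best.get_n_leaves(), best.score(X_test, y_test))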
3.1 CART regression trees (DecisionTreeRegressor)
Its prototype is:
class sklearn.tree.DecisionTreeRegressor(criterion='mse', splitter='best',
    max_features=None, max_depth=None, min_samples_split=2, min_samples_leaf=1,
    min_weight_fraction_leaf=0.0, random_state=None, max_leaf_nodes=None, presort=False)
Training and test samples are generated from random numbers; the code is as follows:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn import model_selection

def create_data(n):
    np.random.seed(0)
    X = 5 * np.random.rand(n, 1)
    Y = np.sin(X).ravel()
    noise_num = n // 5
    Y[::5] += 3 * (0.5 - np.random.rand(noise_num))  # add noise to every 5th sample
    return model_selection.train_test_split(X, Y, test_size=0.25, random_state=1)

def test_DecisionTreeRegression(*data):
    X_train, X_test, Y_train, Y_test = data
    regr = DecisionTreeRegressor()
    regr.fit(X_train, Y_train)
    print("Training score:%f" % (regr.score(X_train, Y_train)))
    print("Testing score:%f" % (regr.score(X_test, Y_test)))

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    X = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
    Y = regr.predict(X)
    ax.scatter(X_train, Y_train, label="train sample", c='g')
    ax.scatter(X_test, Y_test, label="test sample", c='r')
    ax.plot(X, Y, label="predict_value", linewidth=2, alpha=0.5)
    ax.set_xlabel("data")
    ax.set_ylabel("target")
    ax.set_title("Decision Tree Regression")
    ax.legend(framealpha=0.5)
    plt.show()

X_train, X_test, Y_train, Y_test = create_data(100)
test_DecisionTreeRegression(X_train, X_test, Y_train, Y_test)
The result is as follows:
The fit on the training samples is very good, but the fit on the test samples is noticeably worse.
Next, random splitting is compared with optimal splitting. Optimal splitting predicts somewhat better, but the difference is small; both fit the training set very well.
Finally, the effect of tree depth: depth corresponds to model complexity, and deeper trees are more complex. As depth grows, the fit to both the training set and the test set improves. With only 100 samples, a balanced binary tree separates every sample at a depth of about log2(100) ≈ 6.6, so beyond a depth of 7 there is essentially nothing left to split.
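The depth experiment can be reproduced in a few lines; a sketch along those lines (my own, using the same noisy-sine data as create_data(100) above):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# same noisy sine-curve data as create_data(100) above
np.random.seed(0)
X = 5 * np.random.rand(100, 1)
Y = np.sin(X).ravel()
Y[::5] += 3 * (0.5 - np.random.rand(20))
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=1)

for depth in range(1, 8):  # log2(100) ~ 6.6, so deeper trees change little
    regr = DecisionTreeRegressor(max_depth=depth).fit(X_train, Y_train)
    print(depth,
          round(regr.score(X_train, Y_train), 3),
          round(regr.score(X_test, Y_test), 3))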
3.2 Classification decision trees (DecisionTreeClassifier)
DecisionTreeClassifier implements a decision tree for classification problems; its prototype is:
sklearn.tree.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None,
    min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None,
    random_state=None, max_leaf_nodes=None, class_weight=None, presort=False)
The iris dataset is used again, the same data as in the linear models above. The code is as follows:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, model_selection
from sklearn.tree import DecisionTreeClassifier

def load_data():
    iris = datasets.load_iris()
    X_train = iris.data
    Y_train = iris.target
    return model_selection.train_test_split(X_train, Y_train, test_size=0.25,
                                            random_state=0, stratify=Y_train)

def test_DecisionTreeClassifier(*data):
    X_train, X_test, Y_train, Y_test = data
    clf = DecisionTreeClassifier()
    clf.fit(X_train, Y_train)
    print("Training score:%f" % (clf.score(X_train, Y_train)))
    print("Testing score:%f" % (clf.score(X_test, Y_test)))

def test_DecisionTreeClassifier_criterion(*data):
    X_train, X_test, Y_train, Y_test = data
    criterions = ['gini', 'entropy']
    for criterion in criterions:
        clf = DecisionTreeClassifier(criterion=criterion)
        clf.fit(X_train, Y_train)
        print("Criterion:%s" % criterion)
        print("Training score:%f" % (clf.score(X_train, Y_train)))
        print("Testing score:%f" % (clf.score(X_test, Y_test)))

def test_DecisionTreeClassifier_splitter(*data):
    X_train, X_test, Y_train, Y_test = data
    splitters = ['best', 'random']
    for splitter in splitters:
        clf = DecisionTreeClassifier(splitter=splitter)
        clf.fit(X_train, Y_train)
        print("splitter:%s" % splitter)
        print("Testing score:%f" % (clf.score(X_test, Y_test)))

def test_DecisionTreeClassifier_depth(*data, maxdepth):
    X_train, X_test, Y_train, Y_test = data
    depths = np.arange(1, maxdepth)
    training_scores = []
    testing_scores = []
    for depth in depths:
        clf = DecisionTreeClassifier(max_depth=depth)
        clf.fit(X_train, Y_train)
        training_scores.append(clf.score(X_train, Y_train))
        testing_scores.append(clf.score(X_test, Y_test))
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(depths, training_scores, label="training score", marker='o')
    ax.plot(depths, testing_scores, label="testing score", marker='*')
    ax.set_xlabel("maxdepth")
    ax.set_ylabel("score")
    ax.set_title("Decision Tree Classification")
    ax.legend(framealpha=0.5, loc='best')
    plt.show()

X_train, X_test, Y_train, Y_test = load_data()
test_DecisionTreeClassifier(X_train, X_test, Y_train, Y_test)
test_DecisionTreeClassifier_criterion(X_train, X_test, Y_train, Y_test)
test_DecisionTreeClassifier_splitter(X_train, X_test, Y_train, Y_test)
test_DecisionTreeClassifier_depth(X_train, X_test, Y_train, Y_test, maxdepth=100)
The output is as follows:
The accuracy on the test data reaches 97.4359%. The Gini criterion predicts slightly better here, and the optimal splitter outperforms random splitting. The figure below shows how tree depth affects predictive performance.
Once a decision tree has been trained, sklearn.tree.export_graphviz(classifier, out_file) converts it into a Graphviz-format file. (Before doing this, install pydotplus (pip install pydotplus) and graphviz (sudo apt-get install graphviz).)

from IPython.display import Image
import pydotplus
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

# export the fitted tree in Graphviz dot format, then render it as a PNG
dot_data = tree.export_graphviz(clf, out_file=None)
graph = pydotplus.graphviz.graph_from_dot_data(dot_data)
Image(graph.create_png())
The decision tree generated in this example looks like this: