Python: Kernel Ridge Regression (KRR) prediction


Working through the book *Practical Data Analysis*, I cleaned up the code below and am recording it here as a memo and to share:

Note: this uses mlpy (a machine-learning library), whose installation can be problematic; see the articles referenced at the end of this post for workarounds.

# -*- coding: utf-8 -*-
"""
Created on Wed Oct 17 21:14:44 2018

@author: Luove
"""
# KRR is a nonlinear method, well suited to classification and regression when the training set is small
import os
import numpy as np
import matplotlib.pyplot as plt
import dateutil.parser as dparser  # dateutil offers parser (parse a datetime from a string) and rrule (generate datetimes from rules); https://blog.csdn.net/cherdw/article/details/55224341
from pylab import *  # bundles matplotlib and numpy into a MATLAB-like environment
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn import linear_model
from sklearn import datasets
import mlpy
from mlpy import KernelRidge
# np.hamming builds a Hamming window: a function that is nonzero only inside the window and zero everywhere else (only in-window data is processed)
# Step plots (plt.step) suit analyses of how quantities such as business costs or sales change and are composed
x1 = np.linspace(1, 100, 500)
x2 = np.linspace(1, 100, 50)
y1 = np.cos(x1)
y2 = np.cos(x2)

axs1 = plt.subplot(211)
axs2 = plt.subplot(212)
axs1.step(x1, y1)
axs2.step(x2, y2)
plt.show()

goldfile = r"D:\Analyze\Python Matlab\Python\BookCodes\PDA_Book-master\PDA_Book-master\Chapter7\Gold.csv"
# Time-series analysis: smooth the series (which itself combines trend T, seasonality/cyclicality S, and volatility V)
def smooth(x, window_length):
    # Reflect the data at both ends so the convolution does not distort the boundaries
    s = np.r_[2 * x[0] - x[window_length - 1::-1], x, 2 * x[-1] - x[-1:-window_length:-1]]
    w = np.hamming(window_length)
    y = np.convolve(w / w.sum(), s, mode='same')  # moving-average filtering (a smoothing method); if the first argument is shorter than the second, NumPy swaps them; mode is one of {'full','same','valid'}, default 'full'
    return y[window_length:-window_length + 1]
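As a quick sanity check on smooth(): the reflection padding keeps the output the same length as the input. The sketch below is self-contained (it repeats the function) and uses the small array from the commented-out experiment later in the listing:

```python
import numpy as np

def smooth(x, window_length):
    # Reflect-pad both ends, then convolve with a normalized Hamming window
    s = np.r_[2 * x[0] - x[window_length - 1::-1], x, 2 * x[-1] - x[-1:-window_length:-1]]
    w = np.hamming(window_length)
    y = np.convolve(w / w.sum(), s, mode='same')
    return y[window_length:-window_length + 1]

x = np.array([2, 3, 9, 634, 32, 4, 676, 4, 234, 43, 7, -13, 0], dtype=float)
y = smooth(x, len(x))
print(len(x), len(y))  # the smoothed series has the same length as the input
```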
# Gold price trend; note the dtype choices below: object for the date column, None for the values (lets each column's content be inferred)
x = np.genfromtxt(goldfile, dtype='object', delimiter=',', skip_header=1, usecols=(0), converters={0: dparser.parse})  # first column: dates, parsed from strings by dateutil.parser.parse
y = np.genfromtxt(goldfile, dtype=None, delimiter=',', skip_header=1, usecols=(1))  # second column
y_smoothed = smooth(y, len(y))
plt.step(x, y, 'r*', label='raw data')
plt.step(x, y_smoothed, label='smoothed data')
plt.legend()
#x = [2,3,9,634,32,4,676,4,234,43,7,-13,0]
#x = np.array(x)
#np.round(smooth(x,len(x)))
#[ 33., 80., 124., 165., 189., 199., 192., 169., 137., 104., 66., 35., 16.]
#plt.plot(x)
#plt.plot(np.round(smooth(x,len(x))))  # with pylab loaded, plt.show() may be unnecessary?
##plt.show()
#window_length = x.shape[0]

house = datasets.load_boston()  # note: load_boston was removed in scikit-learn 1.2
houseX = house.data[:, np.newaxis]  # add a new axis: (506, 13) becomes (506, 1, 13)
houseX_temp = houseX[:, :, 2]

x_train, xtest, ytrain, ytest = train_test_split(houseX_temp, house.target, test_size=1.0 / 3)
lreg = linear_model.LinearRegression()
lreg.fit(x_train, ytrain)
plt.scatter(xtest, ytest, color='green')
plt.plot(xtest, lreg.predict(xtest), color='blue', linewidth=2)

np.random.seed(0)
targetvalues = np.genfromtxt(goldfile, skip_header=1, dtype=None, delimiter=',', usecols=(1))  # usecols selects the columns of interest
type(targetvalues)
trainingpoints = np.arange(125).reshape(-1, 1)  # reshape to one column, number of rows inferred
testpoint = np.arange(126).reshape(-1, 1)
knl = mlpy.kernel_gaussian(trainingpoints, trainingpoints, sigma=1)  # training kernel matrix, symmetric positive semidefinite, (125, 125)
knltest = mlpy.kernel_gaussian(testpoint, trainingpoints, sigma=1)  # testing kernel matrix, (126, 125)
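If mlpy refuses to install, the Gaussian kernel matrices can be built with plain NumPy. The sketch below assumes mlpy.kernel_gaussian computes exp(-||xi - xj||² / (2σ²)); check the mlpy docs if you need an exact match:

```python
import numpy as np

def kernel_gaussian(A, B, sigma=1.0):
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0) / (2 * sigma ** 2))

trainingpoints = np.arange(125, dtype=float).reshape(-1, 1)
testpoint = np.arange(126, dtype=float).reshape(-1, 1)
knl = kernel_gaussian(trainingpoints, trainingpoints, sigma=1)  # (125, 125), symmetric
knltest = kernel_gaussian(testpoint, trainingpoints, sigma=1)   # (126, 125)
print(knl.shape, knltest.shape)
```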

knlridge = KernelRidge(lmb=0.01)
knlridge.learn(knl, targetvalues)
resultpoints = knlridge.pred(knltest)

fig = plt.figure(1)
plt.plot(trainingpoints, targetvalues, 'o')
plt.plot(testpoint, resultpoints)
#plt.show()
len(resultpoints)
resultpoints[-5:-1]

# Repeat with the smoothed data, i.e. smooth() applied to targetvalues
targetvalues_smoothed = smooth(targetvalues, len(targetvalues))
knlridge.learn(knl, targetvalues_smoothed)
resultpoints_smoothed = knlridge.pred(knltest)
plt.step(trainingpoints, targetvalues_smoothed, 'o')
plt.step(testpoint, resultpoints_smoothed)
#plt.show()
len(resultpoints_smoothed)
resultpoints_smoothed[-5:-1]  # period-126 prediction: 1389.8 on raw data vs 1388.6 on smoothed data
#x = np.arange(0, 2, 0.05).reshape(-1, 1)  # training points
#y = np.ravel(np.exp(x)) + np.random.normal(1, 0.2, x.shape[0])  # target values
#xt = np.arange(0, 2, 0.01).reshape(-1, 1)  # testing points
#K = mlpy.kernel_gaussian(x, x, sigma=1)  # training kernel matrix
#Kt = mlpy.kernel_gaussian(xt, x, sigma=1)  # testing kernel matrix
#krr = KernelRidge(lmb=0.01)
#krr.learn(K, y)
#yt = krr.pred(Kt)
#fig = plt.figure(1)
#plot1 = plt.plot(x[:, 0], y, 'o')
#plot2 = plt.plot(xt[:, 0], yt)
#plt.show()
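Under the hood, kernel ridge regression has a closed form: learn solves (K + lmb·I)·α = y for the dual coefficients α, and pred returns Kt·α. A self-contained NumPy sketch on the synthetic data from the commented mlpy example above (the kernel_gaussian helper is my own re-implementation, not mlpy's):

```python
import numpy as np

def kernel_gaussian(A, B, sigma=1.0):
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x = np.arange(0, 2, 0.05).reshape(-1, 1)                  # training points
y = np.exp(x).ravel() + rng.normal(1, 0.2, x.shape[0])    # noisy targets
xt = np.arange(0, 2, 0.01).reshape(-1, 1)                 # testing points

K = kernel_gaussian(x, x, sigma=1)
Kt = kernel_gaussian(xt, x, sigma=1)

lmb = 0.01
alpha = np.linalg.solve(K + lmb * np.eye(len(x)), y)  # "learn": dual coefficients
yt = Kt @ alpha                                        # "pred": test predictions
print(yt.shape)
```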

 

For the mlpy.KernelRidge model, the smaller the regularization parameter lmb is set, the more closely the fitted trend tracks the original one; plotting the fits for lmb=0.01 and lmb=1 (the default) shows this clearly.
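The effect is easy to verify numerically: with everything else fixed, a smaller lmb yields a smaller in-sample error, i.e. a fit that hugs the data more tightly. A sketch on synthetic data (the helpers are re-implementations, not mlpy calls):

```python
import numpy as np

def kernel_gaussian(A, B, sigma=1.0):
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0) / (2 * sigma ** 2))

def krr_in_sample(K, y, lmb):
    # Fit kernel ridge and return the in-sample (training) predictions
    alpha = np.linalg.solve(K + lmb * np.eye(len(y)), y)
    return K @ alpha

rng = np.random.default_rng(0)
x = np.linspace(0, 4, 60).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.1, 60)
K = kernel_gaussian(x, x, sigma=0.5)

err_small = np.mean((krr_in_sample(K, y, lmb=0.01) - y) ** 2)
err_large = np.mean((krr_in_sample(K, y, lmb=1.0) - y) ** 2)
print(err_small < err_large)  # True: the small-lmb fit tracks the data more closely
```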

The meaning of the regularization parameter is not well explained in the documentation; for details see the referenced article below, which explains it much better.
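The short version of what that article explains: the regularization term penalizes large weights, so the learned coefficients shrink toward zero as the penalty grows, trading some fit for stability. A minimal sketch with scikit-learn's Ridge on synthetic data (note sklearn names the penalty strength alpha, where mlpy uses lmb):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 1.0, 0.5, 0.0])
y = X @ true_w + rng.normal(0, 0.1, 100)

norms = []
for alpha in (0.01, 1.0, 100.0):   # increasing regularization strength
    model = Ridge(alpha=alpha).fit(X, y)
    norms.append(np.linalg.norm(model.coef_))
print(norms)  # the coefficient norm shrinks as the penalty grows
```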
Ref:

Installing the Python module mlpy (a machine-learning library) on Windows (I did not follow this one; if it worked for you, please share)

Installing the mlpy library with pip (the recommended procedure)

Regularization in Machine Learning

*Practical Data Analysis*: the data and mlpy documentation used in this post are available at https://github.com/Luove/Data

