大叔學ML第三：多項式回歸

本文轉載自查看原文 2018-12-21 18:43 594 機器學習/ 大叔學ML/ 多項式回歸/ 線性回歸/ ML

目錄

[TOC]

基本形式

小試牛刀

再試牛刀

調用類庫

基本形式

上文中，大叔說道了線性回歸，線性回歸是個非常直觀又簡單的模型，但是很多時候，數據的分布並不是線性的，如：

image.png-10.4kB

如果我們想用高次多項式擬合上面的數據應該如何實現呢？其實很簡單，設假設函數為

\[y = \theta_0 + \theta_1x + \theta_2x^2 \tag{1} \]

與之相像的線性函數為

\[y = \theta_0 + \theta_1x_1 + \theta_2x_2 \tag{2} \]

觀察(1)式和(2)式，其實我們只要把(1)式中的\(x\)看作是(2)式中的\(x_1\)，(1)式中的\(x^2\)看作是(2)式中的\(x_2\)，就可以把擬合一個關於\(x\)的二次函數的任務轉換為擬合一個關於\(x_1\)和\(x_2\)的線性函數的任務，這樣問題就簡單了，關於如何擬合一個線性函數請參考大叔學ML第二：線性回歸。

現在，我們用正規方程來擬合線性函數，正規方程形如：\(\vec\theta=(X^TX)^{-1}X^T\vec{y}\)，關鍵在於構建特征矩陣\(X\)，顯然，特征矩陣的第一列\(\vec x_0\)全為1，第二列\(\vec x_1\)由樣本中的屬性\(x\)構成，第三列\(\vec x_2\)由樣本中的屬性\(x\)的平方構成。

小試牛刀

import numpy as np
import matplotlib.pyplot as plt

''' 創建樣本數據如下：'''
X = np.arange(0, 10, 0.1) # 產生100個樣本
noise = np.random.randint(-5, 5, (1, 100))
Y = 10 + 2 * X + 3 * X * X + noise # 100個樣本對應的標記

'''下面用正規方程求解theta'''
X0 = np.ones((100, 1)) # x0賦值1
X1 = X.reshape(100, 1) # x1
X2 = X1 * X1 #x2為x1的平方

newX = np.hstack((X0, X1, X2)) # 構建一個特征矩陣
newY = Y.reshape(100, 1) # 把標記轉置一下
theta = np.dot(np.dot(np.linalg.pinv(np.dot(newX.T, newX)), newX.T), newY)
print(theta)

'''繪制'''
plt.xlabel('$X$')
plt.ylabel('$Y$')
plt.scatter(X, Y, marker='.') # 原始數據
plt.plot(X, theta[0] + theta[1] * X + theta[2] * X * X, color = 'r') # 繪制我們擬合得到的函數
plt.show()

運行結果：
image.png-15.6kB

簡直完美。

再試牛刀

上面我們只是擬合了一個一元函數（樣本數據僅包含一個元素），下面我們來嘗試擬合一個二元函數。假設我們有一堆樣本，每個樣本有兩個元素，看起來大概是這樣：

image.png-73.5kB

我們欲擬合一個函數形式如下：

\[y = \theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_1^2 + \theta_4x_1x_2 + \theta_5x_2^2 \]

同樣，對比與之相像的線性函數：

\[y = \theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_3 + \theta_4x_4+ \theta_5x_5 \]

我們建立如下對應關系:

高次多項式	線性式
\(x_0=1\)	\(x_0=1\)
\(x_1\)	\(x_1\)
\(x_2\)	\(x_2\)
\(x_1^2\)	\(x_3\)
\(x_1x_2\)	\(x_4\)
\(x_2^2\)	\(x_5\)

編程如下：

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# 測試用多項式
def ploy(X1, X2, *theta):
    noise = np.random.randint(-5, 5, (1, 10))
    Y = theta[0] + theta[1] * X1 + theta[2] * X2 + theta[3] * X1**2 + theta[4] * X1 * X2 + theta[5] * X2**2 + noise # 10個樣本對應的標記
    return Y

''' 創建樣本數據如下 '''
X1 = np.arange(0, 10, 1) # 產生10個樣本的第一個屬性
X2 = np.arange(5, 15, 1) # 產生10個樣本的第二個屬性
Y = ploy(X1, X2, 1, 2, 3, 4, 5, 6)

'''構建特征矩陣 '''
newX0 = np.ones((10, 1)) 
newX1 = np.reshape(X1, (10, 1)) 
newX2 = np.reshape(X2, (10, 1)) 
newX3 = np.reshape(X1**2, (10, 1))
newX4 = np.reshape(X1 * X2, (10, 1))
newX5 = np.reshape(X2**2, (10, 1))

newX = np.hstack((newX0, newX1, newX2, newX3, newX4, newX5)) # 特征矩陣

'''用正規方程擬合 '''
newY = Y.reshape(10, 1) #把標記轉置一下
result = np.dot(np.dot(np.linalg.pinv(np.dot(newX.T, newX)), newX.T), newY)
theta = tuple(result.reshape((1, 6))[0].tolist())
print(theta)

'''繪制 '''
fig = plt.figure()
ax = Axes3D(fig)
ax.set_xlabel('$X_1$')
ax.set_ylabel('$X_2$')
ax.set_zlabel('$Y$')
AxesX1, AxesX2 = np.meshgrid(X1, X2) 

AxesY = ploy(AxesX1, AxesX2, 1, 2, 3, 4, 5, 6) # 原始數據
ax.scatter(AxesX1, AxesX2, AxesY)

regressionY = ploy(AxesX1, AxesX2, *theta) # 用擬合出來的theta計算數據
ax.plot_surface(AxesX1, AxesX2, regressionY, color='r', alpha='0.5') 
plt.show()

運行結果：

image.png-98.4kB

調用類庫

我們可以調用sklean中模塊PolynomialFeatures自動生成特征矩陣，而無需自己創建，計算參數\(\vec\theta\)也不用自己寫，而是使用sklean中的模塊linear_model：

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
from mpl_toolkits.mplot3d import Axes3D

# 測試用多項式
def ploy(X1, X2, *theta):
    noise = np.random.randint(-5, 5, (1, 10))
    Y = theta[0] + theta[1] * X1 + theta[2] * X2 + theta[3] * X1**2 + theta[4] * X1 * X2 + theta[5] * X2**2 + noise # 10個樣本對應的標記
    return Y

''' 創建樣本數據如下 '''
X1 = np.arange(0, 10, 1) # 產生10個樣本的第一個屬性
X2 = np.arange(5, 15, 1) # 產生10個樣本的第二個屬性
Y = ploy(X1, X2, 1, 2, 3, 4, 5, 6)

X = np.vstack((X1, X2)).T
Y = Y.reshape((10, 1))

'''構建特征矩陣 '''
poly = PolynomialFeatures(2)
features_matrix = poly.fit_transform(X)
names = poly.get_feature_names()

''' 擬合'''
regr = linear_model.LinearRegression()
regr.fit(features_matrix, Y)

theta = tuple(regr.intercept_.tolist() + regr.coef_[0].tolist())
print(theta)

'''繪制 '''
fig = plt.figure()
ax = Axes3D(fig)
ax.set_xlabel('$X_1$')
ax.set_ylabel('$X_2$')
ax.set_zlabel('$Y$')
AxesX1, AxesX2 = np.meshgrid(X1, X2) 

AxesY = ploy(AxesX1, AxesX2, 1, 2, 3, 4, 5, 6) # 原始數據
ax.scatter(AxesX1, AxesX2, AxesY)

regressionY = ploy(AxesX1, AxesX2, *theta) # 用擬合出來的theta計算數據
ax.plot_surface(AxesX1, AxesX2, regressionY, color='r', alpha='0.5') 
plt.show()

運行結果如下：

image.png-87.6kB

感覺還不讓自己寫的代碼擬合的好，可能是大叔的樣本太少？或者是其他什么原因導致。大叔現在功力還不深，等有空了會看看這些類庫的源碼。

至於何時必須自己編碼而不是調用類庫，大叔在上文末尾做了一點總結，不一定對，歡迎指正。祝大家周末愉快。

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 R語言多項式回歸 sklearn調用多項式回歸 Matlab多項式回歸實現 sklearn之多項式回歸多項式回歸學習筆記 Python | 多項式回歸的實現【多項式回歸】多項式回歸正則項分析機器學習之線性回歸、多項式回歸 sklearn中的多項式回歸算法機器學習(八) 多項式回歸與模型泛化(上)