Polynomial Regression Study Notes

Reposted article. 2017-08-05 00:13. Tags: python / tensorflow / sklearn / DataMining

Operating system: CentOS7.3.1611_x64

Python version: 2.7.5

sklearn version: 0.18.2

tensorflow version: 1.2.1

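Definition and forms of polynomials

A polynomial is a basic concept in algebra: an algebraic expression built from variables called indeterminates and constants called coefficients, using finitely many additions, subtractions, and multiplications together with raising to natural-number powers.

Polynomials are divided into univariate and multivariate polynomials:

A polynomial with exactly one indeterminate is called a univariate polynomial;

A polynomial with more than one indeterminate is called a multivariate polynomial.

This article deals only with univariate polynomials.

The general form is as follows (written in Python syntax):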

y = a0 + a1 * x + a2 * (x**2) + ... + an * (x ** n) + e
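
To make the general form concrete, here is a minimal sketch that evaluates a polynomial of any degree from a list of coefficients. The function name poly_eval and the coefficient order [a0, a1, ..., an] are assumptions made for this illustration, and the error term e is left out so that the result is deterministic.

```python
def poly_eval(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n, with coeffs = [a0, a1, ..., an].

    Uses Horner's rule, which needs only n multiplications and n additions.
    """
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


# Quadratic with a0, a1, a2 = 10, 2, -0.03, evaluated at x = 2
coeffs = [10, 2, -0.03]
direct = 10 + 2 * 2.0 + (-0.03) * (2.0 ** 2)
print(poly_eval(coeffs, 2.0), direct)
```

Both printed values should agree up to floating-point rounding, since Horner's rule is only a rearrangement of the same sum.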

For example, an ordinary quadratic polynomial regression model looks like this (written in Python syntax):

y = a0 + a1 * x + a2 * (x**2) + e

When a0,a1,a2,e = 10,2,-0.03,0.5, the curve is a downward-opening parabola, because a2 is negative.

The source code that plots it is as follows:

#! /usr/bin/env python
#-*- coding:utf-8 -*-
import pylab

def fun(x):
    # Quadratic model: y = a0 + a1 * x + a2 * (x**2) + e
    a0,a1,a2,e = 10,2,-0.03,0.5
    y = a0 + a1 * x + a2 * (x**2) + e
    return y

# Evaluate the model at every integer x in [-10000, 10000)
arrX = range(-10000,10000)
arrY = [fun(x) for x in arrX]

pylab.plot(arrX,arrY)
pylab.show()

An ordinary cubic polynomial regression model looks like this (written in Python syntax):

y = a0 + a1 * x + a2 * (x**2) + a3 * (x**3) + e

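When a0,a1,a2,a3,e = 10,-0.2,-0.03,-0.04,0.5, the curve falls monotonically from the upper left to the lower right, because a3 is negative and the derivative has no real roots.

The source code that plots it is as follows: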

#! /usr/bin/env python
#-*- coding:utf-8 -*-
import pylab

def fun(x):
    # Cubic model: y = a0 + a1 * x + a2 * (x**2) + a3 * (x**3) + e
    a0,a1,a2,a3,e = 10,-0.2,-0.03,-0.04,0.5
    y = a0 + a1 * x + a2 * (x**2) + a3 * (x**3) + e
    return y

# Evaluate the model at every integer x in [-10000, 10000)
arrX = range(-10000,10000)
arrY = [fun(x) for x in arrX]

pylab.plot(arrX,arrY)
pylab.show()

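Polynomial regression

In single-factor (continuous variable) experiments, when the regression function cannot be described by a straight line, a nonlinear regression function should be considered. Polynomial regression is one kind of nonlinear regression in the sense that y is a nonlinear function of x; the model is still linear in the coefficients a0 ... an, so it can be fitted by ordinary least squares. Here it means single-factor polynomial regression, i.e. univariate polynomial regression.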

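In general the nonlinear regression function is unknown, or even when it is known it cannot necessarily be turned into a linear model by a simple function transformation. In that case a common approach is to use a polynomial in the factor. If the scatter plot suggests that the regression function has one "bend", consider a quadratic polynomial; with two bends, consider a cubic; with three bends, a quartic; and so on.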
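
The rule of thumb about bends follows from the fact that a polynomial of degree n has at most n - 1 turning points, because its derivative has degree n - 1. The sketch below counts turning points as the real roots of the derivative. The helper name count_bends and the tolerance are assumptions for this illustration, and it assumes the derivative has only simple roots (a repeated root such as the one in x**3 would be over-counted).

```python
import numpy as np
from numpy.polynomial import Polynomial


def count_bends(coeffs, tol=1e-9):
    """Count turning points of a0 + a1*x + ... (coeffs in ascending order).

    Assumes every real root of the derivative is a simple root,
    so each one marks a change of direction.
    """
    derivative = Polynomial(coeffs).deriv()
    roots = derivative.roots()
    real_roots = roots[np.abs(np.imag(roots)) < tol]
    return len(real_roots)


print(count_bends([10, 2, -0.03]))             # the quadratic used earlier
print(count_bends([0, -1, 0, 1]))              # x**3 - x, an illustrative cubic
print(count_bends([10, -0.2, -0.03, -0.04]))   # the cubic used earlier
```

The last case shows that the degree is only an upper bound: the cubic plotted earlier has no turning point at all.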

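The true regression function need not be a polynomial of any particular degree, but as long as the fit is good, approximating it with a suitable polynomial is a workable approach.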
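
As a small illustration of approximating a non-polynomial function, the sketch below fits polynomials of increasing degree to samples of sin(pi * x) and compares the sum of squared residuals. The target function, the sample grid, and the helper name fit_error are assumptions chosen for this example; np.polyfit and np.polyval are used only as a convenient least-squares fitter.

```python
import numpy as np

# Illustrative "true" regression function that is not a polynomial
x = np.linspace(-1, 1, 50)
y = np.sin(np.pi * x)


def fit_error(degree):
    """Least-squares fit of the given degree; returns the sum of squared residuals."""
    coeffs = np.polyfit(x, y, degree)      # highest-degree coefficient first
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))


errors = {d: fit_error(d) for d in (1, 3, 5)}
for d in (1, 3, 5):
    print(d, errors[d])
```

The residual shrinks as the degree grows. On noisy data a higher degree also fits the noise, so the degree is usually chosen no larger than the shape of the data requires.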

Solving a polynomial regression problem with sklearn

Example code:

#! /usr/bin/env python
#-*- coding:utf-8 -*-
# Polynomial regression
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(1)

def fun(x):
    a0,a1,a2,a3,e = 0.1,-0.02,0.03,-0.04,0.05
    y = a0 + a1 * x + a2 * (x**2) + a3 * (x**3)+ e
    y += 0.03 * rng.rand(1)
    return y

plt.figure()
plt.title('polynomial regression(sklearn)')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)

X = np.linspace(-1, 1, 30)
arrY = [fun(x) for x in X]
X = X.reshape(-1,1)
y = np.array(arrY).reshape(-1,1)

plt.plot(X, y, 'k.')

qf = PolynomialFeatures(degree=3)
qModel = LinearRegression()
qModel.fit(qf.fit_transform(X), y)

X_predict = np.linspace(-1, 2, 100)
X_predict_result = qModel.predict(qf.transform(X_predict.reshape(X_predict.shape[0], 1)))
plt.plot(X_predict,X_predict_result , 'r-')

plt.show()

GitHub URL of this code: https://github.com/mike-zhang/pyExamples/blob/master/algorithm/NonLinearRegression/pr_sklearn_test1.py

Running it plots the sample points as black dots and the fitted cubic curve as a red line.
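
As a complement to the sklearn example above, the following sketch chains the same two steps with make_pipeline and prints the fitted coefficients. It is an illustration under stated assumptions, not the author's code: the data is noise-free (with a0 + e folded into a single constant 0.15), and include_bias=False is set so that the constant term is reported only by intercept_.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noise-free cubic samples (illustrative): y = 0.15 - 0.02*x + 0.03*x**2 - 0.04*x**3
X = np.linspace(-1, 1, 30).reshape(-1, 1)
x = X[:, 0]
y = 0.15 - 0.02 * x + 0.03 * x**2 - 0.04 * x**3

# PolynomialFeatures expands x into [x, x**2, x**3]; LinearRegression fits the weights
model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression(),
)
model.fit(X, y)

reg = model.named_steps["linearregression"]
print(reg.intercept_)   # estimate of the constant term
print(reg.coef_)        # estimates of a1, a2, a3
```

Because the data contains no noise, the printed values should match the generating coefficients up to floating-point error. With a pipeline, model.predict(X_new) applies the feature expansion automatically.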

Solving a polynomial regression problem with tensorflow

Example code:

#! /usr/bin/env python
#-*- coding:utf-8 -*-

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

learning_rate = 0.01
training_epochs = 40
rng = np.random.RandomState(1)

def fun(x):
    a0,a1,a2,a3,e = 0.1,-0.02,0.03,-0.04,0.05
    y = a0 + a1 * x + a2 * (x**2) + a3 * (x**3)+ e
    y += 0.03 * rng.rand(1)
    return y


trX = np.linspace(-1, 1, 30)
arrY = [fun(x) for x in trX]
num_coeffs = 4
trY = np.array(arrY).reshape(-1,1)

X = tf.placeholder("float")
Y = tf.placeholder("float")

def model(X, w):
    terms = []
    for i in range(num_coeffs):
        term = tf.multiply(w[i], tf.pow(X, i))
        terms.append(term)
    return tf.add_n(terms)

w = tf.Variable([0.] * num_coeffs, name="parameters")
y_model = model(X, w)

cost = tf.reduce_sum(tf.square(Y-y_model))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

with tf.Session() as sess :
    init = tf.global_variables_initializer()
    sess.run(init)

    for epoch in range(training_epochs):
        for (x, y) in zip(trX, trY):
            sess.run(train_op, feed_dict={X: x, Y: y})

    w_val = sess.run(w)
    print(w_val)

plt.figure()
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.title('polynomial regression(tensorflow)')
plt.scatter(trX, trY)
trX2 = np.linspace(-1, 2, 100)
trY2 = 0
for i in range(num_coeffs):
    trY2 += w_val[i] * np.power(trX2, i)
plt.plot(trX2, trY2, 'r-')
plt.show()

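GitHub URL of this code: https://github.com/mike-zhang/pyExamples/blob/master/algorithm/NonLinearRegression/pr_tensorflow_test1.py

Running it prints the fitted coefficients and plots the training points as a scatter with the fitted curve as a red line.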
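
To show what the training loop above does numerically, here is a plain NumPy sketch of the same idea: per-sample gradient descent on the squared error of a cubic model. It is a swapped-in illustration rather than TensorFlow code. The helper names sgd_polyfit and mse are hypothetical, and the data is noise-free so that the result is reproducible.

```python
import numpy as np


def sgd_polyfit(xs, ys, num_coeffs=4, learning_rate=0.01, epochs=40):
    """Fit w so that sum_i w[i] * x**i approximates y, one sample at a time."""
    w = np.zeros(num_coeffs)
    powers = np.arange(num_coeffs)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            phi = x ** powers             # feature vector [1, x, x**2, x**3]
            error = y - w.dot(phi)        # residual for this sample
            # Gradient of (y - w.phi)**2 with respect to w is -2 * error * phi
            w += learning_rate * 2.0 * error * phi
    return w


def mse(w, xs, ys):
    """Mean squared error of the polynomial with coefficients w."""
    preds = sum(w[i] * xs ** i for i in range(len(w)))
    return float(np.mean((ys - preds) ** 2))


xs = np.linspace(-1, 1, 30)
ys = 0.15 - 0.02 * xs + 0.03 * xs**2 - 0.04 * xs**3   # noise-free, illustrative

w_init = np.zeros(4)
w_fit = sgd_polyfit(xs, ys)
print(mse(w_init, xs, ys), mse(w_fit, xs, ys))
print(w_fit)
```

With a learning rate of 0.01 and only 40 epochs, the coefficients are not expected to reach the generating values exactly; the point is that the error drops well below its starting value.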

That is all for now; I hope it helps.

GitHub URL of this article:

https://github.com/mike-zhang/mikeBlogEssays/blob/master/2017/20170804_多項式回歸學習筆記.rst

Additions are welcome.
