Machine Learning Study Notes: setting the include_bias option of sklearn.preprocessing.PolynomialFeatures, and configuring it in a Pipeline

Reposted article. 2021-10-11 15:34. Tag: Machine Learning Study Notes

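While working through the linear regression chapter of an artificial intelligence course, I found that higher-order (polynomial) regression relies on PolynomialFeatures to construct its features.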

First, here is how the official documentation describes sklearn.preprocessing.PolynomialFeatures:

Generate polynomial and interaction features.

Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].

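Put simply:

It generates polynomial and interaction features.

It builds a new feature matrix containing every polynomial combination of the features up to and including the specified degree. For example, if a sample has two features [a, b], its complete set of degree-2 polynomial features is [1, a, b, a^2, ab, b^2].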

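To break this down: the result contains the degree-0 feature [1], the degree-1 features [a, b], and the degree-2 features [a^2, ab, b^2]. In other words, you only need to pass in [a, b], and the transformer automatically generates and returns the feature matrix [1, a, b, a^2, ab, b^2]. (This assumes the bias option is left at its default, include_bias=True.)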
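As a quick check of the expansion described above, here is a minimal sketch using a single sample. The values a = 2 and b = 3 are chosen purely for illustration and do not come from the original notes. It also shows what changes when include_bias is set to False.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features: a = 2, b = 3 (illustrative values only)
X = np.array([[2, 3]])

# Default include_bias=True: columns are [1, a, b, a^2, ab, b^2]
with_bias = PolynomialFeatures(degree=2).fit_transform(X)
print(with_bias)      # [[1. 2. 3. 4. 6. 9.]]

# include_bias=False drops the constant (degree-0) column
without_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
print(without_bias)   # [[2. 3. 4. 6. 9.]]
```

The only difference between the two results is the leading column of ones, which is the degree-0 term.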

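When fitting with the linear model LinearRegression, pass in the newly generated feature matrix together with the label matrix, and the result is a trained model of the corresponding order.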
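The fitting step just described might look like the sketch below. It uses hypothetical noise-free data from y = 1 + 2x + 3x^2, which is an assumption made for illustration and not the dataset used in these notes, and it keeps both PolynomialFeatures and LinearRegression at their default settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data from y = 1 + 2x + 3x^2
x = np.linspace(-2, 2, 20).reshape(-1, 1)
y = 1 + 2 * x + 3 * x ** 2

# Expand x into the columns [1, x, x^2] (default include_bias=True)
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)

# Fit an ordinary linear model on the expanded features
model = LinearRegression()
model.fit(x_poly, y)

# New inputs must go through the same expansion before predicting
x_new = np.array([[0.5], [1.5]])
y_pred = model.predict(poly.transform(x_new))
print(y_pred)
```

The model is still linear in its coefficients; the curvature comes entirely from the expanded features.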

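The following shows how PolynomialFeatures is used:

1. First, create a dataset.

Split it into a training set and a validation set. A test set is not needed here, so none is generated for now.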

import numpy as np
from sklearn.model_selection import train_test_split

# Generate the training and validation sets; the data carries noise with a standard deviation of 0.1
n = 100
n_train = int(0.8 * n)
n_valid = int(0.2 * n)
x = 6 * np.random.rand(n, 1) - 3
y = 1.2 * x - 3.4 * (x ** 2) + 5.6 * (x ** 3) + 5 + 0.1 * np.random.randn(n, 1)
x_train_set, x_valid_set, y_train_set, y_valid_set = train_test_split(x, y, test_size=0.2, random_state=5)