In machine learning, it is often useful to add complexity to a model by considering nonlinear features of the input data. A simple and common method is to use polynomial features, which yield the features' higher-order terms and interaction terms. This is implemented in PolynomialFeatures:
>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
The features of X have been transformed from (X1, X2) to (1, X1, X2, X1^2, X1*X2, X2^2). In some cases, only interaction terms among features are required, and they can be obtained by setting interaction_only=True:

>>> X = np.arange(9).reshape(3, 3)
>>> X
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> poly = PolynomialFeatures(degree=3, interaction_only=True)
>>> poly.fit_transform(X)
array([[  1.,   0.,   1.,   2.,   0.,   0.,   2.,   0.],
       [  1.,   3.,   4.,   5.,  12.,  15.,  20.,  60.],
       [  1.,   6.,   7.,   8.,  42.,  48.,  56., 336.]])

Here the features have been transformed from (X1, X2, X3) to (1, X1, X2, X3, X1*X2, X1*X3, X2*X3, X1*X2*X3).
Note that polynomial features are used implicitly in kernel methods (e.g., sklearn.svm.SVC, sklearn.decomposition.KernelPCA) when using polynomial Kernel functions.
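The snippet below is a minimal sketch of this point on a tiny, made-up dataset: a degree-3 polynomial kernel expands the features implicitly inside SVC, while the explicit route materializes the monomials with PolynomialFeatures and then applies a linear kernel. The two models are related but not numerically identical, since the kernel weights the monomials differently.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

# Hypothetical toy data, for illustration only.
X = np.array([[0., 1.], [2., 3.], [4., 5.], [6., 7.]])
y = np.array([0, 0, 1, 1])

# Implicit expansion: the polynomial kernel computes
# (gamma * <x, x'> + coef0) ** degree without building the feature matrix.
clf_kernel = SVC(kernel='poly', degree=3, coef0=1).fit(X, y)

# Explicit expansion: build the degree-3 monomials, then use a linear kernel.
X_poly = PolynomialFeatures(degree=3).fit_transform(X)
clf_explicit = SVC(kernel='linear').fit(X_poly, y)

print(clf_kernel.predict(X))
print(clf_explicit.predict(X))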
See Polynomial interpolation for an example of Ridge regression that creates and uses polynomial features.
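In the spirit of that example, a minimal sketch of such a pipeline (with made-up 1-D data and an arbitrary regularization strength) could look like this:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up noisy samples of a smooth 1-D function.
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0, 10, size=30))
y = np.sin(x) + 0.1 * rng.randn(30)

# Expand x into polynomial features, then fit a ridge regression on them.
model = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1e-3))
model.fit(x[:, np.newaxis], y)
y_pred = model.predict(x[:, np.newaxis])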
class sklearn.preprocessing.PolynomialFeatures(degree=2, *, interaction_only=False, include_bias=True, order='C')
Generate polynomial and interaction features.
Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].
Parameters

degree : int, default=2
    The degree of the polynomial features.

interaction_only : bool, default=False
    If true, only interaction features are produced: features that are
    products of at most degree distinct input features (so not
    x[1] ** 2, x[0] * x[2] ** 3, etc.).

include_bias : bool, default=True
    If True (default), then include a bias column, the feature in which
    all polynomial powers are zero (i.e. a column of ones - acts as an
    intercept term in a linear model).

order : {'C', 'F'}, default='C'
    Order of output array in the dense case. 'F' order is faster to
    compute, but may slow down subsequent estimators.

    New in version 0.21.
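As an informal illustration of the parameters above (not part of the reference itself), the following sketch shows how each option changes the set of generated columns:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)  # two input features a, b

# degree=2 with the default include_bias=True: [1, a, b, a^2, a*b, b^2]
print(PolynomialFeatures(degree=2).fit_transform(X).shape)  # (3, 6)

# interaction_only=True drops the pure powers a^2 and b^2: [1, a, b, a*b]
print(PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X).shape)  # (3, 4)

# include_bias=False removes the leading column of ones: [a, b, a^2, a*b, b^2]
print(PolynomialFeatures(degree=2, include_bias=False).fit_transform(X).shape)  # (3, 5)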
Attributes

powers_ : ndarray of shape (n_output_features, n_input_features)
    powers_[i, j] is the exponent of the jth input in the ith output.

n_input_features_ : int
    The total number of input features.

n_output_features_ : int
    The total number of polynomial output features. The number of output
    features is computed by iterating over all suitably sized combinations
    of input features.
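These attributes can be inspected on a fitted transformer; a brief sketch (reusing the two-feature X from the examples) of what they contain:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2).fit(np.arange(6).reshape(3, 2))

# Each row of powers_ gives the exponents of the two inputs for one output
# column, in the order [1, a, b, a^2, a*b, b^2].
print(poly.powers_)
# [[0 0]
#  [1 0]
#  [0 1]
#  [2 0]
#  [1 1]
#  [0 2]]
print(poly.n_input_features_, poly.n_output_features_)  # 2 6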
Methods

fit(X[, y])
    Compute number of output features.

fit_transform(X[, y])
    Fit to data, then transform it.

get_feature_names([input_features])
    Return feature names for output features.

get_params([deep])
    Get parameters for this estimator.

set_params(**params)
    Set the parameters of this estimator.

transform(X)
    Transform data to polynomial features.
Examples
>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
>>> poly = PolynomialFeatures(interaction_only=True)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.],
       [ 1.,  2.,  3.,  6.],
       [ 1.,  4.,  5., 20.]])
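As an additional illustrative step beyond the example above (assuming the get_feature_names method listed in Methods), the fitted transformer can also report the names of the generated columns:

>>> poly = PolynomialFeatures(2).fit(X)
>>> poly.get_feature_names(['a', 'b'])
['1', 'a', 'b', 'a^2', 'a b', 'b^2']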