Machine Learning with sklearn (12): Feature Engineering (3): Feature Combination and Crossing (1): Polynomial Features


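In machine learning, it is often effective to increase a model's complexity by adding nonlinear features of the input data. A simple and common approach is to use polynomial features, which add higher-order terms of each feature and interaction terms between features. This is implemented in PolynomialFeatures: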

>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
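>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])

The features of X have been transformed from (X_1, X_2) to (1, X_1, X_2, X_1^2, X_1X_2, X_2^2), and can now be used within any linear model.

In some cases only the interaction terms between features are needed. These can be obtained by setting interaction_only=True:
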
>>> X = np.arange(9).reshape(3, 3)
>>> X
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> poly = PolynomialFeatures(degree=3, interaction_only=True)
>>> poly.fit_transform(X)
array([[  1.,   0.,   1.,   2.,   0.,   0.,   2.,   0.],
       [  1.,   3.,   4.,   5.,  12.,  15.,  20.,  60.],
       [  1.,   6.,   7.,   8.,  42.,  48.,  56., 336.]])

Here the features of X have been transformed from (X_1, X_2, X_3) to (1, X_1, X_2, X_3, X_1X_2, X_1X_3, X_2X_3, X_1X_2X_3).

Note that polynomial features are used implicitly in kernel methods (e.g., sklearn.svm.SVC, sklearn.decomposition.KernelPCA) when a polynomial kernel function is used.
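To illustrate what "implicitly" means, here is a small numerical check. It is a sketch that assumes a degree-2 polynomial kernel with gamma=1 and coef0=1, values chosen only for illustration. sklearn.metrics.pairwise.polynomial_kernel computes (gamma * <x, y> + coef0) ** degree. If you scale the linear and cross-term columns of the PolynomialFeatures output by sqrt(2), an ordinary dot product reproduces the same kernel matrix. The weight vector is specific to this kernel setting, and PolynomialFeatures does not apply it itself.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])

# Explicit degree-2 features for each row [a, b]: [1, a, b, a^2, ab, b^2]
phi = PolynomialFeatures(degree=2).fit_transform(X)

# (x.y + 1)^2 expands to 1 + 2*a1*a2 + 2*b1*b2 + a1^2*a2^2 + 2*a1*b1*a2*b2 + b1^2*b2^2,
# so the a, b and ab columns need a factor sqrt(2) (assumption: gamma=1, coef0=1)
weights = np.array([1.0, np.sqrt(2), np.sqrt(2), 1.0, np.sqrt(2), 1.0])
phi_scaled = phi * weights

explicit = phi_scaled @ phi_scaled.T  # dot products in the expanded space
implicit = polynomial_kernel(X, degree=2, gamma=1.0, coef0=1.0)  # kernel trick

print(np.allclose(explicit, implicit))  # True
```

The kernel method never builds phi. It only evaluates the right-hand side, which is why the expansion is called implicit.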

For an example of ridge regression on generated polynomial features, see Polynomial interpolation.
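The following minimal sketch works in the same spirit but is not the code of the referenced example. It chains PolynomialFeatures and Ridge with make_pipeline. The cubic toy target and alpha=1e-3 are assumptions made for illustration. A plain Ridge on the raw feature is fitted alongside for comparison.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy data: a noise-free cubic y = x^3 - 2x
x = np.linspace(-2, 2, 50)
y = x ** 3 - 2 * x
X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

# Expand x into [1, x, x^2, x^3], then fit a ridge regression on those columns
model = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1e-3))
model.fit(X, y)

# Baseline: the same regressor on the raw, un-expanded feature
linear = Ridge(alpha=1e-3).fit(X, y)

print("with polynomial features:", model.score(X, y))  # R^2 on training data
print("raw feature only:", linear.score(X, y))
```

The expansion lives inside the pipeline, so predict applies the same transformation to new inputs automatically.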

class sklearn.preprocessing.PolynomialFeatures(degree=2, *, interaction_only=False, include_bias=True, order='C')

Generate polynomial and interaction features.

Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].
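You can reproduce this definition by hand. The helper poly_features below is hypothetical and not part of scikit-learn. For each degree from 0 up to degree, it enumerates index tuples with itertools and multiplies the selected columns. The empty tuple therefore yields the bias column, and (0, 0) yields a^2. For dense input, its column order matches PolynomialFeatures, which the last line checks.

```python
from itertools import chain, combinations, combinations_with_replacement

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def poly_features(X, degree=2, interaction_only=False, include_bias=True):
    """Hypothetical helper: one output column per combination of input
    columns with total degree <= `degree` (degree 0 is the bias column)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Repeated indices such as (0, 0) -> a^2 are allowed unless interaction_only
    comb = combinations if interaction_only else combinations_with_replacement
    start = 0 if include_bias else 1
    index_tuples = chain.from_iterable(
        comb(range(n_features), d) for d in range(start, degree + 1)
    )
    # The product over an empty selection is 1, which gives the bias column
    columns = [X[:, list(idx)].prod(axis=1) for idx in index_tuples]
    return np.column_stack(columns)


X = np.arange(6).reshape(3, 2)  # rows of the form [a, b]
mine = poly_features(X, degree=2)  # columns: 1, a, b, a^2, ab, b^2
ref = PolynomialFeatures(degree=2).fit_transform(X)
print(mine)
print(np.array_equal(mine, ref))  # True
```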

Parameters
degree int, default=2

The degree of the polynomial features.

interaction_only bool, default=False

If True, only interaction features are produced: features that are products of at most degree distinct input features (so not x[1] ** 2, x[0] * x[2] ** 3, etc.).

include_bias bool, default=True

If True (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones - acts as an intercept term in a linear model).

order {'C', 'F'}, default='C'

Order of output array in the dense case. ‘F’ order is faster to compute, but may slow down subsequent estimators.

New in version 0.21.
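The following example compares these parameters on the two-column X used earlier. The shapes follow directly from the descriptions above. order='F' changes only the memory layout of the dense result, not its values.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)

# Default: bias column of ones first, then a, b, a^2, ab, b^2
with_bias = PolynomialFeatures(degree=2).fit_transform(X)
# include_bias=False drops the leading column of ones
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
# interaction_only=True keeps only 1, a, b, ab (no a^2 or b^2)
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X)
# order='F' requests a column-major (Fortran-ordered) dense output
f_out = PolynomialFeatures(degree=2, order="F").fit_transform(X)

print(with_bias.shape, no_bias.shape, inter.shape)  # (3, 6) (3, 5) (3, 4)
print(f_out.flags["F_CONTIGUOUS"])
```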

Attributes
powers_ ndarray of shape (n_output_features, n_input_features)

powers_[i, j] is the exponent of the jth input in the ith output.

n_input_features_ int

The total number of input features.

n_output_features_ int

The total number of polynomial output features. The number of output features is computed by iterating over all suitably sized combinations of input features.
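The following sketch checks the attributes against each other. Each output column i equals the product over j of X[:, j] ** powers_[i, j]. n_output_features_ also matches a closed-form count of the combinations, assuming the bias column is included. These binomial formulas are standard combinatorics added for illustration, not statements from the scikit-learn documentation. math.comb requires Python 3.8 or later.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(9, dtype=float).reshape(3, 3)
poly = PolynomialFeatures(degree=3).fit(X)

# powers_[i, j] is the exponent of input j in output column i, so every
# output column is a product of powers of the input columns (0.0 ** 0 == 1)
rebuilt = np.prod(X[:, np.newaxis, :] ** poly.powers_, axis=2)
print(np.allclose(rebuilt, poly.transform(X)))  # True

# Counting the combinations directly (bias column included):
#   full expansion:    C(n + d, d) monomials of degree <= d
#   interaction_only:  sum over k = 0..d of C(n, k)
n, d = X.shape[1], 3
print(poly.n_output_features_, comb(n + d, d))  # 20 20

inter = PolynomialFeatures(degree=3, interaction_only=True).fit(X)
print(inter.n_output_features_, sum(comb(n, k) for k in range(d + 1)))  # 8 8
```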

Methods

fit(X[, y])

Compute number of output features.

fit_transform(X[, y])

Fit to data, then transform it.

get_feature_names([input_features])

Return feature names for output features

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform data to polynomial features
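The following sketch shows a typical method sequence, assuming X_new is a hypothetical new sample with the same two features. fit fixes the output combinations, and transform applies them to new rows. set_params changes a hyperparameter, after which the estimator should be fitted again. get_feature_names is left out because later scikit-learn releases replaced it with get_feature_names_out.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_train = np.arange(6).reshape(3, 2)
X_new = np.array([[1, 2]])  # hypothetical new sample [a, b] = [1, 2]

poly = PolynomialFeatures(degree=2)
poly.fit(X_train)             # learns the input width and output combinations
print(poly.transform(X_new))  # [[1. 1. 2. 1. 2. 4.]]

print(poly.get_params()["degree"])  # 2
poly.set_params(degree=3)           # change a hyperparameter ...
poly.fit(X_train)                   # ... then refit before transforming again
print(poly.transform(X_new).shape)  # (1, 10)
```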

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
>>> poly = PolynomialFeatures(interaction_only=True)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.],
       [ 1.,  2.,  3.,  6.],
       [ 1.,  4.,  5., 20.]])

 

