Feature Selection: SelectKBest

Reposted from the original article, 2021-01-12 20:01. Tags: Classic algorithms / Machine learning / Feature engineering

from sklearn.feature_selection import SelectKBest

http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest.set_params

class SelectKBest(_BaseFilter):
    """Select features according to the k highest scores.

    Read more in the :ref:`User Guide <univariate_feature_selection>`.

    Parameters
    ----------
    score_func : callable
        Function taking two arrays X and y, and returning a pair of arrays
        (scores, pvalues) or a single array with scores.
        Default is f_classif (see below "See also"). The default function only
        works with classification tasks.

    k : int or "all", optional, default=10
        Number of top features to select.
        The "all" option bypasses selection, for use in a parameter search.

    Attributes
    ----------
    scores_ : array-like, shape=(n_features,)
        Scores of features.

    pvalues_ : array-like, shape=(n_features,)
        p-values of feature scores, None if `score_func` returned only scores.

    Notes
    -----
    Ties between features with equal scores will be broken in an unspecified
    way.

    See also
    --------
    f_classif: ANOVA F-value between label/feature for classification tasks.
    mutual_info_classif: Mutual information for a discrete target.
    chi2: Chi-squared stats of non-negative features for classification tasks.
    f_regression: F-value between label/feature for regression tasks.
    mutual_info_regression: Mutual information for a continuous target.
    SelectPercentile: Select features based on percentile of the highest scores.
    SelectFpr: Select features based on a false positive rate test.
    SelectFdr: Select features based on an estimated false discovery rate.
    SelectFwe: Select features based on family-wise error rate.
    GenericUnivariateSelect: Univariate feature selector with configurable mode.
    """

An example from the official documentation (you supply the scoring function and the value of k yourself):
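The example code from the original post did not survive the repost. The sketch below follows the example in the scikit-learn SelectKBest docstring, using the bundled digits dataset with chi2 and k=20; whether the original post used exactly these values is an assumption.

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

# 1797 images of 8x8 digits, flattened to 64 pixel features
X, y = load_digits(return_X_y=True)
print(X.shape)  # (1797, 64)

# Keep the 20 pixels with the highest chi-squared scores.
# chi2 is valid here because pixel intensities are non-negative.
X_new = SelectKBest(chi2, k=20).fit_transform(X, y)
print(X_new.shape)  # (1797, 20)
```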

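Parameters

1. score_func: callable. A function that takes two arrays X and y and returns either a pair of arrays (scores, pvalues) or a single array of scores. The default is f_classif, which only works for classification tasks.
2. k: int or "all", optional, default=10. The number of top-scoring features to select. The "all" option bypasses selection, which is useful in a parameter search.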
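To make the score_func contract concrete, here is a minimal sketch with a hypothetical scoring function, abs_corr_score (not part of scikit-learn), that returns a single array of scores. The synthetic data is made up for illustration so that only columns 1 and 4 drive the target. The last line shows k="all" keeping every column.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest


def abs_corr_score(X, y):
    """Hypothetical score function: |Pearson correlation| of each column with y.

    Returns a single array of scores (no p-values)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)           # center each feature column
    yc = y - y.mean()                 # center the target
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom  # shape (n_features,)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Synthetic target that depends only on columns 1 and 4
y = 3 * X[:, 1] - 2 * X[:, 4] + 0.1 * rng.normal(size=200)

selector = SelectKBest(score_func=abs_corr_score, k=2).fit(X, y)
print(selector.get_support(indices=True))  # expected: [1 4]
print(selector.pvalues_)                   # None, since only scores were returned

# k="all" bypasses selection and keeps all 6 columns
X_all = SelectKBest(abs_corr_score, k="all").fit_transform(X, y)
print(X_all.shape)  # (200, 6)
```

In a parameter search, k is typically tuned inside a Pipeline, for example with a grid such as {"selectkbest__k": [2, 4, "all"]}; the step name selectkbest assumes the pipeline was built with make_pipeline.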

Attributes

1. scores_: array-like, shape=(n_features,). The score of each feature.
2. pvalues_: array-like, shape=(n_features,). The p-values of the feature scores; None if score_func returned only scores.
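A short sketch of the two attributes, assuming the bundled iris dataset as stand-in data: f_classif returns (scores, pvalues), so both attributes are filled in, while mutual_info_classif returns scores only, so pvalues_ stays None.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# f_classif returns a (scores, pvalues) pair
anova = SelectKBest(f_classif, k=2).fit(X, y)
print(anova.scores_.shape, anova.pvalues_.shape)  # (4,) (4,)

# mutual_info_classif returns a single array of scores
mi = SelectKBest(mutual_info_classif, k=2).fit(X, y)
print(mi.scores_.shape, mi.pvalues_)  # (4,) None
```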

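Functions that can be passed as score_func (as listed in the docstring's "See also" section)

1. f_classif: ANOVA F-value between label and feature, for classification tasks.
2. mutual_info_classif: mutual information for a discrete target.
3. chi2: chi-squared statistics of non-negative features, for classification tasks.
4. f_regression: F-value between label and feature, for regression tasks.
5. mutual_info_regression: mutual information for a continuous target.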
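Of the score functions in the docstring's "See also" list, chi2 is the one with an input restriction: it expects non-negative features such as counts or frequencies and raises a ValueError otherwise. The sketch below uses random synthetic data; rescaling with MinMaxScaler is one possible workaround, not the only one.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))     # synthetic features, includes negative values
y = rng.integers(0, 2, size=100)  # synthetic binary class labels

# f_classif accepts any real-valued features
SelectKBest(f_classif, k=3).fit(X, y)

# chi2 rejects negative input with a ValueError
chi2_rejected = False
try:
    SelectKBest(chi2, k=3).fit(X, y)
except ValueError as err:
    chi2_rejected = True
    print("chi2 rejected X:", err)

# One possible workaround: rescale every column to [0, 1] first
X_pos = MinMaxScaler().fit_transform(X)
X_sel = SelectKBest(chi2, k=3).fit_transform(X_pos, y)
print(X_sel.shape)  # (100, 3)
```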

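Methods

1. fit(X, y): run the score function on (X, y) and get the appropriate features.
2. fit_transform(X[, y]): fit to the data, then transform it.
3. get_params([deep]): get the parameters of this estimator.
4. get_support([indices]): get a mask, or integer indices, of the selected features.
5. inverse_transform(X): reverse the transformation; the result has columns of zeros where transform removed features.
6. set_params(**params): set the parameters of this estimator.
7. transform(X): reduce X to the selected features.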
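The methods above can be exercised together in a short sketch, again assuming the bundled iris dataset as stand-in data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(f_classif, k=2)
X_sel = selector.fit(X, y).transform(X)    # fit returns self; same as fit_transform(X, y)
print(X_sel.shape)                         # (150, 2)

mask = selector.get_support()              # Boolean mask over the 4 original columns
X_back = selector.inverse_transform(X_sel) # original width, zeros in dropped columns
print(X_back.shape)                        # (150, 4)
print(np.all(X_back[:, ~mask] == 0))       # True

print(selector.get_params()["k"])          # 2
selector.set_params(k=3)                   # change k; refit for it to take effect
X_sel3 = selector.fit_transform(X, y)
print(X_sel3.shape)                        # (150, 3)
```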

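How do you get the names or indices of the selected features? The answer already appears in the methods above: get_support().

The digits data used earlier has no feature names, so I switched to the Boston housing data, which does. Because it is a regression dataset, the scoring function changes accordingly, to f_regression. You must call fit before you can use get_support(). If its indices argument is True, the return value is the indices of the selected features. It cannot return feature names directly, but mapping the indices back to the names stored in the dataset is easy. Here I set k to 5 to keep the five highest-scoring features, which are the features at (zero-based) indices 2, 5, 9, 10 and 12.
If indices is False (the default), the return value is a Boolean mask indicating whether each feature was selected.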
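The Boston housing loader (load_boston) was removed from scikit-learn in version 1.2, so the sketch below swaps in the bundled diabetes regression dataset, which also ships feature names; the selected features therefore differ from the Boston result above. It shows both forms of get_support() and maps the indices back to names.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

data = load_diabetes()
X, y = data.data, data.target

# Regression target, so score with f_regression instead of f_classif.
# fit must run before get_support is available.
selector = SelectKBest(f_regression, k=5).fit(X, y)

idx = selector.get_support(indices=True)      # zero-based column indices
mask = selector.get_support()                 # Boolean mask (indices=False is the default)
names = [data.feature_names[i] for i in idx]  # map indices back to feature names
print(idx, names)
print(mask)
```

Since scikit-learn 1.0, selectors also provide get_feature_names_out(), which can perform this name lookup for you.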

Source: https://www.jianshu.com/p/586ba8c96a3d
