Evaluating Feature Importance with a Random Forest

This article is a repost (2020-09-21 15:53). Original article:

https://blog.csdn.net/xiezhen_zheng/article/details/82011908

Reference: feature selection methods

https://blog.csdn.net/m0_37316673/article/details/107524247

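import pandas as pd

# Load the data set (the file is GBK-encoded); adjust the path to your own file
df = pd.read_csv('D:/Users/FengZH2/Desktop/test/testdata.csv', encoding='gbk')

# Print column names, dtypes and non-null counts
df.info()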

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# The first column is the class label; all remaining columns are features
x, y = df.iloc[:, 1:].values, df.iloc[:, 0].values
# Hold out 30% of the samples as a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
feat_labels = df.columns[1:]

# Fit a forest of 10,000 trees using all CPU cores (n_jobs=-1)
forest = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=-1)
forest.fit(x_train, y_train.astype('int'))
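The split above produces x_test and y_test, but the rest of the listing never uses them. Importances from a model that predicts poorly are hard to interpret, so one reasonable check is to compare training and held-out accuracy first. The following is a minimal sketch of that check, not part of the original article: it uses a synthetic data set from make_classification as a stand-in for the CSV, and all names ending in _demo, the sample counts and the smaller n_estimators are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the article's CSV (hypothetical data, not the author's)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# A smaller forest than in the article, only to keep the sketch fast
forest_demo = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)
forest_demo.fit(X_tr, y_tr)

# score() returns mean accuracy for classifiers
train_acc = forest_demo.score(X_tr, y_tr)
test_acc = forest_demo.score(X_te, y_te)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

With the article's own variables, the equivalent call would be forest.score(x_test, y_test.astype('int')).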

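import numpy as np

# Impurity-based importances, one value per feature, normalized to sum to 1
importances = forest.feature_importances_
# Feature indices sorted from most to least important
indices = np.argsort(importances)[::-1]
for f in range(x_train.shape[1]):
    print("%2d) %-*s %f" % (f + 1, 30, feat_labels[indices[f]], importances[indices[f]]))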

# Keep only the features whose importance exceeds the threshold
threshold = 0.15
x_selected = x_train[:, importances > threshold]
print(x_selected.shape)
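The manual boolean mask above can also be expressed with scikit-learn's SelectFromModel, which wraps a fitted estimator and keeps the features whose importance reaches a threshold. The sketch below is an illustration rather than the original author's method. It again assumes a synthetic data set in place of the CSV, and the _demo names and the smaller forest are hypothetical. Note that SelectFromModel keeps features with importance greater than or equal to the threshold, whereas the listing above uses a strict comparison.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the article's CSV (hypothetical data, not the author's)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

forest_demo = RandomForestClassifier(n_estimators=200, random_state=0)
forest_demo.fit(X_demo, y_demo)

# prefit=True reuses the already-fitted forest instead of refitting it
threshold_demo = 0.15
selector = SelectFromModel(forest_demo, threshold=threshold_demo, prefit=True)
X_demo_selected = selector.transform(X_demo)

# The same selection written as a manual mask (>= to match SelectFromModel)
manual_mask = forest_demo.feature_importances_ >= threshold_demo
print(X_demo_selected.shape, int(manual_mask.sum()))
```

One advantage of the selector object is that the same transform can be applied to a test set later, so training and test data are guaranteed to keep the same columns.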

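import matplotlib.pyplot as plt

# Horizontal bar chart of the importances, sorted in descending order
plt.figure(1)
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
plt.yticks(range(len(indices)), feat_labels[indices])  # label each bar with its feature name
plt.xlabel('Relative Importance')
plt.show()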
