XGBoost Feature Importance Calculation


XGBoost provides three ways to compute feature importance:

- 'weight' - the number of times a feature is used to split the data across all trees.
- 'gain' - the average gain of the feature when it is used in trees.
- 'cover' - the average coverage of the feature when it is used in trees.
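All three can be read off a trained model through `Booster.get_score(importance_type=...)`. A minimal sketch on made-up data (the dataset, parameters, and feature names below are illustrative, not from the original post):

```python
import numpy as np
import xgboost as xgb

# Illustrative data: 100 samples, 5 features, a simple binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y, feature_names=[f"f{i}" for i in range(5)])
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)

# The same trained model, scored three ways.
for imp_type in ("weight", "gain", "cover"):
    print(imp_type, booster.get_score(importance_type=imp_type))
```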

In short:
weight is the total number of nodes, across all trees, at which the feature is used to split;
gain is the average gain obtained when the feature is used to split;
cover is a bit more obscure. [R-package/man/xgb.plot.tree.Rd](https://github.com/dmlc/xgboost/blob/f5659e17d5200bd7471a2e735177a81cb8d3012b/R-package/man/xgb.plot.tree.Rd) explains it in more detail: "the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be." In other words, a node's coverage is the sum of the second-order gradients (Hessians) of the training samples routed to that node, and a feature's score is the average coverage over the nodes where the feature is used.
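The square-loss remark can be made concrete: with L = ½(y − ŷ)², the second derivative with respect to the prediction is 1 for every sample, so a node's cover is simply the number of training instances reaching it. A minimal sketch to check this, assuming xgboost's `reg:squarederror` objective (whose Hessian is the constant 1) and using `Booster.trees_to_dataframe` (which requires pandas installed):

```python
import numpy as np
import xgboost as xgb

# Illustrative regression data: 200 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=3)

# Per-node statistics; under squared loss the root's Cover should equal
# the number of training rows (200 here), since every sample contributes
# a second-order gradient of 1.
df = booster.trees_to_dataframe()
print(df.loc[df["Node"] == 0, ["Tree", "Feature", "Cover"]])
```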

Returning to the example from Li Hang's book, we color the tree nodes by feature and draw the figure below.
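The original figure is not reproduced here. As a substitute, `xgboost.plot_importance` can draw the same kind of comparison, one bar chart per importance type; a sketch under the same illustrative setup as above:

```python
import matplotlib.pyplot as plt
import numpy as np
import xgboost as xgb

# Same illustrative setup as the first snippet.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y, feature_names=[f"f{i}" for i in range(5)])
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)

# One bar chart per importance type; note the rankings can differ.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, imp_type in zip(axes, ("weight", "gain", "cover")):
    xgb.plot_importance(booster, importance_type=imp_type, ax=ax, title=imp_type)
plt.tight_layout()
plt.show()
```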

