XGBoost Feature Importance Calculation


XGBoost provides three ways to compute feature importance:

- 'weight' - the number of times a feature is used to split the data across all trees.
- 'gain' - the average gain of the feature when it is used in trees.
- 'cover' - the average coverage of the feature when it is used in trees.
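All three can be read off a trained model through `Booster.get_score(importance_type=...)`. A minimal sketch on made-up data (the dataset, parameters, and feature names below are illustrative, not from the original post):

```python
import numpy as np
import xgboost as xgb

# Illustrative data: 100 samples, 5 features, a simple binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y, feature_names=[f"f{i}" for i in range(5)])
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)

# The same trained model, scored three ways.
for imp_type in ("weight", "gain", "cover"):
    print(imp_type, booster.get_score(importance_type=imp_type))
```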

In short:
weight is the total number of nodes, across all trees, at which the feature is used to split;
gain is the average gain obtained when the feature is used to split;
cover is a bit more obscure. [R-package/man/xgb.plot.tree.Rd](https://github.com/dmlc/xgboost/blob/f5659e17d5200bd7471a2e735177a81cb8d3012b/R-package/man/xgb.plot.tree.Rd) explains it in more detail: "the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be." In other words, a node's coverage is the sum of the second-order gradients (Hessians) of the training samples routed to that node, and a feature's score is the average coverage over the nodes where the feature is used.
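The square-loss remark can be made concrete: with L = ½(y − ŷ)², the second derivative with respect to the prediction is 1 for every sample, so a node's cover is simply the number of training instances reaching it. A minimal sketch to check this, assuming xgboost's `reg:squarederror` objective (whose Hessian is the constant 1) and using `Booster.trees_to_dataframe` (which requires pandas installed):

```python
import numpy as np
import xgboost as xgb

# Illustrative regression data: 200 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=3)

# Per-node statistics; under squared loss the root's Cover should equal
# the number of training rows (200 here), since every sample contributes
# a second-order gradient of 1.
df = booster.trees_to_dataframe()
print(df.loc[df["Node"] == 0, ["Tree", "Feature", "Cover"]])
```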

Returning to the example from Li Hang's book, we color the tree nodes by feature and draw the figure below.
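The original figure is not reproduced here. As a substitute, `xgboost.plot_importance` can draw the same kind of comparison, one bar chart per importance type; a sketch under the same illustrative setup as above:

```python
import matplotlib.pyplot as plt
import numpy as np
import xgboost as xgb

# Same illustrative setup as the first snippet.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y, feature_names=[f"f{i}" for i in range(5)])
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)

# One bar chart per importance type; note the rankings can differ.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, imp_type in zip(axes, ("weight", "gain", "cover")):
    xgb.plot_importance(booster, importance_type=imp_type, ax=ax, title=imp_type)
plt.tight_layout()
plt.show()
```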

