[Machine Learning] Ensemble Learning: Basic Usage of xgboost's scikit-learn Interface

Reprinted article (original post dated 2018-03-16 09:27).

1. Dataset

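The example uses the handwritten digits dataset that ships with sklearn, loaded with the load_digits function from the sklearn.datasets module. It is often loosely called "mnist", but it is much smaller than the real MNIST: 1797 samples, each an 8×8 grayscale image flattened into 64 features, with the ten digits 0 to 9 as labels.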

    ### Load the data
    from sklearn import datasets     # dataset loaders
    digits = datasets.load_digits()  # load the 8x8 handwritten digits dataset
    print(digits.data.shape)         # feature matrix shape: (n_samples, n_features)
    print(digits.target.shape)       # label vector shape: (n_samples,)
    """
    (1797, 64)
    (1797,)
    """
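Each row of digits.data is one 8×8 image flattened into 64 pixel values. The unflattened images are also available as digits.images. The sketch below checks that relationship and counts how many samples each digit has. It only uses the data, images and target attributes returned by load_digits.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# digits.images keeps the 8x8 layout; digits.data holds the same pixels flattened row by row
print(digits.images.shape)  # (1797, 8, 8)
first_flat = digits.images[0].reshape(-1)
print(np.array_equal(first_flat, digits.data[0]))  # True

# Number of samples for each digit label 0..9
counts = np.bincount(digits.target)
print(counts)
```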

2. Splitting the dataset

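The train_test_split function in sklearn.model_selection splits the data into a training set and a test set. The test_size parameter is the fraction of samples assigned to the test set. The random_state parameter is the random seed, which is fixed so that the experiment can be reproduced.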

    ### Split the data
    from sklearn.model_selection import train_test_split                  # import the splitting function
    x_train, x_test, y_train, y_test = train_test_split(digits.data,      # features
                                                        digits.target,    # labels
                                                        test_size=0.3,    # 30% of the samples go to the test set
                                                        random_state=33)  # fixed seed so the split is reproducible
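With test_size=0.3 and 1797 samples, sklearn rounds the test set size up: ceil(0.3 × 1797) = 540. The training set gets the remaining 1257 samples. The original post does not pass the labels to the stratify parameter. Doing so is an optional addition that also keeps the proportion of each digit roughly equal in both parts. A minimal sketch:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    test_size=0.3,
    random_state=33,
    stratify=digits.target,  # keep per-digit proportions equal in train and test
)

print(len(x_train), len(x_test))  # 1257 540

# Fraction of each digit in train vs. test; with stratify these are nearly identical
print(np.round(np.bincount(y_train) / len(y_train), 3))
print(np.round(np.bincount(y_test) / len(y_test), 3))
```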

3. The model (create the model, train it, predict)

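XGBClassifier is xgboost's classifier with a scikit-learn style interface. XGBClassifier.fit() trains the model, and XGBClassifier.predict() uses the trained model to make predictions.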

    ### Model
    from xgboost import XGBClassifier
    model = XGBClassifier()          # create the model with default hyperparameters
    model.fit(x_train, y_train)      # train on the training set
    y_pred = model.predict(x_test)   # predict on the test set; y_pred holds the predicted labels

4. Performance evaluation

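The accuracy_score function in sklearn.metrics measures prediction accuracy. This is the fraction of test samples whose predicted label equals the true label.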

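    ### Evaluate performance
    from sklearn.metrics import accuracy_score   # accuracy metric
    accuracy = accuracy_score(y_test, y_pred)
    print("accuracy: %.2f%%" % (accuracy * 100.0))

    """
    Reported result: accuracy about 95% (the exact figure depends on the xgboost version and its defaults)
    """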
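accuracy_score counts how many predictions match the true labels. The plain-Python sketch below performs the same calculation. The small label lists are made up purely for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

# Hypothetical labels: 4 of the 5 predictions are correct
y_true = [0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 9]
acc = accuracy(y_true, y_pred)
print("accuracy: %.2f%%" % (acc * 100.0))  # accuracy: 80.00%
```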

5. Feature importance

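xgboost scores how important each feature is to the trained model, and the plot_importance function draws these scores as a bar chart. By default, plot_importance ranks features by "weight", which is the number of times a feature is used to split a node across all trees.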

    ### Feature importance
    import matplotlib.pyplot as plt
    from xgboost import plot_importance
    fig, ax = plt.subplots(figsize=(10, 15))                        # tall figure so the feature labels fit
    plot_importance(model, height=0.5, max_num_features=64, ax=ax)  # bar height 0.5, show up to 64 features
    plt.show()

6. Complete code

    # -*- coding: utf-8 -*-
    """
    ###############################################################################
    # Author:  wanglei5205
    # Email:   wanglei5205@126.com
    # Code:    http://github.com/wanglei5205
    # Blog:    http://cnblogs.com/wanglei5205
    # Purpose: basic usage of xgboost
    ###############################################################################
    """
    ### load modules
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier
    from sklearn.metrics import accuracy_score

    ### load dataset
    digits = datasets.load_digits()

    ### data analysis
    print(digits.data.shape)    # feature matrix shape
    print(digits.target.shape)  # label vector shape

    ### data split
    x_train, x_test, y_train, y_test = train_test_split(digits.data,
                                                        digits.target,
                                                        test_size=0.3,
                                                        random_state=33)

    ### fit model on the training data
    model = XGBClassifier()
    model.fit(x_train, y_train)

    ### make predictions on the test data
    y_pred = model.predict(x_test)

    ### evaluate the model
    accuracy = accuracy_score(y_test, y_pred)
    print("accuracy: %.2f%%" % (accuracy * 100.0))
    """
    Reported result: accuracy about 95%
    """
