Big Data Preprocessing -- LightGBM

This article is a repost. 2022-06-17 22:28

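Building a single model gives nothing to compare against, so there is no way to judge whether its predictions are good or bad. For that reason, prediction work usually involves at least two models evaluated side by side. The next model used here is LightGBM.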

The LightGBM module has to be installed first, for example by running pip install lightgbm.
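Before moving on, it can help to confirm that the package is visible to the interpreter the notebook is running on. The following is a minimal sketch that uses only the standard library; the helper name `is_installed` is made up for illustration and is not part of LightGBM.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be imported by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("lightgbm"):
    print("lightgbm is available")
else:
    # Typical fix: run `pip install lightgbm` in the same environment as the notebook
    print("lightgbm is missing; install it before continuing")
```

If the check reports the package as missing right after installation, the install most likely went into a different Python environment than the one the notebook kernel uses.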

Then import the regression model from the module, split the dataset, and build the model:

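import pandas as pd
from lightgbm import LGBMRegressor
from lightgbm.compat import LGBMNotFittedError
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split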

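# The target is the listing price; every other column is a feature
y = listings_new['price']
x = listings_new.drop('price', axis=1)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=1)

# Extra keyword arguments for fit(). Note: early_stopping_rounds and verbose are
# accepted by fit() in LightGBM 3.x; LightGBM 4.0 replaced them with callbacks.
fit_params = {
    'early_stopping_rounds': 20,      # stop when the validation RMSE has not improved for 20 rounds
    'eval_metric': 'rmse',
    'eval_set': [(X_test, y_test)],   # the test split doubles as the validation set here
    'eval_names': ['valid'],
    'verbose': 100,                   # log the metric every 100 iterations
    'feature_name': 'auto',
    'categorical_feature': 'auto',
}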

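# Replace non-alphanumeric characters in column names with "_" (LightGBM rejects some special characters in feature names)
X_test.columns = ["".join(c if c.isalnum() else "_" for c in str(col)) for col in X_test.columns]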

class LGBMRegressor_GainFE(LGBMRegressor):
    """LGBMRegressor whose feature_importances_ reports total gain instead of split counts."""

    @property
    def feature_importances_(self):
        if self._n_features is None:
            raise LGBMNotFittedError('No feature_importances found. Need to call fit beforehand.')
        return self.booster_.feature_importance(importance_type='gain')
        
clf = LGBMRegressor_GainFE(
    num_leaves=25,
    max_depth=20,
    random_state=0,
    silent=True,          # suppresses messages in LightGBM 3.x
    metric='rmse',
    n_jobs=4,
    n_estimators=1000,    # upper bound on boosting rounds; early stopping may end training sooner
    colsample_bytree=0.9,
    subsample=0.9,        # only takes effect when subsample_freq > 0
    learning_rate=0.01,
)
# Train on the raw NumPy arrays; the evaluation settings come from fit_params
clf.fit(X_train.values, y_train.values, **fit_params)
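One caveat about the fit() call above: the early_stopping_rounds and verbose keywords are accepted by fit() in LightGBM 3.x, but LightGBM 4.0 removed them in favor of callbacks. The sketch below shows one possible way to translate such a dictionary. The helper names `to_callback_style` and `build_callbacks` are hypothetical, and `lightgbm.log_evaluation` assumes LightGBM 3.3 or newer.

```python
def to_callback_style(fit_params):
    """Split legacy fit() keywords into (remaining kwargs, stopping rounds, log period)."""
    remaining = dict(fit_params)  # copy so the caller's dict is left untouched
    stopping_rounds = remaining.pop("early_stopping_rounds", None)
    log_period = remaining.pop("verbose", None)
    return remaining, stopping_rounds, log_period


def build_callbacks(stopping_rounds, log_period):
    """Create the LightGBM callbacks that stand in for the two legacy keywords."""
    import lightgbm as lgb  # imported lazily; assumes LightGBM >= 3.3

    callbacks = []
    if stopping_rounds:
        callbacks.append(lgb.early_stopping(stopping_rounds=stopping_rounds))
    if log_period:
        callbacks.append(lgb.log_evaluation(period=log_period))
    return callbacks


# A reduced version of the dictionary used in this article
legacy = {"early_stopping_rounds": 20, "eval_metric": "rmse", "verbose": 100}
kwargs, rounds, period = to_callback_style(legacy)
print(kwargs, rounds, period)
```

Applied to the full fit_params dictionary, the training call would then look like clf.fit(X_train.values, y_train.values, callbacks=build_callbacks(rounds, period), **kwargs). On LightGBM 3.x the original call works as written, so this translation is only needed after an upgrade.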

The output is as follows:

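If the output above appears, the model has been trained successfully. Training does not always go smoothly, though. One error that may come up is: TypeError: Cannot interpret '<attribute 'dtype' of 'numpy.generic' objects>' as a data type. In that case, upgrading pandas and numpy can help, for example pandas to 1.2.4 and numpy to 1.20.2. Rerunning the current notebook afterwards should resolve the problem.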
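To see which versions are actually installed before upgrading, a small check like the following can be used. It is a sketch that relies only on the standard library, and it treats the versions mentioned above as lower bounds, which is an assumption; `parse_version` and `meets_minimum` are illustrative helper names.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.20.2' into (1, 20, 2); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, not alphabetically."""
    return parse_version(installed) >= parse_version(minimum)


# Versions taken from the paragraph above, used here as assumed lower bounds
for package, minimum in {"pandas": "1.2.4", "numpy": "1.20.2"}.items():
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "ok" if meets_minimum(installed, minimum) else f"older than {minimum}"
    print(f"{package} {installed}: {status}")
```

The numeric comparison matters because a plain string comparison would rank "1.9.0" above "1.20.2".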

Next, use the trained model to make predictions and check its score. The most important features can be visualized at the same time.

# Predict on the held-out test set and report R^2
y_pred = clf.predict(X_test.values)
print('R^2 test: %.3f' % (r2_score(y_test, y_pred)))

# Plot the 20 features with the largest total gain
feat_imp = pd.Series(clf.feature_importances_, index=x.columns)
feat_imp.nlargest(20).plot(kind='barh', figsize=(10,6))

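The output is as follows (the LightGBM model scores higher than the final score of the random forest model, which suggests that LightGBM is the better fit for this dataset):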
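The comparison above is based on the R^2 score, so it is only meaningful when both models are scored on the same held-out test set. As a reminder of what the score measures, here is a small from-scratch version of the formula for the single-target case. The numbers and the names `model_a` and `model_b` are made up for illustration and do not come from the article's dataset.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up prices and predictions, for illustration only
y_true = [100.0, 150.0, 200.0, 250.0]
model_a = [110.0, 140.0, 210.0, 240.0]   # close to the true values
model_b = [150.0, 150.0, 150.0, 150.0]   # a constant, cruder guess

print("model A R^2: %.3f" % r2(y_true, model_a))
print("model B R^2: %.3f" % r2(y_true, model_b))
```

A score of 1 means perfect predictions, 0 means no better than always predicting the mean of the test targets, and negative values mean worse than that baseline.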