lgb (LightGBM) in sklearn

Reposted article. 2020-11-29 10:38. Category: Artificial Intelligence

Creating the model: LGBMModel

Parameters

----------

boosting_type : string, optional (default='gbdt')

'gbdt', traditional Gradient Boosting Decision Tree.

'dart', Dropouts meet Multiple Additive Regression Trees.

'goss', Gradient-based One-Side Sampling.

'rf', Random Forest.

num_leaves : int, optional (default=31)

Maximum number of leaves in each tree.

max_depth : int, optional (default=-1)

Maximum tree depth of the base learners; <=0 means no limit.

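learning_rate : float, optional (default=0.1)

Boosting learning rate.

The learning rate can be shrunk or adapted during training by passing the reset_parameter callback through the callbacks parameter of the fit method. When that callback is used, the learning_rate argument is ignored.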

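n_estimators : int, optional (default=100)

Number of boosting rounds, which is also the number of boosted trees to fit.

subsample_for_bin : int, optional (default=200000)

Number of samples used to construct the histogram bins into which feature values are discretized. This is a sample count, not the number of bins (the number of bins is controlled by max_bin).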

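objective : string, callable or None, optional (default=None)

Specifies the learning objective: regression, classification, or ranking.

If a string, the corresponding built-in objective is used.

If a callable, it is used as a custom objective function.

If None, the default is 'regression' for LGBMRegressor, 'binary' or 'multiclass' for LGBMClassifier, and 'lambdarank' for LGBMRanker.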

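class_weight : dict, 'balanced' or None, optional (default=None)

Weights associated with classes, in the form ``{class_label: weight}``.

Use this parameter only for multi-class classification tasks.

For binary classification tasks, use the ``is_unbalance`` or ``scale_pos_weight`` parameters instead.

Note that using any of these parameters will result in poor estimates of the individual class probabilities.

You may want to consider calibrating the probabilities of your model; see https://scikit-learn.org/stable/modules/calibration.html

If 'balanced', the values of y are used to adjust the weights automatically, inversely proportional to the class frequencies in the input data: ``n_samples / (n_classes * np.bincount(y))``.

If None, all classes have weight 1.

Note that these weights are multiplied by ``sample_weight`` (if sample_weight is passed through the fit method).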
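
The 'balanced' formula above can be worked through with NumPy alone. This is a small sketch: the label vector y and the sample_weight values are made up for illustration, and the last step only mirrors the multiplication with sample_weight described above rather than reproducing LightGBM's internal code.

```python
import numpy as np

# Hypothetical labels: class 0 appears 6 times, classes 1 and 2 twice each.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])

counts = np.bincount(y)          # samples per class -> [6, 2, 2]
n_samples = y.shape[0]           # 10
n_classes = counts.shape[0]      # 3

# 'balanced' weights: inversely proportional to class frequency.
class_weights = n_samples / (n_classes * counts)
print(class_weights)

# Per-sample weight = class weight of its label times the user-supplied
# sample_weight (all ones here, with one sample up-weighted as an example).
sample_weight = np.ones(n_samples)
sample_weight[0] = 2.0
effective_weight = class_weights[y] * sample_weight
print(effective_weight)
```

With these labels the majority class gets a weight below 1 and the two minority classes get equal weights above 1.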

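min_split_gain : float, optional (default=0.)

Minimum gain (loss reduction) required to make a further split on a leaf node of the tree.

min_child_weight : float, optional (default=1e-3)

Minimum sum of instance weights (Hessian) required in a leaf node.

min_child_samples : int, optional (default=20)

Minimum number of samples required in a leaf node.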

subsample : float, optional (default=1.)

Subsample ratio of the training instances.

subsample_freq : int, optional (default=0)

Frequency of subsampling; <=0 disables it.

colsample_bytree : float, optional (default=1.)

Subsample ratio of columns when constructing each tree.

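reg_alpha : float, optional (default=0.)

L1 regularization term on the weights.

reg_lambda : float, optional (default=0.)

L2 regularization term on the weights.

random_state : int or None, optional (default=None)

Random seed.

If None, the default seeds in the C++ code are used.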

n_jobs : int, optional (default=-1)

Number of parallel threads.

silent : bool, optional (default=True)

Whether to print messages while running boosting.

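importance_type : string, optional (default='split')

Type of feature importance.

If 'split', the result contains the number of times the feature is used in the model.

If 'gain', the result contains the total gain of the splits that use the feature.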

**kwargs

Other parameters for the model.

For more parameters, see http://lightgbm.readthedocs.io/en/latest/Parameters.html

Warning: **kwargs is not supported in sklearn, so using it may raise exceptions.

==================================================================

Training: fit

Parameters

----------

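X : array-like or sparse matrix, shape = [n_samples, n_features]

Input feature matrix.

y : array-like, shape = [n_samples]

Target values (class labels for classification, real values for regression).

sample_weight : array-like, shape = [n_samples] or None, optional (default=None)

Weights of the training data.

init_score : array-like, shape = [n_samples] or None, optional (default=None)

Initial scores of the training data.

group : array-like or None, optional (default=None)

Group data of the training data.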

eval_set : list or None, optional (default=None)

A list of (X, y) tuples to use as validation sets.

eval_names : list of strings or None, optional (default=None)

Names of the validation sets.

eval_sample_weight : list of arrays or None, optional (default=None)

Sample weights of the validation sets.

eval_class_weight : list or None, optional (default=None)

Class weights of the validation sets (classification models).

eval_init_score : list of arrays or None, optional (default=None)

Initial scores of the validation sets.

eval_group : list of arrays or None, optional (default=None)

Group data of the validation sets.

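eval_metric : string, list of strings, callable or None, optional (default=None)

If a string, it should be a built-in evaluation metric.

If a callable, it is used as a custom evaluation metric.

Otherwise, it is a list of strings and callables.

Defaults: 'l2' for LGBMRegressor, 'logloss' for LGBMClassifier, 'ndcg' for LGBMRanker.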

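early_stopping_rounds : int or None, optional (default=None)

Activates early stopping. Training stops if the validation score has not improved for early_stopping_rounds consecutive rounds.

Requires at least one validation set and one metric.

If there is more than one metric (eval_metric is a list with more than one element), all of them are checked.

To check only the first metric, set first_metric_only = True.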

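verbose : bool or int, optional (default=True)

Requires at least one validation set.

If True, the evaluation metric on the validation set is printed at every boosting stage.

If an int, the evaluation metric is printed every `verbose` boosting stages.

The last boosting stage, or the stage found by early stopping (set through early_stopping_rounds), is also printed.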

feature_name : list of strings or 'auto', optional (default='auto')

Feature names, given as a list of strings or 'auto'.

If 'auto' and the data is a pandas DataFrame, the column names are used.

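categorical_feature : list of strings or int, or 'auto', optional (default='auto')

Categorical features.

If a list of int, the values are interpreted as column indices.

If a list of strings, the values are interpreted as feature names (feature_name must be specified as well).

If 'auto' and the data is a pandas DataFrame, the pandas unordered categorical columns are used.

All values in categorical features should be less than the int32 maximum (2147483647).

Large values can consume a lot of memory. Consider using consecutive integers starting from zero.

All negative values in categorical features are treated as missing values.

The output cannot be monotonically constrained with respect to a categorical feature.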

callbacks : list of callback functions or None, optional (default=None)

List of callback functions that are applied at each iteration.

Returns: self, the fitted model.

============================================================

Prediction: predict

Parameters

----------

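X : array-like or sparse matrix, shape = [n_samples, n_features]

Input feature matrix.

raw_score : bool, optional (default=False)

Whether to predict raw scores.

num_iteration : int or None, optional (default=None)

Limits the number of iterations used in the prediction.

If None, the best iteration is used when it exists; otherwise, all trees are used.

If <= 0, all trees are used (no limit).

pred_leaf : bool, optional (default=False)

Whether to predict leaf indices.

pred_contrib : bool, optional (default=False)

Whether to predict feature contributions.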

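Note:

If you want more explanation of your model's predictions using SHAP values, such as SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap). Note that, unlike the shap package, pred_contrib returns a matrix with an extra column, where the last column is the expected value.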

**kwargs

Other parameters for the prediction.

Returns

-------

predicted_result : array-like, shape = [n_samples] or shape = [n_samples, n_classes]

The predicted values.

X_leaves : array-like, shape = [n_samples, n_trees] or shape = [n_samples, n_trees * n_classes]

If ``pred_leaf=True``, the predicted leaf of every tree for each sample.

X_SHAP_values : array-like, shape = [n_samples, n_features + 1] or shape = [n_samples, (n_features + 1) * n_classes]

If ``pred_contrib=True``, the feature contributions for each sample.
