1. Loss Functions
This section introduces xgboost regression. xgboost implements five kinds of regression objectives: squarederror, logistic, poisson, gamma, and tweedie. Below, the first two are derived and implemented; the remaining three are left to the next section.
squarederror
That is, a regression model whose loss function is the squared error:
\[L(y,\hat{y})=\frac{1}{2}(y-\hat{y})^2 \]
The first- and second-order derivatives are therefore:
\[\frac{\partial L(y,\hat{y})}{\partial \hat{y}}=\hat{y}-y\\ \frac{\partial^2 L(y,\hat{y})}{{\partial \hat{y}}^2}=1\\ \]
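Before wiring this into the booster, it is easy to sanity-check the derivation with a central finite difference. The snippet below is an illustrative sketch; the helper name squarederror_grad_hess is ours rather than part of ml_models:
import numpy as np

# analytic gradient/hessian of 0.5*(y - y_pred)^2 w.r.t. y_pred, as derived above
def squarederror_grad_hess(y, y_pred):
    return y_pred - y, np.ones_like(y)

# central finite-difference check at a single point
y_true, y_hat, eps = 3.0, 1.5, 1e-6
loss = lambda p: 0.5 * (y_true - p) ** 2
num_grad = (loss(y_hat + eps) - loss(y_hat - eps)) / (2 * eps)
g, h = squarederror_grad_hess(np.array([y_true]), np.array([y_hat]))
print(num_grad, g[0])  # both should be close to y_hat - y_true = -1.5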
logistic
Since this is a regression task, the target y is also passed through the sigmoid function (denoted \(\sigma(\cdot)\)). The loss function is:
\[L(y,\hat{y})=-\left[(1-\sigma(y))\log(1-\sigma(\hat{y}))+\sigma(y)\log(\sigma(\hat{y}))\right] \]
Using \(\sigma'(z)=\sigma(z)(1-\sigma(z))\), the first- and second-order derivatives are:
\[\frac{\partial L(y,\hat{y})}{\partial \hat{y}}=\sigma(\hat{y})-\sigma(y)\\ \frac{\partial^2 L(y,\hat{y})}{{\partial \hat{y}}^2}=\sigma(\hat{y})(1-\sigma(\hat{y}))\\ \]
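The same finite-difference check works for this loss as well; the sketch below defines its own sigmoid instead of importing the library's utils.sigmoid:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# the sigmoid cross-entropy loss defined above
def logistic_loss(y, y_pred):
    return -((1 - sigmoid(y)) * np.log(1 - sigmoid(y_pred)) +
             sigmoid(y) * np.log(sigmoid(y_pred)))

y_true, y_hat, eps = 0.8, 0.3, 1e-6
num_grad = (logistic_loss(y_true, y_hat + eps) - logistic_loss(y_true, y_hat - eps)) / (2 * eps)
print(num_grad, sigmoid(y_hat) - sigmoid(y_true))  # the two should agree to ~1e-6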
2. Code Implementation
The overall flow is similar to that of GBDT regression, except that the first- and second-order derivatives must be computed in each round, and the base learner is replaced with the xgboost regression tree from the previous section.
import os
os.chdir('../')
import matplotlib.pyplot as plt
%matplotlib inline
from ml_models.ensemble import XGBoostBaseTree
from ml_models import utils
import copy
import numpy as np
"""
xgboost回歸樹的實現,封裝到ml_models.ensemble
"""
class XGBoostRegressor(object):
def __init__(self, base_estimator=None, n_estimators=10, learning_rate=1.0, loss='squarederror'):
"""
:param base_estimator: 基學習器
:param n_estimators: 基學習器迭代數量
:param learning_rate: 學習率,降低后續基學習器的權重,避免過擬合
:param loss:損失函數,支持squarederror、logistic
"""
self.base_estimator = base_estimator
self.n_estimators = n_estimators
self.learning_rate = learning_rate
if self.base_estimator is None:
            # default to a decision stump
self.base_estimator = XGBoostBaseTree()
        # homogeneous base learners: replicate one estimator n_estimators times
        if not isinstance(self.base_estimator, list):
estimator = self.base_estimator
self.base_estimator = [copy.deepcopy(estimator) for _ in range(0, self.n_estimators)]
        # heterogeneous base learners: a user-supplied list of estimators
else:
self.n_estimators = len(self.base_estimator)
self.loss = loss
def _get_gradient_hess(self, y, y_pred):
"""
獲取一階、二階導數信息
:param y:真實值
:param y_pred:預測值
:return:
"""
if self.loss == 'squarederror':
return y_pred - y, np.ones_like(y)
        elif self.loss == 'logistic':
            pred = utils.sigmoid(y_pred)
            return pred - utils.sigmoid(y), pred * (1 - pred)
def fit(self, x, y):
y_pred = np.zeros_like(y)
g, h = self._get_gradient_hess(y, y_pred)
for index in range(0, self.n_estimators):
self.base_estimator[index].fit(x, g, h)
y_pred += self.base_estimator[index].predict(x) * self.learning_rate
g, h = self._get_gradient_hess(y, y_pred)
    def predict(self, x):
        # scale every base learner's contribution by the learning rate,
        # consistent with how y_pred is accumulated in fit
        return np.sum([self.learning_rate * estimator.predict(x)
                       for estimator in self.base_estimator], axis=0)
# Test
data = np.linspace(1, 10, num=100)
target = np.sin(data) + np.random.random(size=100)  # add noise
data = data.reshape((-1, 1))
model = XGBoostRegressor(loss='squarederror')
model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, model.predict(data), color='r')
plt.show()
model = XGBoostRegressor(loss='logistic')
model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, model.predict(data), color='r')
plt.show()
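To attach a number to the two plots, a small follow-up cell (ours, not part of the original notebook) can print each model's training error; the exact values depend on the random noise drawn above:
# reuses data/target from the test cell above
for loss in ['squarederror', 'logistic']:
    model = XGBoostRegressor(loss=loss)
    model.fit(data, target)
    mse = np.mean((target - model.predict(data)) ** 2)
    print(loss, 'training MSE:', mse)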