Machine Learning in Python (10_11): Ensemble Learning - A Simple Implementation of XGBoost Regression


1. Loss Functions

This section introduces XGBoost regression. XGBoost implements five types of regression: squarederror, logistic, poisson, gamma, and tweedie. Below, the first two are derived and implemented; the remaining three are left to the next section.

squarederror

That is, the regression model whose loss function is the squared error:

\[L(y,\hat{y})=\frac{1}{2}(y-\hat{y})^2 \]

So the first and second derivatives are:

\[\frac{\partial L(y,\hat{y})}{\partial \hat{y}}=\hat{y}-y,\qquad \frac{\partial^2 L(y,\hat{y})}{\partial \hat{y}^2}=1 \]
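As a quick sanity check, here is a minimal NumPy sketch (function and variable names are illustrative, not part of the implementation below) comparing the analytic gradient against a finite-difference estimate:

import numpy as np

def squarederror_grad_hess(y, y_pred):
    # analytic first/second derivatives of 0.5*(y - y_pred)^2 w.r.t. y_pred
    return y_pred - y, np.ones_like(y_pred)

y, y_pred, eps = 2.0, 0.5, 1e-6
loss = lambda p: 0.5 * (y - p) ** 2
num_grad = (loss(y_pred + eps) - loss(y_pred - eps)) / (2 * eps)
grad, hess = squarederror_grad_hess(y, y_pred)
print(num_grad, grad)  # both close to -1.5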

logistic

Since this is a regression task, the target y is also passed through the sigmoid function (denoted \(\sigma(\cdot)\)), and the loss is the cross-entropy between \(\sigma(y)\) and \(\sigma(\hat{y})\):

\[L(y,\hat{y})=-\left[(1-\sigma(y))\log(1-\sigma(\hat{y}))+\sigma(y)\log(\sigma(\hat{y}))\right] \]

The first and second derivatives are:

\[\frac{\partial L(y,\hat{y})}{\partial \hat{y}}=\sigma(\hat{y})-\sigma(y),\qquad \frac{\partial^2 L(y,\hat{y})}{\partial \hat{y}^2}=\sigma(\hat{y})(1-\sigma(\hat{y})) \]
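The same finite-difference check can be applied to the logistic loss; the sketch below (again with illustrative names) verifies both derivatives at an arbitrary point:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

y, p, eps = 0.8, -0.3, 1e-5
loss = lambda p: -((1 - sigmoid(y)) * np.log(1 - sigmoid(p))
                   + sigmoid(y) * np.log(sigmoid(p)))
num_grad = (loss(p + eps) - loss(p - eps)) / (2 * eps)
num_hess = (loss(p + eps) - 2 * loss(p) + loss(p - eps)) / eps ** 2
print(num_grad, sigmoid(p) - sigmoid(y))        # both close to -0.264
print(num_hess, sigmoid(p) * (1 - sigmoid(p)))  # both close to 0.244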

2. Code Implementation

The overall procedure is similar to GBDT regression, except that first- and second-order derivative information is computed at each round, and the base learner is replaced with the XGBoost regression tree from the previous section.

import os
os.chdir('../')
import matplotlib.pyplot as plt
%matplotlib inline
from ml_models.ensemble import XGBoostBaseTree
from ml_models import utils
import copy
import numpy as np

"""
The XGBoost regression tree implementation is packaged in ml_models.ensemble
"""

class XGBoostRegressor(object):
    def __init__(self, base_estimator=None, n_estimators=10, learning_rate=1.0, loss='squarederror'):
        """
        :param base_estimator: base learner
        :param n_estimators: number of base learners (boosting rounds)
        :param learning_rate: learning rate; shrinks the contribution of later base learners to avoid overfitting
        :param loss: loss function, supports squarederror and logistic
        """
        self.base_estimator = base_estimator
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        if self.base_estimator is None:
            # default to the XGBoost regression tree from the previous section
            self.base_estimator = XGBoostBaseTree()
        # homogeneous base learners: replicate the same learner n_estimators times
        if not isinstance(self.base_estimator, list):
            estimator = self.base_estimator
            self.base_estimator = [copy.deepcopy(estimator) for _ in range(0, self.n_estimators)]
        # heterogeneous base learners: use the provided list as-is
        else:
            self.n_estimators = len(self.base_estimator)
        self.loss = loss

    def _get_gradient_hess(self, y, y_pred):
        """
        Compute first- and second-order derivative information
        :param y: ground-truth values
        :param y_pred: current predictions
        :return: gradient, hessian
        """
        if self.loss == 'squarederror':
            return y_pred - y, np.ones_like(y)
        elif self.loss == 'logistic':
            return utils.sigmoid(y_pred) - utils.sigmoid(y), utils.sigmoid(y_pred) * (1 - utils.sigmoid(y_pred))

    def fit(self, x, y):
        y_pred = np.zeros_like(y)
        g, h = self._get_gradient_hess(y, y_pred)
        for index in range(0, self.n_estimators):
            self.base_estimator[index].fit(x, g, h)
            y_pred += self.base_estimator[index].predict(x) * self.learning_rate
            g, h = self._get_gradient_hess(y, y_pred)

    def predict(self, x):
        # sum the learning-rate-scaled predictions of all base learners,
        # mirroring how y_pred is accumulated in fit
        return np.sum([self.learning_rate * estimator.predict(x)
                       for estimator in self.base_estimator], axis=0)
# Test
data = np.linspace(1, 10, num=100)
target = np.sin(data) + np.random.random(size=100)  # add noise
data = data.reshape((-1, 1))
model = XGBoostRegressor(loss='squarederror')
model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, model.predict(data), color='r')
plt.show()

model = XGBoostRegressor(loss='logistic')
model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, model.predict(data), color='r')
plt.show()
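As an optional cross-check, the official xgboost package (assuming it is installed) can be fit on the same data; the hyperparameters below are chosen to roughly mirror the toy implementation, not tuned:

import xgboost as xgb

ref_model = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=10, learning_rate=1.0)
ref_model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, ref_model.predict(data), color='g')
plt.show()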


