《機器學習Python實現_10_12_集成學習_xgboost_回歸的更多實現：泊松回歸、gamma回歸、tweedie回歸》

本文轉載自查看原文 2021-07-04 19:47 269 python/ 機器學習原理及實現/ poisson/ 泊松/ gamma/ 回歸/ xgboos/ tweedie

一.原理介紹

這一節將樹模型的預測與概率分布相結合，我們假設樹模型的輸出服從某一分布，而我們的目標是使得該輸出的概率盡可能的高，如下圖所示

而概率值最高的點通常由分布中的某一個參數（通常是均值）反映，所以我們將樹模型的輸出打造為分布中的該參數項，然后讓樹模型的輸出去逼近極大似然估計的結果即可，即：

\[\hat{y}\rightarrow \mu_{ML} \]

下面分別介紹possion回歸，gamma回歸，tweedie回歸，負二項回歸的具體求解

二.泊松回歸

泊松分布的表達式如下：

\[P(y\mid\lambda)=\frac{\lambda^y}{y!}e^{-\lambda} \]

其中，\(y\)是我們的目標輸出，\(\lambda\)為模型參數，且\(\lambda\)恰為該分布的均值，由於泊松分布要求\(y>0\)，所以我們對\(\hat{y}\)取指數去擬合\(\lambda\)，即令：

\[\lambda=e^{\hat{y}} \]

對於\(N\)個樣本，其似然函數可以表示如下：

\[\prod_{i=1}^N\frac{e^{y_i\hat{y_i}}e^{-e^{\hat{y_i}}}}{y_i!} \]

由於\(y_i!\)是常數，可以去掉，並對上式取負對數，轉換為求極小值的問題：

\[L(y,\hat{y})=\sum_{i=1}^N(e^{\hat{y_i}}-y_i\hat{y_i}) \]

所以，一階導和二階導分別為：

\[\frac{\partial L(y,\hat{y})}{\partial \hat{y}}=e^{\hat{y}}-y\\ \frac{\partial^2 L(y,\hat{y})}{{\partial \hat{y}}^2}=e^{\hat{y}}\\ \]

三.gamma回歸

gamma分布如下：

\[p(y\mid\alpha,\lambda)=\frac{1}{\Gamma(\alpha)\lambda^\alpha}y^{\alpha-1}e^{-y/\lambda} \]

其中，\(y>0\)為我們的目標輸出，\(\alpha\)為形狀參數，\(\lambda\)為尺度參數，\(\Gamma(\cdot)\)為Gamma函數（后續推導這里會被省略，所以就不列出來了），而Gamma分布的均值為\(\alpha\lambda\)，這里不好直接變換，我們令\(\alpha=1/\phi,\lambda=\phi\mu\)，所以現在Gamma分布的均值可以表示為\(\mu\)，此時的Gamma分布為：

\[p(y\mid\mu,\phi)=\frac{1}{y\Gamma(1/\phi)}(\frac{y}{\mu\phi})^{1/\phi}exp[-\frac{y}{\mu\phi}] \]

此時，\(\mu\)看做Gamma分布的均值參數，而\(\phi\)為它的離散參數，在均值給定的情況下，若離散參數越大，Gamma分布的離散程度越大，接下來對上面的表達式進一步變換：

\[p(y\mid\mu,\phi)=exp[\frac{-y/\mu-ln\mu}{\phi}+\frac{1-\phi}{\phi}lny-\frac{ln\phi}{\phi}-ln\Gamma(\frac{1}{\phi})] \]

同泊松分布一樣，我們可以令：

\[\mu=e^{\hat{y}} \]

又由於\(\mu\)與\(\phi\)無關，所以做極大似然估計時可以將\(\phi\)看做常數，我們將對數似然函數的負數看做損失函數，可以寫作如下：

\[L(y,\hat{y})=\sum_{i=1}^N(\frac{y_i}{e^{\hat{y_i}}}+\hat{y_i}) \]

所以，一階導和二階導就可以寫出來啦：

\[\frac{\partial L(y,\hat{y})}{\partial \hat{y}}=1-ye^{-\hat{y}}\\ \frac{\partial^2 L(y,\hat{y})}{{\partial \hat{y}}^2}=ye^{-\hat{y}}\\ \]

注意：上面的兩個向量是按元素相乘

四.tweedie回歸

tweedie回歸是多個分布的組合體，包括gamma分布，泊松分布，高斯分布等，tweedie回歸由一個超參數\(p\)控制，\(p\)不同，則其對應的對數似然函數也不同：

\[g(y,\phi)+\left\{\begin{matrix} \frac{1}{\phi}(ylog(\mu)-\frac{\mu^{2-p}}{2-p}) & p=1\\ \frac{1}{\phi}(y\frac{\mu^{1-p}}{1-p}-log\mu) & p=2 \\ \frac{1}{\phi}(y\frac{\mu^{1-p}}{1-p}-\frac{\mu^{2-p}}{2-p}) & p\neq 1,p\neq 2 \end{matrix}\right. \]

同樣的，我們可以令：

\[\mu=e^{\hat{y}} \]

由於除開\(\mu\)以外的都可以視作常數項，所以損失函數可以簡化為：

\[L(y,\hat{y})=\left\{\begin{matrix} \sum_{i=1}^n(\frac{e^{\hat{y_i}(2-p)}}{2-p}-y_i\hat{y_i})=\sum_{i=1}^n(e^{\hat{y_i}}-y_i\hat{y_i}) & p=1\\ \sum_{i=1}^n(\hat{y_i}+y_ie^{-\hat{y_i}}) & p=2 \\ \sum_{i=1}^n(\frac{exp[\hat{y_i}(2-p)]}{2-p}-y_i\frac{exp[\hat{y_i}(1-p)]}{1-p}) & p\neq 1,p\neq 2 \end{matrix}\right. \]

所以，一階導：

\[\frac{\partial L(y,\hat{y})}{\partial \hat{y}}=\left\{\begin{matrix} e^{\hat{y}}-y & p=1\\ 1-ye^{-\hat{y}} & p=2 \\ e^{\hat{y}(2-p)}-ye^{\hat{y}(1-p)} & p\neq 1,p\neq 2 \end{matrix}\right. \]

二階導：

\[\frac{\partial^2 L(y,\hat{y})}{{\partial \hat{y}}^2}=\left\{\begin{matrix} e^{\hat{y}} & p=1\\ ye^{-\hat{y}} & p=2 \\ (2-p)e^{\hat{y}(2-p)}-(1-p)ye^{\hat{y}(1-p)} & p\neq 1,p\neq 2 \end{matrix}\right. \]

五.代碼實現

基於上一節的計算框架，略作調整即可實現....

import os
os.chdir('../')
import matplotlib.pyplot as plt
%matplotlib inline
from ml_models.ensemble import XGBoostBaseTree
from ml_models import utils
import copy
import numpy as np

"""
xgboost回歸樹的實現，封裝到ml_models.ensemble
"""

class XGBoostRegressor(object):
    def __init__(self, base_estimator=None, n_estimators=10, learning_rate=1.0, loss='squarederror', p=2.5):
        """
        :param base_estimator: 基學習器
        :param n_estimators: 基學習器迭代數量
        :param learning_rate: 學習率，降低后續基學習器的權重，避免過擬合
        :param loss:損失函數，支持squarederror、logistic、poisson,gamma,tweedie
        :param p:對tweedie回歸生效
        """
        self.base_estimator = base_estimator
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        if self.base_estimator is None:
            # 默認使用決策樹樁
            self.base_estimator = XGBoostBaseTree()
        # 同質分類器
        if type(base_estimator) != list:
            estimator = self.base_estimator
            self.base_estimator = [copy.deepcopy(estimator) for _ in range(0, self.n_estimators)]
        # 異質分類器
        else:
            self.n_estimators = len(self.base_estimator)
        self.loss = loss
        self.p = p

    def _get_gradient_hess(self, y, y_pred):
        """
        獲取一階、二階導數信息
        :param y:真實值
        :param y_pred:預測值
        :return:
        """
        if self.loss == 'squarederror':
            return y_pred - y, np.ones_like(y)
        elif self.loss == 'logistic':
            return utils.sigmoid(y_pred) - utils.sigmoid(y), utils.sigmoid(y_pred) * (1 - utils.sigmoid(y_pred))
        elif self.loss == 'poisson':
            return np.exp(y_pred) - y, np.exp(y_pred)
        elif self.loss == 'gamma':
            return 1.0 - y * np.exp(-1.0 * y_pred), y * np.exp(-1.0 * y_pred)
        elif self.loss == 'tweedie':
            if self.p == 1:
                return np.exp(y_pred) - y, np.exp(y_pred)
            elif self.p == 2:
                return 1.0 - y * np.exp(-1.0 * y_pred), y * np.exp(-1.0 * y_pred)
            else:
                return np.exp(y_pred * (2.0 - self.p)) - y * np.exp(y_pred * (1.0 - self.p)), (2.0 - self.p) * np.exp(
                    y_pred * (2.0 - self.p)) - (1.0 - self.p) * y * np.exp(y_pred * (1.0 - self.p))

    def fit(self, x, y):
        y_pred = np.zeros_like(y)
        g, h = self._get_gradient_hess(y, y_pred)
        for index in range(0, self.n_estimators):
            self.base_estimator[index].fit(x, g, h)
            y_pred += self.base_estimator[index].predict(x) * self.learning_rate
            g, h = self._get_gradient_hess(y, y_pred)

    def predict(self, x):
        rst_np = np.sum(
            [self.base_estimator[0].predict(x)] +
            [self.learning_rate * self.base_estimator[i].predict(x) for i in
             range(1, self.n_estimators - 1)] +
            [self.base_estimator[self.n_estimators - 1].predict(x)]
            , axis=0)
        if self.loss in ["poisson", "gamma", "tweedie"]:
            return np.exp(rst_np)
        else:
            return rst_np

對泊松、gamma、tweedie回歸做測試

data = np.linspace(1, 10, num=100)
target = np.sin(data) + np.random.random(size=100) + 1  # 添加噪聲
data = data.reshape((-1, 1))

model = XGBoostRegressor(loss='poisson')
model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, model.predict(data), color='r')

[<matplotlib.lines.Line2D at 0x29c33c1a9b0>]

model = XGBoostRegressor(loss='gamma')
model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, model.predict(data), color='r')

[<matplotlib.lines.Line2D at 0x29c33c1a400>]

model = XGBoostRegressor(loss='tweedie',p=2.5)
model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, model.predict(data), color='r')

[<matplotlib.lines.Line2D at 0x29c33c1aba8>]

model = XGBoostRegressor(loss='tweedie',p=1.5)
model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, model.predict(data), color='r')

[<matplotlib.lines.Line2D at 0x29c33c43d68>]

上面的擬合結果，看不出明顯區別....,接下來對tweedie分布中p取極端值做一個簡單探索...，可以發現取值過大或者過小都有可能陷入欠擬合

model = XGBoostRegressor(loss='tweedie',p=0.1)
model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, model.predict(data), color='r')

[<matplotlib.lines.Line2D at 0x29c44ed2cc0>]

model = XGBoostRegressor(loss='tweedie',p=20)
model.fit(data, target)
plt.scatter(data, target)
plt.plot(data, model.predict(data), color='r')

[<matplotlib.lines.Line2D at 0x29c44e94f28>]

參考內容：
http://www.doc88.com/p-9029670237688.html
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html#families

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 《機器學習Python實現_10_11_集成學習_xgboost_回歸的簡單實現》《機器學習Python實現_10_10_集成學習_xgboost_原理介紹及回歸樹的簡單實現》機器學習-softmax回歸 python實現機器學習(2)：Softmax回歸原理及其實現 python機器學習《回歸一》機器學習算法實現——線性回歸 Machine Learning 學習筆記 (3) —— 泊松回歸與Softmax回歸機器學習（七）—回歸機器學習《回歸二》機器學習之python---Python實現邏輯回歸（LogisticRegression）