Analysis of the official code structure: README.md
When XGBoost does regression, the loss function is the squared error loss; for classification, it is the log-likelihood (logistic) loss.
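Concretely, these are the two losses whose first- and second-order gradients the source below implements (the formulas simply restate what FirstOrderGradient / SecondOrderGradient compute; the notation here is mine):

$$ L_{\mathrm{square}}(y,\hat y)=\tfrac{1}{2}(\hat y-y)^2 \;\Rightarrow\; g=\hat y-y,\quad h=1 $$

$$ L_{\mathrm{logistic}}(y,\hat y)=-y\log p-(1-y)\log(1-p),\quad p=\frac{1}{1+e^{-\hat y}} \;\Rightarrow\; g=p-y,\quad h=p(1-p) $$

Here $\hat y$ is the raw margin prediction and $g$, $h$ are the first and second derivatives of the loss with respect to $\hat y$.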
Coding Guide
======
This file is intended to be notes about code structure in xgboost.

Project Logical Layout
=======
// Dependency chain: IO -> LEARNER (computes the gradient and hands it to GBM) -> GBM (gradient boosting) -> TREE (tree construction algorithms)
* Dependency order: io->learner->gbm->tree
  - All modules depend on data.h
* tree are implementations of tree construction algorithms.
* gbm is the gradient boosting interface, which takes trees and other base learners to do boosting.
  - gbm only takes the gradient as sufficient statistics, it does not compute the gradient.
* learner is the learning module that computes the gradient for a specific objective and passes it to GBM.

File Naming Convention
=======
// .h files define the data structures and interfaces; -inl.hpp files implement the interfaces
* .h files are data structures and interfaces, which are needed to use functions in that layer.
* -inl.hpp files are implementations of the interfaces, like the cpp files in most projects.
  - You only need to understand the interface file to understand the usage of that layer.
* In each folder, there can be a .cpp file that compiles the module of that layer.

How to Hack the Code
======
// defining and modifying objective functions
* Add objective function: add to learner/objective-inl.hpp and register it in learner/objective.h ```CreateObjFunction``` (see the sketch after this section)
  - You can also directly do it in python.
* Add new evaluation metric: add to learner/evaluation-inl.hpp and register it in learner/evaluation.h ```CreateEvaluator```
* Add wrapper for a new language: most likely you can do it by taking the functions in python/xgboost_wrapper.h, which is purely C based, and calling these C functions to use xgboost.
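As a concrete illustration of the "Add objective function" bullet, here is a minimal sketch of what such an addition could look like. It assumes the IObjFunction, MetaInfo and bst_gpair interfaces quoted later in this post; the class name SquaredLogErrorObj, the loss itself and the "reg:squaredlogerror" name are illustrative only, not part of this code base.

```cpp
// Hypothetical objective, written against the IObjFunction interface shown below.
// It would live in learner/objective-inl.hpp (which already includes <cmath>,
// <algorithm> and the xgboost headers used here).
class SquaredLogErrorObj : public IObjFunction {
 public:
  virtual ~SquaredLogErrorObj(void) {}
  virtual void SetParam(const char *name, const char *val) {}  // no parameters
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) {
    // loss = 0.5 * (log(pred + 1) - log(label + 1))^2, assuming preds.size() == labels.size()
    out_gpair->resize(preds.size());
    for (size_t i = 0; i < preds.size(); ++i) {
      const float w = info.GetWeight(static_cast<unsigned>(i));
      const float r = std::log(preds[i] + 1.0f) - std::log(info.labels[i] + 1.0f);
      const float g = r / (preds[i] + 1.0f);                                 // dL/dpred
      const float h = (1.0f - r) / ((preds[i] + 1.0f) * (preds[i] + 1.0f));  // d2L/dpred2
      (*out_gpair)[i] = bst_gpair(g * w, std::max(h, 1e-6f) * w);
    }
  }
  virtual const char* DefaultEvalMetric(void) const { return "rmse"; }
};
// ...and the matching registration line inside CreateObjFunction (learner/objective.h):
//   if (!strcmp("reg:squaredlogerror", name)) return new SquaredLogErrorObj();
```

Flooring the hessian at a small positive value mirrors what RegLossObj does with its eps constant and keeps the Newton step in the tree learner well defined.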
XGBoost: eXtreme Gradient Boosting
An optimized general purpose gradient boosting library. The library is parallelized, and also provides an optimized distributed version.
It implements machine learning algorithms under the gradient boosting framework, including the generalized linear model and gradient boosted regression trees (GBDT). XGBoost can also be distributed and scales to terascale data.
The UpdateOneIter flow consists of the following steps (a toy mirror of this control flow is sketched after the list):
1. LazyInitDMatrix(train);
2. PredictRaw(train, &preds_);
3. obj_->GetGradient(preds_, train->info(), iter, &gpair_);
4. gbm_->DoBoost(train, &gpair_, obj_.get());
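The following self-contained toy (my own illustration, not xgboost code) mirrors steps 2-4 for a logistic objective: predict, let the objective produce per-instance (g, h), then take one "boosting" step; the booster is reduced to a single Newton-updated bias term so the whole loop fits in a few lines.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// first/second order gradient pair, analogous to bst_gpair
struct GPair { float grad, hess; };

int main() {
  std::vector<float> labels = {0.f, 1.f, 1.f, 1.f};
  std::vector<float> margin(labels.size(), 0.f);  // step 2: raw predictions of the current model
  const float eta = 0.3f;                         // shrinkage / learning rate
  for (int iter = 0; iter < 50; ++iter) {
    // step 3: the objective turns predictions + labels into per-instance (g, h)
    std::vector<GPair> gpair(labels.size());
    for (std::size_t i = 0; i < labels.size(); ++i) {
      const float p = 1.f / (1.f + std::exp(-margin[i]));  // PredTransform
      gpair[i] = GPair{p - labels[i], p * (1.f - p)};      // logistic g and h
    }
    // step 4: a stand-in "booster": one Newton step on a global bias term,
    // driven only by (g, h) -- like gbm, it never looks at the labels
    float G = 0.f, H = 0.f;
    for (const GPair &gp : gpair) { G += gp.grad; H += gp.hess; }
    for (float &m : margin) m += -eta * G / (H + 1e-6f);
  }
  std::printf("learned bias = %.3f\n", margin[0]);  // approaches log(3/1) ~= 1.099
  return 0;
}
```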
The objective.h file:
```cpp
#ifndef XGBOOST_LEARNER_OBJECTIVE_H_
#define XGBOOST_LEARNER_OBJECTIVE_H_
/*!
 * \file objective.h
 * \brief interface of objective function used for gradient boosting
 * \author Tianqi Chen, Kailong Chen
 */
#include "dmatrix.h"

namespace xgboost {
namespace learner {
/*! \brief interface of objective function */
class IObjFunction {  /// base class of every objective function
 public:
  /*! \brief virtual destructor */
  virtual ~IObjFunction(void) {}  /// virtual destructor, so resources are released through a base pointer
  /*!
   * \brief set parameters from outside
   * \param name name of the parameter
   * \param val value of the parameter
   */
  virtual void SetParam(const char *name, const char *val) = 0;  /// parameter name, parameter value
  /*!
   * \brief get gradient over each of predictions, given existing information
   * \param preds prediction of current round
   * \param info information about labels, weights, groups in rank
   * \param iter current iteration number
   * \param out_gpair output of get gradient, saves gradient and second order gradient in
   */
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) = 0;  /// compute the gradient
  /*! \return the default evaluation metric for the objective */
  virtual const char* DefaultEvalMetric(void) const = 0;  /// default evaluation metric
  // the following functions are optional, most of time default implementation is good enough
  /*!
   * \brief transform prediction values, this is only called when Prediction is called
   * \param io_preds prediction values, saves to this vector as well
   */
  virtual void PredTransform(std::vector<float> *io_preds) {}
  /*!
   * \brief transform prediction values, this is only called when Eval is called,
   *  usually it redirects to PredTransform
   * \param io_preds prediction values, saves to this vector as well
   */
  virtual void EvalTransform(std::vector<float> *io_preds) {
    this->PredTransform(io_preds);
  }
  /*!
   * \brief transform probability value back to margin
   *  this is used to transform user-set base_score back to margin
   *  used by gradient boosting
   * \return transformed value
   */
  virtual float ProbToMargin(float base_score) const {
    return base_score;
  }
};
}  // namespace learner
}  // namespace xgboost

// these are implementations of objective functions
/// the implementations of the objective functions live in the .hpp file
#include "objective-inl.hpp"

// factory function
namespace xgboost {
namespace learner {
/*!
 * \brief factory function to create objective function by name
 */
inline IObjFunction* CreateObjFunction(const char *name) {
  /// dispatch on the objective name to decide which implementation to create
  using namespace std;
  /// implemented by RegLossObj; different constructor arguments select different losses
  if (!strcmp("reg:linear", name)) return new RegLossObj(LossType::kLinearSquare);
  if (!strcmp("reg:logistic", name)) return new RegLossObj(LossType::kLogisticNeglik);
  if (!strcmp("binary:logistic", name)) return new RegLossObj(LossType::kLogisticClassify);
  if (!strcmp("binary:logitraw", name)) return new RegLossObj(LossType::kLogisticRaw);
  /// implemented by PoissonRegression
  if (!strcmp("count:poisson", name)) return new PoissonRegression();
  /// implemented by SoftmaxMultiClassObj
  if (!strcmp("multi:softmax", name)) return new SoftmaxMultiClassObj(0);
  if (!strcmp("multi:softprob", name)) return new SoftmaxMultiClassObj(1);
  /// implemented by PairwiseRankObj, LambdaRankObjNDCG and LambdaRankObjMAP respectively
  if (!strcmp("rank:pairwise", name)) return new PairwiseRankObj();
  if (!strcmp("rank:ndcg", name)) return new LambdaRankObjNDCG();
  if (!strcmp("rank:map", name)) return new LambdaRankObjMAP();
  utils::Error("unknown objective function type: %s", name);
  return NULL;
}
}  // namespace learner
}  // namespace xgboost
#endif  // XGBOOST_LEARNER_OBJECTIVE_H_

/// .h files define the data structures and interfaces; the -inl.hpp files implement them
/*
/// the eight objective names above map to the solvers as follows:
 "reg:linear"      - linear regression.
 "reg:logistic"    - logistic regression.
 "binary:logistic" - logistic regression for binary classification, output is a probability.
 "binary:logitraw" - logistic regression for binary classification, output is the raw score w^T x.
 "count:poisson"   - Poisson regression for count data, output is the mean of a Poisson distribution.
                     For Poisson regression, max_delta_step defaults to 0.7 (used to safeguard optimization).
 "multi:softmax"   - multiclass classification with the softmax objective; num_class (the number of classes) must be set.
 "multi:softprob"  - same as softmax, but outputs a vector of ndata * nclass values, which can be reshaped into an
                     ndata x nclass matrix; each row holds the predicted probability of that sample belonging to each class.
 "rank:pairwise"   - set XGBoost to do ranking by minimizing the pairwise loss (for pairwise metrics such as AUC)
*/
/*
/// The UpdateOneIter flow consists of the following steps:
 1. LazyInitDMatrix(train);
 2. PredictRaw(train, &preds_);
 3. obj_->GetGradient(preds_, train->info(), iter, &gpair_);
 4. gbm_->DoBoost(train, &gpair_, obj_.get());
*/
```
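For orientation, this is roughly how the factory gets used. The call site below is a simplified assumption of mine (in the real code the learner creates the objective from its "objective" parameter); only CreateObjFunction, SetParam and GetGradient are interfaces actually shown above.

```cpp
// Simplified, assumed call site -- not a copy of the learner code.
IObjFunction *obj = CreateObjFunction("binary:logistic");  // -> RegLossObj(LossType::kLogisticClassify)
obj->SetParam("scale_pos_weight", "2.0");                  // positive instances get twice the weight
// each boosting round the learner then calls something like:
//   obj->GetGradient(preds, dmat->info(), iter, &gpair);
// and hands gpair (not the labels) to the gradient booster.
delete obj;
```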
The objective-inl.hpp file:
```cpp
#ifndef XGBOOST_LEARNER_OBJECTIVE_INL_HPP_
#define XGBOOST_LEARNER_OBJECTIVE_INL_HPP_
/*!
 * \file objective-inl.hpp
 * \brief objective function implementations
 * \author Tianqi Chen, Kailong Chen
 */
/// on solving the objective function, see: https://www.cnblogs.com/harvey888/p/7203256.html
/// algorithm principles: http://wepon.me/files/gbdt.pdf
/// derivation of the objective: https://blog.csdn.net/yuxeaotao/article/details/90378782
///                              https://blog.csdn.net/a819825294/article/details/51206410
/// source-code walkthrough: https://blog.csdn.net/matrix_zzl/article/details/78699605
/// main functions in the source: https://blog.csdn.net/weixin_39750084/article/details/83244191
#include <vector>
#include <algorithm>
#include <utility>
#include <cmath>
#include <functional>
#include "../data.h"
#include "./objective.h"
#include "./helper_utils.h"
#include "../utils/random.h"
#include "../utils/omp.h"

namespace xgboost {
namespace learner {
/// implements some commonly used calculations, defined as inline functions
/*! \brief defines functions to calculate some commonly used functions */
struct LossType {
  /*! \brief indicate which type we are using */
  int loss_type;
  // list of constants
  static const int kLinearSquare = 0;      /// linear (squared error) regression
  static const int kLogisticNeglik = 1;    /// logistic regression, outputs a probability
  static const int kLogisticClassify = 2;  /// binary classification, outputs a probability
  static const int kLogisticRaw = 3;       /// outputs the raw score; after a sigmoid it gives the same probability as the two above
  /*!
   * \brief transform the linear sum to prediction
   * \param x linear sum of boosting ensemble
   * \return transformed prediction
   */
  inline float PredTransform(float x) const {
    /// 0 and 3 produce the same output, as do 1 and 2
    switch (loss_type) {
      case kLogisticRaw:
      case kLinearSquare: return x;
      case kLogisticClassify:
      case kLogisticNeglik: return 1.0f / (1.0f + std::exp(-x));
      default: utils::Error("unknown loss_type"); return 0.0f;
    }
  }
  /*!
   * \brief check if label range is valid
   */
  inline bool CheckLabel(float x) const {
    /// check whether the label is valid
    if (loss_type != kLinearSquare) {
      return x >= 0.0f && x <= 1.0f;
    }
    return true;
  }
  /*!
   * \brief error message displayed when check label fail
   */
  inline const char * CheckLabelErrorMsg(void) const {
    if (loss_type != kLinearSquare) {
      return "label must be in [0,1] for logistic regression";
    } else {
      return "";
    }
  }
  /*!
   * \brief calculate first order gradient of loss, given transformed prediction
   * \param predt transformed prediction
   * \param label true label
   * \return first order gradient
   */
  inline float FirstOrderGradient(float predt, float label) const {
    /// first-order gradient of each loss; note that kLogisticClassify and kLogisticNeglik return the same value
    switch (loss_type) {
      case kLinearSquare: return predt - label;
      case kLogisticRaw: predt = 1.0f / (1.0f + std::exp(-predt));
      case kLogisticClassify:
      case kLogisticNeglik: return predt - label;
      default: utils::Error("unknown loss_type"); return 0.0f;
    }
  }
  /*!
   * \brief calculate second order gradient of loss, given transformed prediction
   * \param predt transformed prediction
   * \param label true label
   * \return second order gradient
   */
  inline float SecondOrderGradient(float predt, float label) const {
    /// second-order gradient (hessian)
    // cap second order gradient to positive value
    const float eps = 1e-16f;
    switch (loss_type) {
      case kLinearSquare: return 1.0f;
      case kLogisticRaw: predt = 1.0f / (1.0f + std::exp(-predt));
      case kLogisticClassify:
      case kLogisticNeglik: return std::max(predt * (1.0f - predt), eps);  /// floor the hessian at eps
      default: utils::Error("unknown loss_type"); return 0.0f;
    }
  }
  /*!
   * \brief transform probability value back to margin
   */
  inline float ProbToMargin(float base_score) const {
    /// map the probability back to the margin scale
    if (loss_type == kLogisticRaw ||
        loss_type == kLogisticClassify ||
        loss_type == kLogisticNeglik) {
      utils::Check(base_score > 0.0f && base_score < 1.0f,
                   "base_score must be in (0,1) for logistic loss");
      base_score = -std::log(1.0f / base_score - 1.0f);
    }
    return base_score;
  }
  /*! \brief get default evaluation metric for the objective */
  inline const char *DefaultEvalMetric(void) const {
    /// default evaluation metric
    if (loss_type == kLogisticClassify) return "error";
    if (loss_type == kLogisticRaw) return "auc";
    return "rmse";
  }
};

/*! \brief objective function that only need to */
/// regression / logistic-regression objective
class RegLossObj : public IObjFunction {
  /// the explicit keyword prevents implicit conversion through the constructor; IObjFunction comes from objective.h
 public:
  explicit RegLossObj(int loss_type) {
    /// in principle every single-argument constructor should be marked explicit, which avoids many subtle bugs
    loss.loss_type = loss_type;
    scale_pos_weight = 1.0f;
  }
  virtual ~RegLossObj(void) {}  /// virtual destructor of the base class (prevents leaks when deleting through a base pointer)
  virtual void SetParam(const char *name, const char *val) {
    /// virtual function, enables polymorphism
    using namespace std;
    if (!strcmp("scale_pos_weight", name)) {
      scale_pos_weight = static_cast<float>(atof(val));
    }
  }
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) {
    utils::Check(info.labels.size() != 0, "label set cannot be empty");
    utils::Check(preds.size() % info.labels.size() == 0,
                 "labels are not correctly provided");
    std::vector<bst_gpair> &gpair = *out_gpair;
    gpair.resize(preds.size());
    // check if label in range
    bool label_correct = true;
    // start calculating gradient
    const unsigned nstep = static_cast<unsigned>(info.labels.size());
    const bst_omp_uint ndata = static_cast<bst_omp_uint>(preds.size());
    /// the loop below is parallelized with OpenMP, static scheduling
    #pragma omp parallel for schedule(static)
    for (bst_omp_uint i = 0; i < ndata; ++i) {
      const unsigned j = i % nstep;
      float p = loss.PredTransform(preds[i]);
      float w = info.GetWeight(j);
      if (info.labels[j] == 1.0f) w *= scale_pos_weight;
      if (!loss.CheckLabel(info.labels[j])) label_correct = false;
      gpair[i] = bst_gpair(loss.FirstOrderGradient(p, info.labels[j]) * w,
                           loss.SecondOrderGradient(p, info.labels[j]) * w);
    }
    utils::Check(label_correct, loss.CheckLabelErrorMsg());
  }
  virtual const char* DefaultEvalMetric(void) const {
    return loss.DefaultEvalMetric();
  }
  virtual void PredTransform(std::vector<float> *io_preds) {
    std::vector<float> &preds = *io_preds;
    const bst_omp_uint ndata = static_cast<bst_omp_uint>(preds.size());
    #pragma omp parallel for schedule(static)
    for (bst_omp_uint j = 0; j < ndata; ++j) {
      preds[j] = loss.PredTransform(preds[j]);
    }
  }
  virtual float ProbToMargin(float base_score) const {
    return loss.ProbToMargin(base_score);
  }

  /// protected members are accessible to this class, its subclasses and friends, but not through objects of the class
 protected:
  float scale_pos_weight;
  LossType loss;
};

// poisson regression for count
/// Poisson regression
class PoissonRegression : public IObjFunction {
 public:
  explicit PoissonRegression(void) {
    max_delta_step = 0.0f;
  }
  virtual ~PoissonRegression(void) {}
  virtual void SetParam(const char *name, const char *val) {
    using namespace std;
    if (!strcmp("max_delta_step", name)) {
      max_delta_step = static_cast<float>(atof(val));
    }
  }
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) {
    utils::Check(max_delta_step != 0.0f,
                 "PoissonRegression: need to set max_delta_step");
    utils::Check(info.labels.size() != 0, "label set cannot be empty");
    utils::Check(preds.size() == info.labels.size(),
                 "labels are not correctly provided");
    std::vector<bst_gpair> &gpair = *out_gpair;
    gpair.resize(preds.size());
    // check if label in range
    bool label_correct = true;
    // start calculating gradient
    const long ndata = static_cast<bst_omp_uint>(preds.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < ndata; ++i) {
      float p = preds[i];
      float w = info.GetWeight(i);
      float y = info.labels[i];
      if (y >= 0.0f) {
        gpair[i] = bst_gpair((std::exp(p) - y) * w,
                             std::exp(p + max_delta_step) * w);
      } else {
        label_correct = false;
      }
    }
    utils::Check(label_correct, "PoissonRegression: label must be nonnegative");
  }
  virtual void PredTransform(std::vector<float> *io_preds) {
    std::vector<float> &preds = *io_preds;
    const long ndata = static_cast<long>(preds.size());
    #pragma omp parallel for schedule(static)
    for (long j = 0; j < ndata; ++j) {
      preds[j] = std::exp(preds[j]);
    }
  }
  virtual void EvalTransform(std::vector<float> *io_preds) {
    PredTransform(io_preds);
  }
  virtual float ProbToMargin(float base_score) const {
    return std::log(base_score);
  }
  virtual const char* DefaultEvalMetric(void) const {
    return "poisson-nloglik";
  }

 private:
  /// private members can only be accessed by this class and its friends; not even objects of the class can reach them
  float max_delta_step;
};

// softmax multi-class classification
/// multiclass classification
class SoftmaxMultiClassObj : public IObjFunction {
 public:
  explicit SoftmaxMultiClassObj(int output_prob)
      : output_prob(output_prob) {
    nclass = 0;
  }
  virtual ~SoftmaxMultiClassObj(void) {}
  virtual void SetParam(const char *name, const char *val) {
    using namespace std;
    if (!strcmp("num_class", name)) nclass = atoi(val);
  }
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) {
    utils::Check(nclass != 0, "must set num_class to use softmax");
    utils::Check(info.labels.size() != 0, "label set cannot be empty");
    utils::Check(preds.size() % (static_cast<size_t>(nclass) * info.labels.size()) == 0,
                 "SoftmaxMultiClassObj: label size and pred size does not match");
    std::vector<bst_gpair> &gpair = *out_gpair;
    gpair.resize(preds.size());
    const unsigned nstep = static_cast<unsigned>(info.labels.size() * nclass);
    const bst_omp_uint ndata = static_cast<bst_omp_uint>(preds.size() / nclass);
    int label_error = 0;
    #pragma omp parallel
    {
      std::vector<float> rec(nclass);
      #pragma omp for schedule(static)
      for (bst_omp_uint i = 0; i < ndata; ++i) {
        for (int k = 0; k < nclass; ++k) {
          rec[k] = preds[i * nclass + k];
        }
        Softmax(&rec);
        const unsigned j = i % nstep;
        int label = static_cast<int>(info.labels[j]);
        if (label < 0 || label >= nclass) {
          label_error = label; label = 0;
        }
        const float wt = info.GetWeight(j);
        for (int k = 0; k < nclass; ++k) {
          float p = rec[k];
          const float h = 2.0f * p * (1.0f - p) * wt;
          if (label == k) {
            gpair[i * nclass + k] = bst_gpair((p - 1.0f) * wt, h);
          } else {
            gpair[i * nclass + k] = bst_gpair(p * wt, h);
          }
        }
      }
    }
    utils::Check(label_error >= 0 && label_error < nclass,
                 "SoftmaxMultiClassObj: label must be in [0, num_class),"\
                 " num_class=%d but found %d in label", nclass, label_error);
  }
  virtual void PredTransform(std::vector<float> *io_preds) {
    this->Transform(io_preds, output_prob);
  }
  virtual void EvalTransform(std::vector<float> *io_preds) {
    this->Transform(io_preds, 1);
  }
  virtual const char* DefaultEvalMetric(void) const {
    return "merror";
  }

 private:
  inline void Transform(std::vector<float> *io_preds, int prob) {
    utils::Check(nclass != 0, "must set num_class to use softmax");
    std::vector<float> &preds = *io_preds;
    std::vector<float> tmp;
    const bst_omp_uint ndata = static_cast<bst_omp_uint>(preds.size() / nclass);
    if (prob == 0) tmp.resize(ndata);
    #pragma omp parallel
    {
      std::vector<float> rec(nclass);
      #pragma omp for schedule(static)
      for (bst_omp_uint j = 0; j < ndata; ++j) {
        for (int k = 0; k < nclass; ++k) {
          rec[k] = preds[j * nclass + k];
        }
        if (prob == 0) {
          tmp[j] = static_cast<float>(FindMaxIndex(rec));
        } else {
          Softmax(&rec);
          for (int k = 0; k < nclass; ++k) {
            preds[j * nclass + k] = rec[k];
          }
        }
      }
    }
    if (prob == 0) preds = tmp;
  }
  // data field
  int nclass;
  int output_prob;
};

/*! \brief objective for lambda rank */
/// LambdaRankObj: learning-to-rank objective
class LambdaRankObj : public IObjFunction {
 public:
  LambdaRankObj(void) {
    loss.loss_type = LossType::kLogisticRaw;
    fix_list_weight = 0.0f;
    num_pairsample = 1;
  }
  virtual ~LambdaRankObj(void) {}
  virtual void SetParam(const char *name, const char *val) {
    using namespace std;
    if (!strcmp("loss_type", name)) loss.loss_type = atoi(val);
    if (!strcmp("fix_list_weight", name)) fix_list_weight = static_cast<float>(atof(val));
    if (!strcmp("num_pairsample", name)) num_pairsample = atoi(val);
  }
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) {
    utils::Check(preds.size() == info.labels.size(), "label size predict size not match");
    std::vector<bst_gpair> &gpair = *out_gpair;
    gpair.resize(preds.size());
    // quick consistency when group is not available
    std::vector<unsigned> tgptr(2, 0);
    tgptr[1] = static_cast<unsigned>(info.labels.size());
    const std::vector<unsigned> &gptr = info.group_ptr.size() == 0 ? tgptr : info.group_ptr;
    utils::Check(gptr.size() != 0 && gptr.back() == info.labels.size(),
                 "group structure not consistent with #rows");
    const bst_omp_uint ngroup = static_cast<bst_omp_uint>(gptr.size() - 1);
    #pragma omp parallel
    {
      // parallel construct, declare random number generator here, so that each
      // thread uses its own random number generator, seeded by thread id and current iteration
      random::Random rnd; rnd.Seed(iter * 1111 + omp_get_thread_num());
      std::vector<LambdaPair> pairs;
      std::vector<ListEntry> lst;
      std::vector< std::pair<float, unsigned> > rec;
      #pragma omp for schedule(static)
      for (bst_omp_uint k = 0; k < ngroup; ++k) {
        lst.clear(); pairs.clear();
        for (unsigned j = gptr[k]; j < gptr[k+1]; ++j) {
          lst.push_back(ListEntry(preds[j], info.labels[j], j));
          gpair[j] = bst_gpair(0.0f, 0.0f);
        }
        std::sort(lst.begin(), lst.end(), ListEntry::CmpPred);
        rec.resize(lst.size());
        for (unsigned i = 0; i < lst.size(); ++i) {
          rec[i] = std::make_pair(lst[i].label, i);
        }
        std::sort(rec.begin(), rec.end(), CmpFirst);
        // enumerate buckets with same label, for each item in the lst, grab another sample randomly
        for (unsigned i = 0; i < rec.size(); ) {
          unsigned j = i + 1;
          while (j < rec.size() && rec[j].first == rec[i].first) ++j;
          // bucket in [i,j), get a sample outside bucket
          unsigned nleft = i, nright = static_cast<unsigned>(rec.size() - j);
          if (nleft + nright != 0) {
            int nsample = num_pairsample;
            while (nsample--) {
              for (unsigned pid = i; pid < j; ++pid) {
                unsigned ridx = static_cast<unsigned>(rnd.RandDouble() * (nleft + nright));
                if (ridx < nleft) {
                  pairs.push_back(LambdaPair(rec[ridx].second, rec[pid].second));
                } else {
                  pairs.push_back(LambdaPair(rec[pid].second, rec[ridx+j-i].second));
                }
              }
            }
          }
          i = j;
        }
        // get lambda weight for the pairs
        this->GetLambdaWeight(lst, &pairs);
        // rescale each gradient and hessian so that the list has a constant weight
        float scale = 1.0f / num_pairsample;
        if (fix_list_weight != 0.0f) {
          scale *= fix_list_weight / (gptr[k+1] - gptr[k]);
        }
        for (size_t i = 0; i < pairs.size(); ++i) {
          const ListEntry &pos = lst[pairs[i].pos_index];
          const ListEntry &neg = lst[pairs[i].neg_index];
          const float w = pairs[i].weight * scale;
          float p = loss.PredTransform(pos.pred - neg.pred);
          float g = loss.FirstOrderGradient(p, 1.0f);
          float h = loss.SecondOrderGradient(p, 1.0f);
          // accumulate gradient and hessian in both pid, and nid
          gpair[pos.rindex].grad += g * w;
          gpair[pos.rindex].hess += 2.0f * w * h;
          gpair[neg.rindex].grad -= g * w;
          gpair[neg.rindex].hess += 2.0f * w * h;
        }
      }
    }
  }
  virtual const char* DefaultEvalMetric(void) const {
    return "map";
  }

 protected:
  /*! \brief helper information in a list */
  struct ListEntry {
    /*! \brief the prediction score of the entry */
    float pred;
    /*! \brief the actual label of the entry */
    float label;
    /*! \brief row index in the data matrix */
    unsigned rindex;
    // constructor
    ListEntry(float pred, float label, unsigned rindex)
        : pred(pred), label(label), rindex(rindex) {}
    // comparator by prediction
    inline static bool CmpPred(const ListEntry &a, const ListEntry &b) {
      return a.pred > b.pred;
    }
    // comparator by label
    inline static bool CmpLabel(const ListEntry &a, const ListEntry &b) {
      return a.label > b.label;
    }
  };
  /*! \brief a pair in the lambda rank */
  struct LambdaPair {
    /*! \brief positive index: this is a position in the list */
    unsigned pos_index;
    /*! \brief negative index: this is a position in the list */
    unsigned neg_index;
    /*! \brief weight to be filled in */
    float weight;
    // constructor
    LambdaPair(unsigned pos_index, unsigned neg_index)
        : pos_index(pos_index), neg_index(neg_index), weight(1.0f) {}
  };
  /*!
   * \brief get lambda weight for existing pairs
   * \param list a list that is sorted by pred score
   * \param io_pairs record of pairs, containing the pairs to fill in weights
   */
  virtual void GetLambdaWeight(const std::vector<ListEntry> &sorted_list,
                               std::vector<LambdaPair> *io_pairs) = 0;

 private:
  // loss function
  LossType loss;
  // number of pair samples performed for each instance
  int num_pairsample;
  // fixed weight of each element in the list
  float fix_list_weight;
};

class PairwiseRankObj : public LambdaRankObj {
 public:
  virtual ~PairwiseRankObj(void) {}

 protected:
  virtual void GetLambdaWeight(const std::vector<ListEntry> &sorted_list,
                               std::vector<LambdaPair> *io_pairs) {}
};

// beta version: NDCG lambda rank
class LambdaRankObjNDCG : public LambdaRankObj {
 public:
  virtual ~LambdaRankObjNDCG(void) {}

 protected:
  virtual void GetLambdaWeight(const std::vector<ListEntry> &sorted_list,
                               std::vector<LambdaPair> *io_pairs) {
    std::vector<LambdaPair> &pairs = *io_pairs;
    float IDCG;
    {
      std::vector<float> labels(sorted_list.size());
      for (size_t i = 0; i < sorted_list.size(); ++i) {
        labels[i] = sorted_list[i].label;
      }
      std::sort(labels.begin(), labels.end(), std::greater<float>());
      IDCG = CalcDCG(labels);
    }
    if (IDCG == 0.0) {
      for (size_t i = 0; i < pairs.size(); ++i) {
        pairs[i].weight = 0.0f;
      }
    } else {
      IDCG = 1.0f / IDCG;
      for (size_t i = 0; i < pairs.size(); ++i) {
        unsigned pos_idx = pairs[i].pos_index;
        unsigned neg_idx = pairs[i].neg_index;
        float pos_loginv = 1.0f / std::log(pos_idx + 2.0f);
        float neg_loginv = 1.0f / std::log(neg_idx + 2.0f);
        int pos_label = static_cast<int>(sorted_list[pos_idx].label);
        int neg_label = static_cast<int>(sorted_list[neg_idx].label);
        float original = ((1 << pos_label) - 1) * pos_loginv + ((1 << neg_label) - 1) * neg_loginv;
        float changed  = ((1 << neg_label) - 1) * pos_loginv + ((1 << pos_label) - 1) * neg_loginv;
        float delta = (original - changed) * IDCG;
        if (delta < 0.0f) delta = -delta;
        pairs[i].weight = delta;
      }
    }
  }
  inline static float CalcDCG(const std::vector<float> &labels) {
    double sumdcg = 0.0;
    for (size_t i = 0; i < labels.size(); ++i) {
      const unsigned rel = static_cast<unsigned>(labels[i]);
      if (rel != 0) {
        sumdcg += ((1 << rel) - 1) / std::log(static_cast<float>(i + 2));
      }
    }
    return static_cast<float>(sumdcg);
  }
};

// map LambdaRank
class LambdaRankObjMAP : public LambdaRankObj {
 public:
  virtual ~LambdaRankObjMAP(void) {}

 protected:
  struct MAPStats {
    /*! \brief the accumulated precision */
    float ap_acc;
    /*!
     * \brief the accumulated precision,
     *   assuming a positive instance is missing
     */
    float ap_acc_miss;
    /*!
     * \brief the accumulated precision,
     *   assuming that one more positive instance is inserted ahead
     */
    float ap_acc_add;
    /* \brief the accumulated positive instance count */
    float hits;
    MAPStats(void) {}
    MAPStats(float ap_acc, float ap_acc_miss, float ap_acc_add, float hits)
        : ap_acc(ap_acc), ap_acc_miss(ap_acc_miss), ap_acc_add(ap_acc_add), hits(hits) {}
  };
  /*!
   * \brief Obtain the delta MAP if trying to switch the positions of instances in index1 or index2
   *        in sorted triples
   * \param sorted_list the list containing entry information
   * \param index1,index2 the instances switched
   * \param map_stats a vector containing the accumulated precisions for each position in a list
   */
  inline float GetLambdaMAP(const std::vector<ListEntry> &sorted_list,
                            int index1, int index2,
                            std::vector<MAPStats> *p_map_stats) {
    std::vector<MAPStats> &map_stats = *p_map_stats;
    if (index1 == index2 || map_stats[map_stats.size() - 1].hits == 0) {
      return 0.0f;
    }
    if (index1 > index2) std::swap(index1, index2);
    float original = map_stats[index2].ap_acc;
    if (index1 != 0) original -= map_stats[index1 - 1].ap_acc;
    float changed = 0;
    float label1 = sorted_list[index1].label > 0.0f ? 1.0f : 0.0f;
    float label2 = sorted_list[index2].label > 0.0f ? 1.0f : 0.0f;
    if (label1 == label2) {
      return 0.0;
    } else if (label1 < label2) {
      changed += map_stats[index2 - 1].ap_acc_add - map_stats[index1].ap_acc_add;
      changed += (map_stats[index1].hits + 1.0f) / (index1 + 1);
    } else {
      changed += map_stats[index2 - 1].ap_acc_miss - map_stats[index1].ap_acc_miss;
      changed += map_stats[index2].hits / (index2 + 1);
    }
    float ans = (changed - original) / (map_stats[map_stats.size() - 1].hits);
    if (ans < 0) ans = -ans;
    return ans;
  }
  /*
   * \brief obtain preprocessing results for calculating delta MAP
   * \param sorted_list the list containing entry information
   * \param map_stats a vector containing the accumulated precisions for each position in a list
   */
  inline void GetMAPStats(const std::vector<ListEntry> &sorted_list,
                          std::vector<MAPStats> *p_map_acc) {
    std::vector<MAPStats> &map_acc = *p_map_acc;
    map_acc.resize(sorted_list.size());
    float hit = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    for (size_t i = 1; i <= sorted_list.size(); ++i) {
      if (sorted_list[i - 1].label > 0.0f) {
        hit++;
        acc1 += hit / i;
        acc2 += (hit - 1) / i;
        acc3 += (hit + 1) / i;
      }
      map_acc[i - 1] = MAPStats(acc1, acc2, acc3, hit);
    }
  }
  virtual void GetLambdaWeight(const std::vector<ListEntry> &sorted_list,
                               std::vector<LambdaPair> *io_pairs) {
    std::vector<LambdaPair> &pairs = *io_pairs;
    std::vector<MAPStats> map_stats;
    GetMAPStats(sorted_list, &map_stats);
    for (size_t i = 0; i < pairs.size(); ++i) {
      pairs[i].weight = GetLambdaMAP(sorted_list, pairs[i].pos_index,
                                     pairs[i].neg_index, &map_stats);
    }
  }
};
}  // namespace learner
}  // namespace xgboost
#endif  // XGBOOST_LEARNER_OBJECTIVE_INL_HPP_
```
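As a quick, standalone sanity check of the logistic formulas RegLossObj relies on above (my own snippet, no xgboost headers needed): PredTransform is the sigmoid, FirstOrderGradient returns p - y, and SecondOrderGradient returns p*(1-p) floored at eps.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
  const float margin = 0.3f, label = 1.0f, eps = 1e-16f;
  const float p = 1.0f / (1.0f + std::exp(-margin));  // LossType::PredTransform
  const float g = p - label;                          // LossType::FirstOrderGradient
  const float h = std::max(p * (1.0f - p), eps);      // LossType::SecondOrderGradient
  std::printf("p=%.4f g=%.4f h=%.4f\n", p, g, h);     // p~0.574, g~-0.426, h~0.245
  return 0;
}
```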
References on the objective-function math and the source code:
* Solving the objective function: https://www.cnblogs.com/harvey888/p/7203256.html
* Algorithm principles: http://wepon.me/files/gbdt.pdf
* Derivation of the objective: https://blog.csdn.net/yuxeaotao/article/details/90378782 and https://blog.csdn.net/a819825294/article/details/51206410
* Source-code walkthrough: https://blog.csdn.net/matrix_zzl/article/details/78699605
* Main functions in the source: https://blog.csdn.net/weixin_39750084/article/details/83244191