XGBoost (3): CPU, GPU, and Multi-GPU Installation

Reposted article · 2020-10-27 15:57 · XGBoost

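This article covers installing the CPU, GPU, and Multi-GPU builds of XGBoost, under the following conditions:

Linux platform, built from source, with Python support

Note: the official XGBoost documentation describes installation in more detail, and you can follow it directly. This article exists for two reasons:

First, the original post offered a Chinese-language walkthrough;
Second, and more importantly, it points out problems you are likely to run into and how to avoid them.

The article contains no figures.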

1. Dependencies

XGBoost provides interfaces for several languages, such as Python, R, JVM, and Ruby, but this article only covers Python.

For testing, create an isolated Python virtual environment and activate it:

virtualenv -p /usr/bin/python3 env
source env/bin/activate

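virtualenv creates isolated Python environments;
-p /usr/bin/python3 selects the base interpreter; the new environment starts without any third-party libraries;
env is the directory where the new environment's files are written;
source env/bin/activate activates the environment.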
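
To confirm that the environment is really active before installing anything, a quick check from Python can help. This is a minimal sketch; the helper name in_virtualenv is made up for illustration, and it relies on the fact that virtual environments make sys.prefix differ from sys.base_prefix (older virtualenv releases set sys.real_prefix instead).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases make sys.prefix differ from
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

After source env/bin/activate, the interpreter path printed should point inside the env directory.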

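Whichever build you install, you need the XGBoost source code:

git clone --recursive https://github.com/dmlc/xgboost  # can be quite slow

After cloning, check that the cub/, dmlc-core, and gputreeshap folders inside xgboost actually contain files; if any of them is empty, download its contents from the corresponding repository (for example by running git submodule update --init --recursive inside xgboost).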
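
The folder check described above can be automated. The following is a rough sketch, not part of XGBoost: the helper empty_submodules is hypothetical, and the folder names are the three mentioned in the text, which may differ in other versions of the repository.

```python
from pathlib import Path

# Folder names taken from the text above; other XGBoost versions may lay
# out their submodules differently.
SUBMODULE_DIRS = ("cub", "dmlc-core", "gputreeshap")


def empty_submodules(repo_root, names=SUBMODULE_DIRS):
    """Return the submodule folders that are missing or contain no files."""
    root = Path(repo_root)
    problems = []
    for name in names:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Assumes the clone lives in ./xgboost; adjust the path as needed.
    print("empty or missing:", empty_submodules("xgboost"))
```

An empty list means all three folders contain files.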

The verification code below uses scikit-learn and pandas, so install them as well:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple scikit-learn
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas

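2. Installing the CPU version of XGBoost

Besides building from source, there is a simpler way to get the CPU version of XGBoost, as you have probably guessed: install it with pip.

2.1 Installing with pip

This uses the Tsinghua PyPI mirror: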

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple xgboost

After installation, verify it with the following code:

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create a synthetic dataset
dataset, labels = make_classification(n_samples=10000, n_features=50, n_informative=3, n_classes=3)
print(dataset.shape, labels.shape)
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(dataset, labels, test_size=0.3, random_state=7)
print("x_train: {}, x_test: {}".format(x_train.shape, x_test.shape))

# Build the DMatrix objects
dtrain = xgb.DMatrix(x_train, y_train)
dtest = xgb.DMatrix(x_test, y_test)

# Parameters
params = {
    'tree_method': "hist",
    'booster': 'gbtree',
    'objective': 'multi:softmax',
    'num_class': 3,
    'max_depth': 6,
    'eval_metric': 'merror',
    'eta': 0.01,
    # 'gpu_id': 0,  # only needed for the GPU build
}

# Train
evals = [(dtrain, 'train'), (dtest, 'val')]
model = xgb.train(params, dtrain, num_boost_round=100,
                  evals=evals)

Output:

(10000, 50) (10000,)
x_train: (7000, 50), x_test: (3000, 50)
[0]     train-merror:0.12200    val-merror:0.15467
[1]     train-merror:0.12371    val-merror:0.15267
[2]     train-merror:0.12286    val-merror:0.15333
[3]     train-merror:0.12343    val-merror:0.15467
...
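
The merror values in this log are multiclass error rates, that is, the fraction of samples whose predicted class differs from the true label. As a small illustration, the metric can be recomputed by hand; the function below is a sketch written for this article, not an XGBoost API.

```python
def merror(y_true, y_pred):
    """Multiclass error rate: share of samples with a wrong predicted class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute an error rate on empty input")
    # multi:softmax predictions are class labels stored as floats,
    # so compare them as integers.
    wrong = sum(1 for t, p in zip(y_true, y_pred) if int(t) != int(p))
    return wrong / len(y_true)


# One wrong prediction out of four samples.
print(merror([0, 1, 2, 2], [0.0, 1.0, 1.0, 2.0]))  # 0.25
```

With the trained model from the script above, merror(y_test, model.predict(dtest)) should be close to the val-merror reported for the last boosting round.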

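2.2 Building the XGBoost CPU version from source

2.2.1 Compiling XGBoost

pip is the simplest way to install the CPU version; building from source here is meant to familiarize you with the whole process.

From the directory that contains the downloaded and extracted (or git-cloned) xgboost folder, run the following in order: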

cd xgboost
mkdir build
cd build
cmake ..
make -j12

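The 12 in make -j12 is the number of parallel build jobs, usually set to the number of CPU cores on the machine; run nproc on the command line to check it, and adjust the value to your situation.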
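
If the build is driven from a script, the job count can be derived instead of hard-coded. A minimal sketch follows; the helper make_command and its max_jobs parameter are made up for illustration. Note that os.cpu_count() reports logical CPUs, which can differ from what nproc prints when the process is restricted to a subset of cores.

```python
import os


def make_command(max_jobs=None):
    """Build a 'make -jN' command with N based on the CPU count."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    jobs = cores if max_jobs is None else max(1, min(cores, max_jobs))
    return ["make", f"-j{jobs}"]


print(" ".join(make_command()))
```

Capping the value with max_jobs is useful on shared machines, where using every core for compilation may not be welcome.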

If no errors are reported, the build succeeded.

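2.2.2 Setting up the Python package

Next, install and configure the Python package.

Note that a fresh clone of xgboost is on the master branch by default. To use the latest version of the Python package, stay on that branch; to install a specific version, switch branches as shown below (ideally before compiling in 2.2.1, so that the compiled library and the Python package come from the same version).

First, go to the xgboost root directory:

git branch -a  # list branches
git checkout remotes/origin/release_1.2.0  # switch to the branch of the desired version

Next, go to the python-package directory:

cd python-package

Then run the command you need (installing only requires the first one; the others are alternative build targets):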

python setup.py install  # Install the XGBoost to your current Python environment.
python setup.py build    # Build the Python package.
python setup.py build_ext # Build only the C++ core.
python setup.py sdist     # Create a source distribution
python setup.py bdist     # Create a binary distribution
python setup.py bdist_wheel # Create a binary distribution with wheel format

After it completes, run pip list to confirm that xgboost is installed.
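
The same check can be done from Python, which is handy inside a setup script. This is a sketch that assumes Python 3.8 or later (for importlib.metadata); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("xgboost", "scikit-learn", "pandas"):
    print(name, installed_version(name))
```

A None next to xgboost means the package is not visible in the currently active environment.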

2.2.3 Verification

Verify with the same example as above:

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create a synthetic dataset
dataset, labels = make_classification(n_samples=10000, n_features=50, n_informative=3, n_classes=3)
print(dataset.shape, labels.shape)
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(dataset, labels, test_size=0.3, random_state=7)
print("x_train: {}, x_test: {}".format(x_train.shape, x_test.shape))

# Build the DMatrix objects
dtrain = xgb.DMatrix(x_train, y_train)
dtest = xgb.DMatrix(x_test, y_test)

# Parameters
params = {
    'tree_method': "hist",
    'booster': 'gbtree',
    'objective': 'multi:softmax',
    'num_class': 3,
    'max_depth': 6,
    'eval_metric': 'merror',
    'eta': 0.01,
    # 'gpu_id': 0,  # only needed for the GPU build
}

# Train
evals = [(dtrain, 'train'), (dtest, 'val')]
model = xgb.train(params, dtrain, num_boost_round=100,
                  evals=evals)

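3. Installing the GPU version of XGBoost

3.1 Compiling XGBoost

Compared with the CPU version, training with the GPU version is faster, roughly by a factor of several in the author's experience.

The build process is similar to the CPU version: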

cd xgboost
mkdir build
cd build
cmake .. -DUSE_CUDA=ON
make -j12

-DUSE_CUDA=ON enables CUDA acceleration, so an NVIDIA GPU with its driver and CUDA must already be installed and working before you build.
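
Before starting a CUDA build, it can save time to check that the NVIDIA tools are at least on the PATH. The sketch below uses a made-up helper, cuda_tools_on_path; finding nvidia-smi and nvcc does not prove that the driver and CUDA work correctly, it only catches the most common setup gap.

```python
import shutil


def cuda_tools_on_path(tools=("nvidia-smi", "nvcc")):
    """Map each tool name to its full path, or None when it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in cuda_tools_on_path().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

If nvcc is missing, the CUDA toolkit's bin directory probably still needs to be added to PATH.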

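3.2 Setting up the Python package

Note: to install a specific version of XGBoost, switch to the corresponding branch first:

git branch -a
git checkout remotes/origin/release_1.2.0

Then go to the python-package directory:

cd python-package

Run the command you need (note that the install command differs slightly from the CPU one):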

python setup.py install --use-cuda
python setup.py build   
python setup.py build_ext
python setup.py sdist     
python setup.py bdist    
python setup.py bdist_wheel

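After it completes, run pip list to check.

3.3 Verification

Once set up, verify with code:

(The full code is omitted here; it is the same as the CPU verification except for the differences marked below.)

# Parameters
params = {
    'tree_method': "gpu_hist",  # differs from the CPU version
    'booster': 'gbtree',
    'objective': 'multi:softmax',
    'num_class': 3,
    'max_depth': 6,
    'eval_metric': 'merror',
    'eta': 0.01,
    'gpu_id': 0  # differs from the CPU version
}

Change the value of tree_method to gpu_hist;
select the GPU to use by setting gpu_id to 0.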
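
Since the CPU and GPU settings differ in only two entries, one way to keep a single script for both is to build the dictionary from a flag. This is a sketch with a hypothetical helper, make_params; the parameter names are the ones used in this article (XGBoost 1.2). As far as I know, newer XGBoost releases (2.0 and later) deprecate gpu_hist and gpu_id in favor of a device parameter, so treat the GPU branch as version-specific.

```python
def make_params(use_gpu=False, gpu_id=0):
    """Return training parameters for the CPU or the single-GPU build."""
    params = {
        'tree_method': 'hist',       # CPU histogram algorithm
        'booster': 'gbtree',
        'objective': 'multi:softmax',
        'num_class': 3,
        'max_depth': 6,
        'eval_metric': 'merror',
        'eta': 0.01,
    }
    if use_gpu:
        params['tree_method'] = 'gpu_hist'  # GPU histogram algorithm
        params['gpu_id'] = gpu_id           # which GPU to train on
    return params


print(make_params(use_gpu=True))
```

The result can be passed to xgb.train exactly like the params dictionary shown above.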

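4. Installing the Multi-GPU version of XGBoost

The CPU and GPU versions of XGBoost differ little, only in the compute platform. Compared with them, the Multi-GPU version differs in two ways:

a more involved installation and configuration process;
a different code implementation.

4.1 Installing and configuring NCCL

The Multi-GPU version of XGBoost needs communication between multiple GPUs, which requires NCCL.

NCCL (NVIDIA Collective Communications Library) implements communication between multiple GPUs or multiple nodes.

There are two ways to install it: with root privileges, or without root privileges.

For installation instructions, see: Building and Installing NCCL from Source on Linux (Linux下NCCL源碼編譯安裝)

4.2 Compiling XGBoost

The build process is similar to the single-GPU version, with the NCCL options added: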

cd xgboost
mkdir build
cd build
cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON -DNCCL_ROOT=/home/chenz/software/nccl
make -j12

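-DUSE_CUDA=ON enables CUDA acceleration, so an NVIDIA GPU with its driver and CUDA must already be installed and working before you build;
-DUSE_NCCL=ON enables NCCL;
-DNCCL_ROOT is the location where NCCL is installed.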
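
The three builds in this article differ only in the flags passed to cmake, which the following sketch makes explicit. The helper cmake_command is hypothetical; it only assembles the flags listed above and assumes, as in this article, that NCCL is always used together with CUDA.

```python
def cmake_command(use_cuda=False, use_nccl=False, nccl_root=None):
    """Assemble the cmake call for the CPU, GPU, or Multi-GPU build."""
    if use_nccl and not use_cuda:
        # Assumption: NCCL is only meaningful for a CUDA build.
        raise ValueError("use_nccl requires use_cuda")
    cmd = ["cmake", ".."]
    if use_cuda:
        cmd.append("-DUSE_CUDA=ON")
    if use_nccl:
        cmd.append("-DUSE_NCCL=ON")
        if nccl_root:
            cmd.append(f"-DNCCL_ROOT={nccl_root}")
    return cmd


print(" ".join(cmake_command()))                                # CPU
print(" ".join(cmake_command(use_cuda=True)))                   # single GPU
print(" ".join(cmake_command(True, True, "/path/to/nccl")))     # Multi-GPU
```

The path /path/to/nccl is a placeholder; replace it with your own NCCL installation directory.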

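4.3 Setting up the Python package

Note: to install a specific version of XGBoost, switch to the corresponding branch first:

git branch -a
git checkout remotes/origin/release_1.2.0

Then go to the python-package directory:

cd python-package

Run the command you need (note that the install command differs slightly from the CPU and single-GPU ones):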

python setup.py install --use-cuda --use-nccl
python setup.py build   
python setup.py build_ext
python setup.py sdist     
python setup.py bdist    
python setup.py bdist_wheel

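After it completes, run pip list to check.

4.4 Installing Dask and dask-cuda

If xgboost shows up in pip list, the installation succeeded.

XGBoost supports distributed training with Dask, so two more libraries are needed: Dask and dask-cuda.

Dask is a Python library for parallel computing that makes it easier to manage distributed workers...

Install Dask: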

pip install dask==2.21.0

I installed version 2.21.0.

dask-cuda is also required; it can be downloaded from the dask-cuda page. I downloaded version 0.14.1:

pip install dask_cuda-0.14.1-py3-none-any.whl

4.5 Verification

Once set up, verify with code.

The Multi-GPU code for XGBoost differs considerably from the CPU/single-GPU code:

import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
from xgboost.dask import DaskDMatrix
import xgboost as xgb
from dask import array as da


def train():

    dataset, labels = make_classification(n_samples=10000, n_features=50, n_informative=3, n_classes=3)
    print(dataset.shape, labels.shape)

    x_train, x_test, y_train, y_test = train_test_split(dataset, labels, test_size=0.3, random_state=7)
    print("x_train: {}, x_test: {}".format(x_train.shape, x_test.shape))

    X_train_da = da.from_array(x_train, chunks=(100))
    X_test_da = da.from_array(x_test, chunks=(100))
    y_train_da = da.from_array(y_train, chunks=(100))
    y_test_da = da.from_array(y_test, chunks=(100))

    dtrain = DaskDMatrix(client, X_train_da, y_train_da)
    dtest = DaskDMatrix(client, X_test_da, y_test_da)

    params = {
        'tree_method': 'gpu_hist',
        'booster': 'gbtree',
        'objective': 'multi:softmax',
        'num_class': 3,
        'max_depth': 6,
        'eval_metric': 'merror',
        'eta': 0.01
    }

    evals = [(dtrain, 'train'), (dtest, 'val')]

    res = xgb.dask.train(client, params, dtrain, num_boost_round=100,
                           evals=evals)

    train_error = res['history']['train']['merror']
    eval_error = res['history']['val']['merror']
    for tra, eva in zip(train_error, eval_error):
        print("train: {}, eval: {}".format(tra, eva))

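if __name__ == "__main__":
    # Note: n_workers=1 starts a single worker, so only one GPU does the training;
    # to use all four visible GPUs, set n_workers=4.
    with LocalCUDACluster(n_workers=1, threads_per_worker=3, CUDA_VISIBLE_DEVICES="0,1,2,3") as cluster:
        with Client(cluster) as client:  # train() reads this module-level client
            train()

Output: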

task [xgboost.dask]:tcp://127.0.0.1:43551 connected to the tracker
task [xgboost.dask]:tcp://127.0.0.1:43551 got new rank 0
train: 0.091143, eval: 0.117333
train: 0.091571, eval: 0.117333
train: 0.090857, eval: 0.117333
train: 0.090714, eval: 0.117667
...
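
The history returned by xgb.dask.train is a nested dictionary (dataset name, then metric name, then one value per boosting round), as the res['history']['val']['merror'] lookup in the script shows. A small sketch of how to pick the best round from it follows; best_round is a helper written for this article, and the sample values are the four rounds printed above.

```python
def best_round(history, dataset="val", metric="merror"):
    """Return (round index, value) of the lowest metric value."""
    values = history[dataset][metric]
    # For error metrics lower is better; ties go to the earliest round.
    index = min(range(len(values)), key=values.__getitem__)
    return index, values[index]


# The first four rounds from the log above.
history = {
    "train": {"merror": [0.091143, 0.091571, 0.090857, 0.090714]},
    "val": {"merror": [0.117333, 0.117333, 0.117333, 0.117667]},
}
print(best_round(history))           # (0, 0.117333)
print(best_round(history, "train"))  # (3, 0.090714)
```

In the real script the call would be best_round(res['history']).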

That completes the installation of the CPU, GPU, and Multi-GPU versions of XGBoost.
